character-ngrams Search Results

425 results for character-ngrams


pemistahl/lingua #101

Improve performance and reduce memory consumption

As pointed out in #39 and #57, Lingua's great accuracy comes at the cost of high memory usage. This poses a problem for some projects trying to use Lingua. In this issue I will try to highlight some…

Marcono1234 updated 2 years ago
14
apache/lucene #4021

NGramTokenizer shouldn't trim whitespace [LUCENE-2947]

Before I tokenize my strings, I am padding them with white space: `String foobar = " " + foo + " " + bar + " ";` When constructing term vectors from ngrams, this strategy has a couple benefits. First…

asfimport updated 2 years ago
8
RUCAIBox/CRSLab #42

Bugs in evaluator

When doing the `ind2txt`, we will get the `string`: ![image](https://user-images.githubusercontent.com/44745604/157155998-8dd21e2d-b7e5-4860-b310-9562b375ae81.png) Then if we calculate the `n-gram`,…

Oran-Ac updated 2 years ago
2
meilisearch/meilisearch #2222

Search Chinese Content, Typo tolerance looks not work

**Describe the bug** When I search Chinese content with a one-word typo, it does not seem to hit anything. Does meilisearch not support this, or does it need some configuration that I don't know about? Can anyone help me, th…

ldnvnbl updated 2 years ago
10
nltk/nltk #3065

Perplexity of a count based language model is always infinit…

I trained a `KneserNeyInterpolated` language model with `order=2` as follows: ``` from nltk.lm.preprocessing import padded_everygram_pipeline from nltk.lm import KneserNeyInterpolated, Laplace, S…

david-waterworth updated 1 year ago
5
apache/lucene #2383

CombinedNGramTokenFilter [LUCENE-1306]

Alternative NGram filter that produces tokens with composite prefix and suffix markers. ```java ts = new WhitespaceTokenizer(new StringReader("hello")); ts = new CombinedNGramTokenFilter(ts, 2, 2); a…

asfimport updated 2 years ago
11
juliasilge/tidytext #103

Adding functionality for character ngrams in unnest_tokens

Hello! I was wondering if the `n` parameter could be enabled in the `unnest_tokens` function for `token = "characters"` so that we can get more than just single characters for character…

kanishkamisra updated 2 years ago
2
alphagov/govuk-design-system #2178

Replace 'fewer' with 'less', where applicable, in our guidan…

## What Replace 'fewer' with 'less', where applicable, in Design System guidance.* For example, on the character count, error message, header, text input and textarea pages. *Depending on what t…

EoinShaughnessy updated 2 years ago
7
apache/lucene #2301

NGramTokenFilter creates bad TokenStream [LUCENE-1224]

With current trunk NGramTokenFilter(min=2,max=4), I index the string "abcdef" into an index, but I can't query it with "abc". If I query with "ab", I can get a hit result. The reason is that the NGramTo…

asfimport updated 2 years ago
20
juliasilge/tidytext #111

unnest_tokens format error message

I have a couple of times run into the following problem ``` r library(tidyverse) library(tidytext) data.frame(text = janeaustenr::emma) %>% unnest_tokens(word, text, token = "ngram", n = 2)…

EmilHvitfeldt updated 2 years ago
2

