character-ngrams Search Results

422 results
for character-ngrams

rapidsai/cudf #14684

[BUG] `str.character_ngrams` produces <NA> with strings < ng…

**Describe the bug** The `str.character_ngrams` function produces the token `<NA>` for strings that are shorter than the provided `n` (shown in the image for the case of bigrams). ![result output](https://githu…

Vortexx2 updated 7 months ago
2
massimoaria/bibliometrix #479

termExtraction function: Misleading documentation / bug on "…

Hi, Not sure if this is intended behaviour or not. If it IS intended, I think the documentation is misleading. The termExtraction function has a "remove.terms" argument with the following descr…

kdmaclean updated 1 month ago
1
ChenghaoMou/text-dedup #103

how to dedup short text?

Hi there, when I use MinHash with LSH or SimHash, it's hard to remove short text. Could anybody suggest a useful method to solve this problem? Thanks a ton! Take the example below, and dive…

varuy322 updated 1 day ago
1
scikit-learn/scikit-learn #7475

Why normalize whitespaces for CountVectorizer(analyzer='char…

I am curious about the rationale for replacing consecutive whitespace characters with just a single space character for [`CountVectorizer(analyzer='char')`](https://github.com/scikit-learn/scikit-learn/blob/51a765a/…

yxtay updated 2 years ago
15
ParticularMiner/red_string_grouper #4

Question / suggestion to use multiple n-grams to get more fe…

Hi @ParticularMiner, Hope you are doing good. I got to work on the same project again and have a question / suggestion - would it be possible to use multiple n-grams to get more features? Like …

iibarant updated 2 years ago
6
GlobalMaksimum/sadedegel #251

Character ngram option for TfIdfVectorizer

- Using character ngrams in the TfIdf vectorizer has yielded improvement in some models. - SadedeGel TfIdf vectorizer should have an `analyzer='char'` option similar to `sklearn`'s. - It is open to disc…

dafajon updated 3 years ago
2
rapidsai/cudf #13048

[FEA] Story - Improve performance with long strings

Many [strings APIs in libcudf](https://docs.rapids.ai/api/libcudf/stable/group__strings__apis.html) use thread-per-string parallelism in their implementation. This approach works great for processing …

GregoryKimball updated 6 months ago
5
halfdan/tatoeba2 #5

Investigate reimplementation of nihongoparserd / suggestd in…

The current setup is rather complicated. - tatodetect requires an obscure cppcms and doesn't build in the latest version. The ngrams.db generator is written in Python and rarely updated. - nihong…

halfdan updated 7 years ago
1
scikit-learn/scikit-learn #22196

Error in TFIDF vectorizer in "char_wb" analyzer

### Discussed in https://github.com/scikit-learn/scikit-learn/discussions/22195 Originally posted by **Pruthwik** January 12, 2022 For Whitespace sensitive char-n-gram tokenization, TFIDF vect…

Pruthwik updated 2 years ago
2
apple/turicreate #1378

unicode error when using `text_analytics.count_words`

I observed an error when using `text_analytics.count_words` to process an SArray. The SArray is as follows: ``` dtype: str Rows: 5 ['ニュースレター会員の皆様、ホテル最大 50% OFF のチャンスをお見逃しなく ! セールは今夜終了 !…

judyboon updated 4 years ago
2
