-
Should also test if n-grams have proper class.
-
When I try the --use-ngrams option, I receive
Unrecognized option 6: --use-ngrams
-
Example:
It would be nice to have an option that preserves punctuation:
```
console.log(nautral_NGrams.bigrams('Some, words here!!'));
// [ [ 'Some', 'words' ], [ 'words', 'here' ] ]
```
I would hav…
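
To make the request concrete, here is a minimal Python sketch of one possible punctuation-preserving behavior. It is not the library's API: `tokenize_keep_punct` and `ngrams` are hypothetical helpers, and treating each punctuation character as its own token is an assumption about what the option might do.

```python
import re

def tokenize_keep_punct(text):
    # Hypothetical tokenizer: runs of word characters become one token,
    # and every other non-space character becomes its own token.
    return re.findall(r"\w+|[^\w\s]", text)

def ngrams(tokens, n):
    # Slide a window of length n over the token list.
    return [tokens[i:i + n] for i in range(len(tokens) - n + 1)]

bigrams = ngrams(tokenize_keep_punct("Some, words here!!"), 2)
print(bigrams)
# [['Some', ','], [',', 'words'], ['words', 'here'], ['here', '!'], ['!', '!']]
```

Whether repeated marks such as `!!` should stay separate or be merged into one token is a design choice the option would need to settle.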
-
It might already be interesting in this toolkit to index the ngrams using FSTs or trie-based solutions. This is something we should discuss, since it seems like a rather big step, but it would inc…
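
As a starting point for that discussion, a rough Python sketch of the trie-based variant follows. The `NGramTrie` class and its method names are hypothetical and not part of this toolkit; it assumes n-grams are tuples of string tokens and stores a count at the node where each n-gram ends.

```python
class NGramTrie:
    """Toy trie keyed by tokens; illustrative only, not the toolkit's API."""

    _COUNT = object()  # sentinel key holding the count of n-grams ending at a node

    def __init__(self):
        self.root = {}

    def add(self, ngram):
        # Walk or create one child node per token, then bump the end count.
        node = self.root
        for token in ngram:
            node = node.setdefault(token, {})
        node[self._COUNT] = node.get(self._COUNT, 0) + 1

    def _find(self, tokens):
        # Return the node reached by following tokens, or None if absent.
        node = self.root
        for token in tokens:
            node = node.get(token)
            if node is None:
                return None
        return node

    def count(self, ngram):
        node = self._find(ngram)
        return 0 if node is None else node.get(self._COUNT, 0)

    def continuations(self, prefix):
        # Tokens that follow the given prefix in at least one stored n-gram.
        node = self._find(prefix)
        if node is None:
            return []
        return sorted(k for k in node if k is not self._COUNT)


tokens = "the cat sat on the cat mat".split()
trie = NGramTrie()
for i in range(len(tokens) - 1):
    trie.add(tuple(tokens[i:i + 2]))

print(trie.count(("the", "cat")))     # 2
print(trie.continuations(("cat",)))   # ['mat', 'sat']
```

Shared prefixes are stored once, which is where the space saving and the prefix lookup come from. An FST would additionally share suffixes, at the cost of a more involved construction.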
-
The [ngram](https://github.com/sanskrit-lexicon/hwnorm1/tree/master/ejf/ngram) directory contains
2-grams and 3-grams computed from the normalized spellings of hwnorm1c. The readme of the ngram directory…
-
This is the branch to substitute SVMs for ngrams. I've started from the end and I'm working my way backwards. First, I've added to database_ops.py to cheat-add a table for the SVM data based on the to…
-
Users should stem tokens before forming ngrams, so we do not need `wordstem_ngrams()` anymore.
https://github.com/quanteda/quanteda/blob/1d515f2d379647a873ed9b2c8c98504165c3bdb0/R/wordstem.R#L29-L4…
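
The proposed order of operations (stem first, then form ngrams) can be sketched in a language-neutral way. The Python below is only an illustration, not quanteda code: `toy_stem` is a hypothetical suffix stripper standing in for a real stemmer, and the `_` separator is chosen here for illustration.

```python
def toy_stem(token):
    # Hypothetical stand-in for a real stemmer: strip one common suffix
    # while leaving at least three characters of stem.
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def ngrams(tokens, n, sep="_"):
    # Join each window of n tokens into a single string.
    return [sep.join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "dogs chased cats".split()
stemmed = [toy_stem(t) for t in tokens]   # stem the tokens first
print(ngrams(stemmed, 2))                 # then form the ngrams
# ['dog_chas', 'chas_cat']
```

Because stemming runs on the tokens, the ngram step needs no knowledge of stems, which is the argument for dropping a dedicated ngram stemmer.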
-
Is it possible to add context to ngram extraction?
For example, currently running
`list(textacy.Doc('I like green eggs and ham.').to_terms_list(ngrams=3,as_strings=True))`
returns a list
…
-
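Training runs normally on 20 GB of data, but on 100 GB it hangs at a certain step. `bytepiece==0.6.3`.
Below is the stack trace of one of the threads. I can't tell anything from it; asking GPT directly suggests it is a multiprocessing problem: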
```
#0 0x00007f168f6207a4 in do_futex_wait.constprop () from /lib64/libpthread.so.0
#1 0x00…
-