-
Your n_gram_creator works fine as written, but here is another way to write it using a for-each loop, in case you want to take a look:
```
def ngram_creator(text_list):
    ngrams = []
    …
```
-
Noting down some areas where significant speedups may be achieved:
- `vcat` in `ProductNode`s leads to a lot of copying
- data deduplication in leaves may lead to lower memory requirements and als…
-
Could we perhaps add some shortcuts to the individual jmdictdb entry pages for checking the ngrams for all kanji and readings? Maybe not for everyone, but at least for logged-in editors?
For exampl…
-
I searched through the LanguageTool issues and didn't see any about this, but with more than 1,900 open issues and 3,900 closed issues it is certainly possible I missed something. If this is a duplic…
-
As discussed by @szhengac in https://github.com/dmlc/gluon-nlp/pull/529#discussion_r255815817, the classification script does not follow the paper: no word-ngram hashing is used.
leezu updated
5 years ago
-
Instead of using words, it's better to use ngrams, which are more compressible and more accurate. You don't need actual words if it's going to be translated anyway. Maybe something similar to keybr
-
I am trying to read the source code.
```golang
func newRecognizer_8859_2(language string, ngram *[64]uint32) *recognizerSingleByte {
	return &recognizerSingleByte{
		charset: "ISO-8859-2",
		h…
```
-
Another way to write the loop, using a for-each loop instead of a standard for loop, would be:
```
def ngram_creator(text_list):
    ngrams = []
    lastword = None
    for word in text_list:
        …
```
-
I am struggling to understand the word embeddings of FastText. According to the white paper [Enriching Word Vectors with Subword Information](https://arxiv.org/pdf/1607.04606.pdf), embeddings of a word is …
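
For reference, here is a minimal sketch of the subword units that the linked paper describes: each word is wrapped in the boundary markers `<` and `>`, split into character n-grams, and the whole marked word is kept as one extra unit. The function names `char_ngrams` and `subword_units` are hypothetical and are not part of the FastText API; the default range of 3 to 6 characters follows the paper, and the real library may differ in details such as hashing n-grams into buckets.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word` after adding boundary markers.

    Hypothetical helper for illustration only, not a FastText function.
    """
    marked = f"<{word}>"  # '<' and '>' mark the start and end of the word
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def subword_units(word, min_n=3, max_n=6):
    """Return the n-grams plus the full marked word as one extra unit."""
    units = char_ngrams(word, min_n, max_n)
    marked = f"<{word}>"
    if marked not in units:  # avoid a duplicate when the word is short
        units.append(marked)
    return units


if __name__ == "__main__":
    # With n fixed to 3, this reproduces the paper's "where" example.
    print(subword_units("where", min_n=3, max_n=3))
```

In the paper, the vector for a word is the sum of the vectors of these units, which is why a word never seen in training can still get an embedding from its n-grams.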
-
As per title:
- Add rayon support as an optional feature
- Support parallel search
- Support parallel warped search
- Support parallel digestion of corpus
Will handle this in a future pull re…