-
When I use quanteda functions for generating ngrams from my corpus text, including just an excerpt that can repro the issue below, I'm finding I end up with feature names / ngram results that have string …
-
Implement the character n-gram feature for ODIN data. Use the `character-n-gram-size` parameter in the config file and the character_ngrams() function in analyzers.py to compute the n-grams. Use the m…
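
As a rough illustration of what computing character n-grams involves, here is a minimal sketch. It is not the project's `character_ngrams()` from analyzers.py; the helper name `char_ngram_counts` is hypothetical, and `n` stands in for the value that would be read from `character-n-gram-size` in the config file.

```python
from collections import Counter


def char_ngram_counts(text, n):
    """Count overlapping character n-grams of length n in text.

    Hypothetical helper for illustration only; `n` plays the role of
    the `character-n-gram-size` config value.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Slide a window of width n over the string, one character at a time.
    # Strings shorter than n produce an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    # Bigram counts for a short example string.
    print(char_ngram_counts("banana", 2))
```

The resulting counts could then be turned into whatever feature representation the rest of the pipeline expects.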
-
Implement the character n-gram feature for Crubadan data. Use the `character-n-gram-size` parameter in the config file and the character_ngrams() function in analyzers.py to compute the n-grams. Use t…
-
I have moved the discussion about the doc2vec / word2vec ipython example from https://github.com/piskvorky/gensim/issues/629, as suggested by @Piezoid.
ideas:
- Doc2Vec on biological sequences, par…
-
Discussions about the [Sliding window functions](../blob/master/proposals/stdlib/window-sliding.md) will be held here.
-
Here's a stab at a log-linear model equivalency. The focus is on "capital gains tax".
```r
require(quanteda)
txt
-
I, and I suspect many others, am not that fluent in Hungarian. To expose your work, would you mind releasing an English keyboard and the corresponding dictionary as well?
-
In current `master`:
```
* checking compiled code ... NOTE
Warning in read_symbols_from_dll(so, rarch) :
this requires 'objdump.exe' to be on the PATH
Warning in read_symbols_from_dll(so, rarch…
-
Respected Repository owner,
I have used your library with Python 3.6 for similarity comparison between texts. Kindly have a look at this scenario, which I believe has not been scored properly:
…
-
The NGram class can already do non-character N-grams by providing pad_char=(None,) and then either adding tuples or writing a key function that returns tuples.
Make this explicit by:
- renaming pad_ch…
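
To make the idea of non-character N-grams concrete, the following standalone sketch builds N-grams over a sequence of tokens, padding both ends with a `None` item in the spirit of `pad_char=(None,)`. It does not use or reproduce the NGram class; the function name `padded_ngrams` and its parameters are assumptions made for illustration.

```python
def padded_ngrams(tokens, n, pad=None):
    """Return n-grams (as tuples) over a token sequence.

    Hypothetical helper, not part of the NGram class. The sequence is
    padded with n - 1 copies of `pad` on each side, mirroring how a
    string would be padded with a pad character.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    padding = (pad,) * (n - 1)
    padded = padding + tuple(tokens) + padding
    # Each n-gram is a tuple slice, so items can be any hashable object.
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    # Word bigrams for a tokenised phrase.
    for gram in padded_ngrams(["capital", "gains", "tax"], 2):
        print(gram)
```

Because each n-gram is a tuple, the same code works for words, token IDs, or any other hashable items.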