-
`char_ngrams`/`token_ngrams.character` call `skipgramcpp()`, whereas `tokens_ngrams.tokens` calls `qatd_cpp_ngram_mt_list()`.
Is this intentional? The two code paths behave differently: see #391.
-
I am developing an application for a Coursera Capstone project with the help of the quanteda package, and I constantly run into two issues, mostly while `tokens_ngrams()` is running:
1) std::bad_alloc on the …
-
**Internals Recap.** _CLD2 is a Naïve Bayesian classifier, trained on documents with a mean size of 200 characters, drawn from a corpus of 100M scraped and human-expert-selected web pages._
When workin…
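To make the "Naïve Bayesian classifier" part concrete, here is a toy Python sketch of naive Bayes language scoring. Each candidate language gets a score equal to the sum of its smoothed per-feature log-likelihoods, and the highest score wins. This is not CLD2's implementation. The character-trigram features, add-one smoothing, uniform prior, the `train`/`classify` names and the two-language training strings are all assumptions made for this illustration.

```python
import math
from collections import Counter


def trigrams(text):
    """Character trigrams of the lowercased text, padded with spaces (a choice made for this sketch)."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """samples: dict mapping a language label to a list of training strings."""
    model = {}
    for lang, texts in samples.items():
        counts = Counter(g for t in texts for g in trigrams(t))
        model[lang] = (counts, sum(counts.values()))
    vocab = {g for counts, _ in model.values() for g in counts}
    return model, len(vocab)


def classify(text, model, vocab_size):
    """Return the label with the highest naive Bayes log-likelihood (uniform prior assumed)."""
    best_lang, best_score = None, -math.inf
    for lang, (counts, total) in model.items():
        # add-one (Laplace) smoothing so unseen trigrams do not zero out the product
        score = sum(math.log((counts[g] + 1) / (total + vocab_size)) for g in trigrams(text))
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang


SAMPLES = {
    "en": ["the cat sat on the mat", "this is the house"],
    "es": ["el gato está en la casa", "esta es la casa"],
}
model, vocab_size = train(SAMPLES)
print(classify("the house", model, vocab_size))  # en
print(classify("la casa", model, vocab_size))    # es
```

Because the score is a sum over independent features, short inputs contribute only a few terms, which is one reason naive Bayes language identification tends to be less reliable on very short strings.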
-
```r
packageVersion("quanteda")
## [1] ‘0.9.8.9029’
tokens_ngrams(tokens("a"), n = 2)
## Warning: stack imbalance in '.Call', 38 then 39
## Warning: stack imbalance in '{', 35 then 36
## Warni…
```
-
I expect skipgrams with k=2 to produce
```
"a b c" "a b d" "a c d" "a c e" "b c d" "b c e" "b d e" "c d e"
```
But I am getting
```
> tokenizers::tokenize_skip_ngrams('a b c d e', n=3, k=2)
[[1]]
…
```
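The expected list above is exactly what you get for n = 3 if each gap between consecutive words may skip zero or one token. A definition that instead caps the *total* number of skipped tokens at k = 2 would also admit "a b e" and "a d e" (ten trigrams in all). The Python sketch below enumerates both variants so they can be compared. `skipgrams`, `max_gap` and `max_total` are hypothetical names, not the tokenizers or quanteda API, and it does not assume which definition `tokenize_skip_ngrams()` implements.

```python
from itertools import product


def skipgrams(tokens, n, max_gap, max_total=None):
    """Enumerate skip-grams of length n.

    max_gap   -- most tokens that may be skipped between two consecutive elements
    max_total -- optional cap on the total number of skipped tokens (None = no cap)
    """
    grams = []
    for start in range(len(tokens)):
        # each tuple gives the number of tokens skipped before the 2nd, 3rd, ... element
        for gaps in product(range(max_gap + 1), repeat=n - 1):
            if max_total is not None and sum(gaps) > max_total:
                continue
            idx = [start]
            for g in gaps:
                idx.append(idx[-1] + g + 1)
            if idx[-1] < len(tokens):  # drop windows that run past the end
                grams.append(" ".join(tokens[i] for i in idx))
    return grams


toks = "a b c d e".split()
print(skipgrams(toks, n=3, max_gap=1))
# ['a b c', 'a b d', 'a c d', 'a c e', 'b c d', 'b c e', 'b d e', 'c d e']
print(len(skipgrams(toks, n=3, max_gap=2, max_total=2)))  # 10
```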
-
A few times I've run into counts from `topfeatures()` that seem way off base. In the example below, I form n-grams with n = 2:5, and `topfeatures()` reports 16 occurrences of the n-gram "humor_no_head_games". I then use k…
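One way to check a suspicious count is to recount the n-grams outside quanteda. The Python sketch below is a hypothetical cross-check, not quanteda's code: it counts every contiguous n-gram of orders 2 to 5, joined with `_` like the feature name above. The toy tokens are made up for illustration, and any difference in tokenization, lowercasing or punctuation handling between this sketch and the real pipeline would also change the counts.

```python
from collections import Counter


def ngram_counts(tokens, orders=range(2, 6), sep="_"):
    """Count every contiguous n-gram for each order in `orders` (2 to 5 by default)."""
    counts = Counter()
    for n in orders:
        # slide a window of width n across the token list
        for i in range(len(tokens) - n + 1):
            counts[sep.join(tokens[i:i + n])] += 1
    return counts


toks = "no head games no head games".split()
counts = ngram_counts(toks)
print(counts["no_head_games"])  # 2
print(counts.most_common(3))
```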
-
I've found a couple of compatibility issues between the `chars.lisp` file in `src/utils/` and LispWorks.
The first was in the `+WHITE-CHARS+` parameter: LispWorks uses `#\NO-BREAK-SPACE`, so I did:
``` lisp
(defpara…
```
-
Hi,
I just tried to train a model with `./nejiTrain.sh -a example/train/annotations -c example/train/sentences -f example/train/bw_o2_windows.config -if BC2 -m mymodel -o mymodel -t 11`. However, …
-
@adamobeng has proposed a simple R-based n-gram former that seems to blow away the C++ code in terms of speed. @koheiw, has something gone awry with the C++ code? We should test this and figure out what…
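As a reference point when testing, here is one simple way to form n-grams without an explicit per-position loop: zip together n copies of the token list, each shifted one position further. This Python sketch shows that general idea only. It is not @adamobeng's R code (which is not shown here), `form_ngrams` is a hypothetical name, and it says nothing about relative speed.

```python
def form_ngrams(tokens, n, sep="_"):
    """Form contiguous n-grams by zipping n progressively shifted copies of `tokens`."""
    # tokens[i:] is the list shifted left by i; zip stops at the shortest copy,
    # so no window runs past the end of the input
    shifted = (tokens[i:] for i in range(n))
    return [sep.join(window) for window in zip(*shifted)]


print(form_ngrams(["a", "b", "c", "d"], 2))  # ['a_b', 'b_c', 'c_d']
print(form_ngrams(["a"], 2))                 # []
```

It also doubles as a check for edge cases such as a single token with n = 2, for which this version simply returns an empty list.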
-
I am running into circular import problems while preparing a PR to add some functionality to `nltk/text.py`. Specifically, I am trying to write a new class `TextFreqDist` in `nltk/probability.py`, wh…
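Assuming the cycle is between `nltk/text.py` and `nltk/probability.py` (the sentence above is cut off, so this is a guess), a common workaround is to keep one of the two imports at module level and move the other into the function that needs it. If both imports sat at module level, importing `textmod` first would fail with an `ImportError`, because `Text` is not yet defined when `probmod` starts running. The sketch below shows the deferred-import pattern with two throwaway modules, `textmod` and `probmod`. These are hypothetical stand-ins, not NLTK code, and they are written to a temporary directory so the example is self-contained.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the "text" module: it needs probmod only at call time.
TEXTMOD = '''
class Text:
    def __init__(self, tokens):
        self.tokens = list(tokens)

    def freq_dist(self):
        # Deferred import: resolved only when the method is called,
        # after both modules have finished initialising.
        from probmod import TextFreqDist
        return TextFreqDist(self)
'''

# Hypothetical stand-in for the "probability" module: imports Text at module level.
PROBMOD = '''
from collections import Counter

from textmod import Text  # module-level import in this direction only


class TextFreqDist:
    def __init__(self, text):
        if not isinstance(text, Text):
            raise TypeError("expected a Text instance")
        self.counts = Counter(text.tokens)
'''


def load_modules():
    """Write both modules to a fresh temporary directory and import them."""
    tmp = Path(tempfile.mkdtemp())
    (tmp / "textmod.py").write_text(TEXTMOD)
    (tmp / "probmod.py").write_text(PROBMOD)
    sys.path.insert(0, str(tmp))
    importlib.invalidate_caches()  # the files were created after start-up
    return importlib.import_module("textmod"), importlib.import_module("probmod")


textmod, probmod = load_modules()
fd = textmod.Text("a b a".split()).freq_dist()
print(fd.counts["a"])  # 2
```

Another common option is to import the module object (`import probmod`) rather than a name from it, and look up the name when it is used. Which style fits NLTK's conventions is for its maintainers to decide.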