-
Hi,
I love the ngram library! Thank you!
May I ask how to make the ngram tokenizer treat different sentences (separated by, say, ",", ";", ".", "-", "(", ")" and such) in the input string. Th…
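One way to keep n-grams from crossing sentence or clause boundaries is to split the input on the delimiters first and build n-grams inside each segment. The sketch below is plain Python and does not use the ngram library's own API; the function name `sentence_bounded_ngrams` and the delimiter set (taken from the punctuation listed in the question) are assumptions for illustration.

```python
import re

# Assumed delimiter set: the punctuation named in the question.
# Note that "-" is included, so hyphenated words are split as well.
_BOUNDARY = re.compile(r"[,;.\-()]+")


def sentence_bounded_ngrams(text, n=2):
    """Return word n-grams that never span one of the delimiters."""
    grams = []
    for segment in _BOUNDARY.split(text):
        words = segment.split()
        # Slide a window of n words over this segment only.
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


bigrams = sentence_bounded_ngrams("the cat sat, the dog ran.", n=2)
```

Here "sat the" is never produced, because the comma ends the first segment before the window can reach the next word.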
-
Hi,
I had been using the .9.8.5 release for a while. In the corpus, you designated the text field with textField. In the recent dev release, it's text_field.
But after I fix that, I can't figure out what this belo…
-
- 1. What does this package do? (explain in 50 words or less)
This package detects document similarity, and implements the minhash/lsh algorithms.
- 2. Paste the full DESCRIPTION file inside a code…
-
I think the (lack of) documentation for [ngram token filter](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html#analysis-ngram-tokenfilter) is misleading.
…
-
I am having an issue with the DictionaryMatch where it matches incorrect words.
I construct the DictionaryMatch object like so:
`DiseaseMatch = DictionaryMatch(d=["CRPC"], ignore_case=True)`
An…
-
Using the latest commit in table-dev, I want to parse this sentence:
Sentence(Document('17903294', Corpus (GWAS Text Corpus)), 14, u'Elevated circulating levels of hemostatic factors, such as fibrino…
-
The BLEU implementation has generated quite a lot of "whack-a-mole" situations, where fringe cases of using BLEU cause errors or unexpected output values, as in the following issues: #1328 #12…
-
It would be really useful to be able to project new texts into the same feature space as an existing `dfm`. This would be particularly useful if you're using texts as inputs to a predictive model.
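To make the request concrete, here is a minimal sketch of the idea in plain Python rather than the package's own API: the vocabulary is fixed from the original texts, and new texts are counted against that vocabulary only, so every row has the same columns in the same order. The names `fit_vocabulary` and `project` are hypothetical, and whitespace tokenization with lowercasing is an assumption.

```python
from collections import Counter


def fit_vocabulary(texts):
    """Map each token seen in the original texts to a fixed column index."""
    vocab = {}
    for text in texts:
        for tok in text.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab


def project(texts, vocab):
    """Count tokens of new texts in the existing feature space.

    Tokens that are not in the vocabulary are dropped, and features
    that do not occur in a text are kept as zero counts.
    """
    rows = []
    for text in texts:
        counts = Counter(t for t in text.lower().split() if t in vocab)
        row = [0] * len(vocab)
        for tok, c in counts.items():
            row[vocab[tok]] = c
        rows.append(row)
    return rows


vocab = fit_vocabulary(["the cat sat", "the dog"])
new_rows = project(["the dog barked at the cat"], vocab)
```

Because the new rows share the column layout of the original matrix, they can be passed to a model that was trained on it.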
-
With `tokenize_ngrams()` (and thus the C++ functions it calls) there is a bug if the number of words in the document is between the parameter `n_min` and `n`, and `n` is greater than the number of wo…
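For illustration only, the sketch below shows one plausible way to handle a document shorter than `n`: cap the largest n-gram size at the document length and still emit the sizes that fit. It is written in Python, not the package's R/C++ code, the function name `word_ngrams` is hypothetical, and the assumption that this is the desired behaviour is mine, since the report above is cut off.

```python
def word_ngrams(words, n_min, n):
    """Return all word n-grams with sizes from n_min up to n.

    Sizes larger than the document are skipped instead of raising an
    error or reading past the end of the word list.
    """
    grams = []
    largest = min(n, len(words))  # never ask for more words than exist
    for size in range(n_min, largest + 1):
        for i in range(len(words) - size + 1):
            grams.append(" ".join(words[i:i + size]))
    return grams


short_doc = word_ngrams(["a", "b", "c"], n_min=2, n=5)
```

With three words, `n_min = 2` and `n = 5`, only the bigrams and the single trigram are returned; a document shorter than `n_min` yields an empty list.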
-
Sometimes HT+BW users search with inadvertent trailing whitespace, which pings the API as "word+" instead of "word". This fails.
I expect correcting for this can be done on the client or the API side…
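A client-side fix could be as small as trimming the search term before it is encoded. The sketch below assumes the "+" comes from form-style URL encoding of the trailing space, which matches the "word+" in the report but is not confirmed by it; `build_query` is a hypothetical helper, not part of the actual client or API.

```python
from urllib.parse import quote_plus


def build_query(term):
    """Strip surrounding whitespace, then URL-encode the search term."""
    return quote_plus(term.strip())


raw = quote_plus("word ")       # what an untrimmed term encodes to
cleaned = build_query("word ")  # the same term after trimming
```

Spaces inside a multi-word query are kept and still encode to "+"; only leading and trailing whitespace is removed.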