character-ngrams Search Results

425 results for character-ngrams

wrathematics/ngram #4

Question about tokenization of separate sentences in one str…

Hi, I love the ngram library! Thank you! May I ask how to make the ngram tokenizer treat different sentences (separated by, say, ",", ";", ".", "-", "(", ")" and such) in the input string. Th…

JosephPotashnik updated 8 years ago
2
quanteda/quanteda #519

mac binary CRAN build v0.9.9-24 version fails on OS X

Hi, I had been using the .9.8.5 release for a while. In corpus, you designated the text field by textField. In the recent dev release, it's text_field. But after I fix that, I can't figure out what this belo…

tingleyd updated 7 years ago
28
ropensci/software-review #20

textreuse

- 1. What does this package do? (explain in 50 words or less) This package detects document similarity, and implements the minhash/lsh algorithms. - 2. Paste the full DESCRIPTION file inside a code…

lmullen updated 7 years ago
20
elastic/docs #96

Advise that ngram token filter acts on characters

I think the (lack of) documentation for [ngram token filter](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-ngram-tokenfilter.html#analysis-ngram-tokenfilter) is misleading. …

alexgarel updated 8 years ago
3
snorkel-team/snorkel #368

CoreNLP char_offsets/token mappings do not match raw documen…

I am having an issue with the DictionaryMatch where it matches incorrect words. I construct the DictionaryMatch object like so: `DiseaseMatch = DictionaryMatch( d = ["CRPC"], ignore_case= True)` An…

varun-tandon updated 8 years ago
7
snorkel-team/snorkel #424

Sentence parsed incorrectly in table-dev branch

Using the latest commit in table-dev, I want to parse this sentence: Sentence(Document('17903294', Corpus (GWAS Text Corpus)), 14, u'Elevated circulating levels of hemostatic factors, such as fibrino…

kuleshov updated 8 years ago
6
nltk/nltk #1330

BLEU Issues

The BLEU implementation has generated quite a lot of "whack-a-mole" situations where the fringe cases of using BLEU cause errors or unexpected output values from the following issues: #1328 #12…

alvations updated 8 years ago
9
quanteda/quanteda #46

predict.dfm

It would be really useful to be able to project new texts into the same feature space as an existing `dfm`. This would be particularly useful if you're using texts as inputs to a predictive model.

zachmayer updated 8 years ago
11
ropensci/tokenizers #14

Bug with tokenize_ngrams when number of words in document is…

With `tokenize_ngrams()` (and thus the C++ functions they call) there is a bug if the number of words in the document is between the parameter `n_min` and `n`, and `n` is greater than the number of wo…

lmullen updated 8 years ago
1
Bookworm-project/BookwormDB #105

Strip trailing whitespace

Sometimes HT+BW users search with inadvertent trailing whitespace, which pings the API as "word+" instead of "word". This fails. I expect correcting for this can be done on the client or the API side…

organisciak updated 8 years ago
5
