-
**Right:**
`[w.lemma_ for w in nlp('funnier')]` -> `['funny']`
**Wrong:**
`[w.lemma_ for w in nlp('faster')]` -> `['faster']`
I think the lemma for the word _faster_ should be _fast_.
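The expected behavior can be illustrated with a toy rule for English comparatives. This is only a sketch, not spaCy's actual lemmatizer (which uses lookup tables and part-of-speech-aware rules); the function name and rules here are hypothetical:

```python
def comparative_lemma(word):
    """Toy rule-based lemmatizer for English comparatives (illustration only)."""
    if word.endswith("ier"):
        return word[:-3] + "y"  # funnier -> funny
    if word.endswith("er"):
        return word[:-2]  # faster -> fast
    return word

print(comparative_lemma("funnier"))  # funny
print(comparative_lemma("faster"))   # fast
```

Real lemmatizers need exception lists (e.g. _better_ -> _good_), which is presumably where the reported mismatch comes from.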
-
I noticed when creating a PyInstaller executable from my spaCy Python scripts that the size of the resulting package is very large, mostly due to the inclusion of something called 'mkl' (whose 'dll's…
-
Can I sequentially apply your NLP preprocessors in a scikit-learn Pipeline? If not, I think adding that would be an advantage for the package.
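In general, any callable can be chained into a scikit-learn Pipeline via `FunctionTransformer`. A minimal sketch, using a stand-in lowercasing step where a real NLP preprocessor (e.g. a lemmatizer) would go:

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

def preprocess(docs):
    # Stand-in for an NLP preprocessing step; here just lowercasing.
    return [doc.lower() for doc in docs]

pipe = Pipeline([
    ("prep", FunctionTransformer(preprocess)),  # custom preprocessing step
    ("tfidf", TfidfVectorizer()),               # downstream vectorizer
])

X = pipe.fit_transform(["Funnier jokes", "Faster cars"])
print(X.shape)  # (2, 4)
```

Whether the package's own preprocessors plug in this way depends on whether they expose a plain callable or fit/transform interface.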
-
## Issue
For some words, spaCy doesn't produce the correct lemma. Using an automated method I found about 400 incorrect lemma forms. See [mismatches.txt](https://github.com/explosion/spaCy/files/31…
-
For tf-idf there is no way to get the tf for the lemmatized form of a word (we can count tf for stemmed words or for words with no normalization). Maybe in the load_file method, in the # word normalization section, we nee…
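The requested behavior, counting term frequencies over lemma forms rather than surface forms, can be sketched in a few lines. The lemma lookup table here is a hypothetical stand-in for a real lemmatizer:

```python
from collections import Counter

# Hypothetical lemma lookup table; a real system would use a full lemmatizer.
LEMMAS = {"running": "run", "ran": "run", "cars": "car", "faster": "fast"}

def lemmatized_tf(tokens):
    """Term frequencies computed over lemma forms rather than surface forms."""
    return Counter(LEMMAS.get(tok, tok) for tok in tokens)

tf = lemmatized_tf(["running", "ran", "cars", "car"])
print(tf)  # Counter({'run': 2, 'car': 2})
```

Hooking something like this into the word-normalization section alongside the existing stemming option would give the third mode the issue asks for.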
-
Questions about the Arabic libraries in CLTK
1. Which tokenizer is used? The Stanford segmenter?
2. Does CLTK have a good Arabic morphological analyzer/lemmatizer? I found one for Greek but nothing…
-
```
(gdb) run /home/proycon/work/glem/glem/glem.py -f greek.txt
Starting program: /data2/dev/bin/python3 /home/proycon/work/glem/glem/glem.py -f greek.txt
[Thread debugging using libthread_db enab…
-
## How to reproduce the behaviour
Hello and thank you very much for this beautiful library. :)
I am using spaCy in a PySpark application and am finding that the PhraseMatcher behaves different…
-
All (?) AEs have "ae" somewhere in the name, but the Stanford lemmatizer does not. Is there a particular reason for that? I think it was jcore-stanford-lemmatizer-ae in the past; it has been changed. AE or n…
-
I am in the process of evaluating Manticore for migrating from Sphinx, but I am having trouble getting my data indexed. The configuration has been used with Sphinx for years without any issue, so ei…