explosion / spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python
https://spacy.io
MIT License

[Lemmatisation] : french - some strange error #2659

Closed by puli83 6 years ago

puli83 commented 6 years ago

Hi,

The problem is simple to explain. I lemmatized this sentence with spaCy: "Logiquement, l' ASSÉ veut présenter les candidats aux prochaines élections."

The ROOT of this sentence is "présenter", a VERB in its infinitive form. Why do I get "poster" as its lemma? This is strange. TreeTagger lemmatizes this sentence correctly.

Do you have any idea what is happening?

ines commented 6 years ago

It's possible that these problems are caused by the lookup tables – for some languages, spaCy currently ships with more complex, rule-based lemmatizers. All other languages are currently covered by lookup tables, which aren't as reliable, and sometimes also contain mistakes. You can find a few similar discussions in the feat/lemmatizer tag.
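To illustrate why a lookup table can produce a result like the one reported, here is a minimal sketch in plain Python. It does not use spaCy, and the table contents are made-up assumptions rather than entries copied from spaCy's French data. The point is only that a lookup lemmatizer returns whatever the table holds for a surface form, regardless of part of speech or context, so a single bad entry is reproduced every time.

```python
# Toy table mapping surface forms to lemmas. The entries are illustrative
# assumptions, not copied from spaCy's French data.
TOY_LOOKUP = {
    "veut": "vouloir",
    "candidats": "candidat",
    "prochaines": "prochain",
    "élections": "élection",
    "présenter": "poster",  # deliberately wrong, mimicking the reported result
}


def lookup_lemma(word, table=TOY_LOOKUP):
    """Return the table's lemma for `word`, or the word itself if it is missing."""
    return table.get(word.lower(), word)


sentence = ["Logiquement", ",", "l'", "ASSÉ", "veut", "présenter", "les", "candidats"]
for word in sentence:
    print(word, "->", lookup_lemma(word))
```

Because the function never looks at the tag or the dependency label, knowing that "présenter" is an infinitive VERB does not help: the wrong entry wins.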

If you've come across a mistake that can be fixed by updating the lookup table, you can always submit a PR to fr/lemmatizer.py. We'd also love to transition all lemmatizers over to a rule-based approach in the future (like English).
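As a rough sketch of what such a fix amounts to, the snippet below overlays corrections on a lookup table and checks the corrected entry. The names `apply_corrections`, `base_table` and `corrections` are hypothetical and chosen for illustration, and the table contents are assumed. The real layout of fr/lemmatizer.py may differ.

```python
def apply_corrections(table, corrections):
    """Return a copy of `table` with the entries in `corrections` overriding it."""
    fixed = dict(table)        # copy so the original table is left untouched
    fixed.update(corrections)  # corrected entries replace the faulty ones
    return fixed


# Assumed toy data: one faulty entry and its correction.
base_table = {"présenter": "poster", "élections": "élection"}
corrections = {"présenter": "présenter"}

fixed_table = apply_corrections(base_table, corrections)
print(fixed_table["présenter"])
```

Keeping corrections in a separate mapping makes it easy to review each changed entry before proposing it upstream.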

ines commented 6 years ago

Merging this with #2668!

lock[bot] commented 6 years ago

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.