bastienbot / nlp-js-tools-french

POS tagger, lemmatizer and stemmer for the French language in JavaScript
MIT License

a couple of problematic results from lemmatizer #4

Open mcthulhu opened 7 years ago

mcthulhu commented 7 years ago
  1. I'm not sure what's happening here. I was trying to lemmatize the word "écœurante" with config set to { tagTypes: ['adj', 'ver', 'nom'], strictness: false, minimumLength: 3, debug: true }. I tried with strictness set to true first, then false, but it doesn't seem to matter. The result I get from

var nlpToolsFr = new NlpjsTFr(s, config);
var lemmatizedWords = nlpToolsFr.lemmatizer();

is [{"id":0,"word":"urante","lemma":"urante"}], with the écœ at the beginning removed. I can't tell why. Other words beginning with é seem OK.

  2. For "épaules" I get [{"id":0,"word":"épaules","lemma":"épaules"}]. Shouldn't the lemma be "épaule"? This was with the same config object as above.
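
One possible explanation for the first result, offered as a guess and not as a description of the library's actual code: if the tokenizer matches words with a character class that lists the usual accented letters but leaves out the ligature "œ", then "écœurante" would be split into "éc" and "urante", and minimumLength: 3 would then drop "éc". The sketch below reproduces that behaviour. The names narrowWord, wideWord and tokenize are hypothetical and do not come from nlp-js-tools-french.

```javascript
// Hypothetical tokenizer, NOT taken from nlp-js-tools-french.
// This character class covers common French accented letters
// but omits the ligatures "œ" and "æ".
const narrowWord = /[a-zàâäçéèêëîïôöùûü]+/gi;

// Same class with "æ", "œ" and "ÿ" added.
const wideWord = /[a-zàâäæçéèêëîïôöœùûüÿ]+/gi;

// Split text into word-like runs, then drop runs shorter than minimumLength.
function tokenize(text, pattern, minimumLength) {
  // NFC keeps accented letters as single precomposed code points.
  const matches = text.normalize('NFC').match(pattern) || [];
  return matches.filter((word) => word.length >= minimumLength);
}

console.log(tokenize('écœurante', narrowWord, 3)); // [ 'urante' ]
console.log(tokenize('écœurante', wideWord, 3));   // [ 'écœurante' ]
```

If something like this is the cause, it would also explain why strictness makes no difference and why other words beginning with é are fine: the word would be cut at the "œ" before any dictionary lookup happens.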
bastienbot commented 7 years ago

Hello mcthulhu,

1) It kind of makes sense since I didn't anticipate this specific case. I'll patch it soon :)
2) Weird, I'll have a look.

I'll keep you informed.

Bastien