Yomguithereal / talisman

Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript.
https://yomguithereal.github.io/talisman/
MIT License

stemmer/fr: restored & completed carry.js from original publication #137

Open drzraf opened 7 years ago

drzraf commented 7 years ago

raw from the PDF

Yomguithereal commented 7 years ago

Hello @drzraf. Thanks for your PR. Love your script to convert from the PDF to the rules :).

If I remember correctly, I left out some of the STEP3 rules because I thought (probably wrongly, it seems) that they were an erroneous repetition of earlier rules. For instance, (m > 0) issaient → ε is also in STEP1, but I never thought it could be useful to re-run it.
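For readers unfamiliar with the notation, here is a rough sketch of how a rule such as (m > 0) issaient → ε could be applied. It is not the code from this PR or from talisman: `measure`, `applyRule` and `RULE` are hypothetical names, the vowel list is a simplification, and m is assumed to be a Porter-style count of vowel-to-consonant transitions in the remaining stem.

```javascript
// Illustrative only: hypothetical helpers, not talisman's actual API.
// Assumption: this simplified vowel set (including "y") is good enough for a demo.
const VOWELS = 'aàâeéèêëiîïoôuùûüy';

// Assumption: m is the number of vowel -> consonant transitions in the stem,
// in the spirit of Porter's "measure".
function measure(stem) {
  let m = 0;
  let previousWasVowel = false;

  for (const ch of stem) {
    const isVowel = VOWELS.includes(ch);
    if (!isVowel && previousWasVowel) m++;
    previousWasVowel = isVowel;
  }

  return m;
}

// A rule is [minimum m (exclusive), suffix, replacement].
// The suffix is only replaced when the remaining stem satisfies m > minM.
function applyRule(word, [minM, suffix, replacement]) {
  if (!word.endsWith(suffix)) return word;

  const stem = word.slice(0, word.length - suffix.length);

  return measure(stem) > minM ? stem + replacement : word;
}

// (m > 0) issaient -> ε, where ε is the empty string.
const RULE = [0, 'issaient', ''];

console.log(applyRule('finissaient', RULE));
console.log(applyRule('tristesse', RULE));
```

Under these assumptions, re-running the same rule in a later step only matters if an intermediate step has rewritten the word so that it ends with the suffix again, which is why the repetition in STEP3 is not necessarily redundant.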

Did it fix your issue with the word tristesse by the way? Can you add some unit tests to reflect the new cases taken into account please (it also seems that some test cases are now broken)?

Concerning my edits, I will add another version side by side which will be called revisited or whatever.

Thanks

drzraf commented 7 years ago

According to an answer from the author of the algorithm, it is to be expected from any desuffixation algorithm. As with Porter, these are part of the expected edge cases. Other examples he gave: "perte", "mort", "éléments", "order", ...