Hello, I would like to add support for the Esperanto language, as several downstream projects I use depend on lunr-languages. Esperanto is a constructed language created in 1887 by Dr. L. L. Zamenhof, and it has over 2 million speakers worldwide.
Fortunately, because the language is extremely regular (its grammar has only 16 rules), implementing this should be a lot easier than it is for other languages.
Advice Needed:
I don't normally work with JavaScript, so I was wondering if anyone involved with the project can help me out with a few things:
Does the stop-words function run before the stemmer? It would greatly reduce the burden if stop-words were filtered out before they reach the stemmer. Otherwise, I will basically wind up having to reimplement the stop-words list inside the stemmer, as most of the stop-words are grammatical words (prepositions and the like) that have irregular endings.
Many other languages need very complicated, hundred-line stemmer functions, but in Esperanto, once you filter out the special grammatical words, every word ends with one of: -is, -as, -os, -us, -u, -e, -en, -a, -an, -aj, -ajn, -o, -on, -oj, or -ojn. So my stemmer function can be as simple as returning the string with the ending cut off (this always results in a valid word root). I wasn't sure whether I needed to use the SnowballFunction or not.
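To make the idea concrete, here is a rough sketch in plain JavaScript of what I have in mind. It is not tied to the lunr-languages API: the names `stemEsperanto`, `processTokens`, `SUFFIXES`, and `STOP_WORDS` are made up for illustration, the stop-word set is a tiny sample rather than a real list, and the rule that at least two characters of root must remain is my own assumption. It also assumes stop-words are filtered before stemming, which is exactly the part I am asking about.

```javascript
// Endings copied from the list above, ordered longest first so that
// "-ojn" is tried before "-oj", "-on", and "-o".
const SUFFIXES = [
  'ojn', 'ajn',
  'oj', 'aj', 'on', 'an', 'en', 'is', 'as', 'os', 'us',
  'o', 'a', 'e', 'u',
];

// Hypothetical, deliberately tiny sample; a real stop-word list would be longer.
const STOP_WORDS = new Set(['la', 'kaj', 'de', 'en', 'al', 'por']);

// Strip the first matching ending and return the remaining root.
function stemEsperanto(word) {
  const w = word.toLowerCase();
  for (const suffix of SUFFIXES) {
    // Assumption: keep at least two characters of root, so very short
    // words are returned unchanged instead of being cut down to nothing.
    if (w.length >= suffix.length + 2 && w.endsWith(suffix)) {
      return w.slice(0, -suffix.length);
    }
  }
  return w; // no known ending: leave the word as it is
}

// Assumed order: drop stop-words first, then stem whatever is left.
function processTokens(tokens) {
  return tokens
    .map((t) => t.toLowerCase())
    .filter((t) => !STOP_WORDS.has(t))
    .map(stemEsperanto);
}
```

With this, `processTokens(['La', 'hundoj', 'kaj', 'katoj'])` would give `['hund', 'kat']`. If the stop-word filter turns out to run after the stemmer instead, the `STOP_WORDS` check would have to move inside `stemEsperanto`.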
I'm currently working on Esperanto support on my fork, in case anyone has advice or wants to point out any obvious JS flaws I missed.