Open asfimport opened 3 years ago
Robert Muir (@rmuir) (migrated from JIRA)
Unfortunately, the GPL license is not suitable for inclusion in Apache projects: https://www.apache.org/legal/resolved.html#category-x
If you need to stem Serbian, there is support in the master branch (9.0) coming from Snowball: use SnowballFilter(stream, new SerbianStemmer())
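For readers who want to try the suggestion above, here is a minimal sketch of how the filter might be wired into an analysis chain. It assumes a Lucene 9.x build with the core and analysis-common modules on the classpath. The class name `SerbianStemDemo`, the field name `body`, and the choice of `StandardTokenizer` plus `LowerCaseFilter` ahead of the stemmer are illustrative assumptions, not something prescribed in this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.tartarus.snowball.ext.SerbianStemmer;

// Hypothetical demo class; only SnowballFilter and SerbianStemmer come from the thread.
public class SerbianStemDemo {

    // Tokenizes, lowercases, and stems the input, returning the resulting terms.
    static List<String> stem(String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (Analyzer analyzer = new Analyzer() {
                @Override
                protected TokenStreamComponents createComponents(String fieldName) {
                    Tokenizer source = new StandardTokenizer();
                    // Lowercasing before stemming is an assumption of this sketch.
                    TokenStream result = new LowerCaseFilter(source);
                    result = new SnowballFilter(result, new SerbianStemmer());
                    return new TokenStreamComponents(source, result);
                }
            };
            // "body" is an arbitrary field name chosen for illustration.
            TokenStream stream = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
            stream.reset();
            while (stream.incrementToken()) {
                terms.add(term.toString());
            }
            stream.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Prints one stemmed term per input token.
        System.out.println(stem("Beograd je grad"));
    }
}
```

The exact stems depend on the Snowball algorithm version bundled with the Lucene release in use, so no expected output is shown here.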
Ivan Petrovic (migrated from JIRA)
That's great news! When do you think version 9.0 will be available?
Currently I use the serbian_normalization filter (SerbianNormalizationFilter) in Elasticsearch.
I assume you are referring to https://github.com/apache/lucene-solr/blob/master/lucene/analysis/common/src/java/org/tartarus/snowball/ext/SerbianStemmer.java
Probably it will be available using: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-snowball-tokenfilter.html
I see that Elasticsearch 7.11.1 includes Lucene 8.7.0, so they are around 3 months behind.
Robert Muir (@rmuir) (migrated from JIRA)
Ivan Petrovic, yes, that is the autogenerated Java code. You can find more documentation (and an online demo) on the Snowball site if you are curious about it: https://snowballstem.org/algorithms/serbian/stemmer.html
Unfortunately, I have no idea when 9.0 will get released. It was a major effort to synchronize Snowball in a more sustainable way, so it was only done for the master branch with the Gradle build. See #10260, where the Serbian stemmer was added (actually a year ago...)
Wikimedia has developed stemmers for the Serbian and Esperanto languages under the GPL license: https://github.com/wikimedia/search-extra-analysis
It would be nice to include it in the core.
Migrated from LUCENE-9821 by Ivan Petrovic