anaelcarvalho / elasticsearch-analysis-rslp

RSLP stemmer plugin for Elasticsearch
Apache License 2.0

Context for this stemmer #3

Open andreportaro opened 5 years ago

andreportaro commented 5 years ago

Hello! I'm experimenting with the default Brazilian Portuguese stemmer. It handles singular/plural forms, but it fails on most adjectives. Searching for alternative stemmers led me here.

Could you give some context on why this stemmer was made, and whether it is a good alternative to the default stemmer? Also, it doesn't look like the plugin has been migrated to support 6.x, so I wonder what steps that would take and whether I can contribute.

anaelcarvalho commented 5 years ago

Hi,

I haven't kept up with recent ES development, but at the time this was made, the default pt-BR stemmer was based on the Porter algorithm and yielded poor results. This plugin was implemented based on a number of published works, which are listed in the source code itself (see https://github.com/anaelcarvalho/elasticsearch-analysis-rslp/blob/master/src/main/java/org/apache/lucene/analysis/br/RSLPStemmer.java).

It shouldn't be too hard to make it compatible with ES 6.x/7.x. That would mean checking compatibility with the Lucene dependencies (TokenFilter / TokenFilterFactory) and following the ES plug-in model/guidelines for implementing a new token filter.
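
For readers new to RSLP, here is a rough sketch of the kind of suffix-rule step that a TokenFilter would end up wrapping. It is plain Java with no Lucene or ES wiring. The class name `SuffixRuleSketch`, the `Rule` type, and the four rules are hypothetical, chosen only for illustration; they are not copied from this plugin's rule tables. Each rule strips a suffix if the remaining stem is long enough, then appends a replacement.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a tiny RSLP-style plural-reduction step.
// The rule subset and minimum stem sizes are assumptions, not the plugin's tables.
final class SuffixRuleSketch {

    // One rule: if the word ends with `suffix` and the remaining stem has at
    // least `minStem` characters, replace the suffix with `replacement`.
    static final class Rule {
        final String suffix;
        final int minStem;
        final String replacement;

        Rule(String suffix, int minStem, String replacement) {
            this.suffix = suffix;
            this.minStem = minStem;
            this.replacement = replacement;
        }
    }

    // Ordered so that longer suffixes are tried before the generic "s" rule.
    static final List<Rule> PLURAL_RULES = Arrays.asList(
        new Rule("ais", 1, "al"),  // animais -> animal
        new Rule("res", 3, "r"),   // flores  -> flor
        new Rule("ns", 1, "m"),    // homens  -> homem
        new Rule("s", 2, "")       // casas   -> casa
    );

    // Applies the first rule whose suffix and minimum stem size both match;
    // returns the word unchanged if no rule applies.
    static String applyFirstMatch(String word, List<Rule> rules) {
        for (Rule rule : rules) {
            if (word.endsWith(rule.suffix)) {
                int stemLen = word.length() - rule.suffix.length();
                if (stemLen >= rule.minStem) {
                    return word.substring(0, stemLen) + rule.replacement;
                }
            }
        }
        return word;
    }

    static String stripPlural(String word) {
        return applyFirstMatch(word, PLURAL_RULES);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"casas", "homens", "flores", "animais", "bonito"}) {
            System.out.println(w + " -> " + stripPlural(w));
        }
    }
}
```

A plural-only step like this leaves a word such as "bonito" untouched. Adjectival and other derivational suffixes need their own rule steps, which the full RSLP algorithm applies in sequence.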