imotov closed this issue 11 years ago
I think you have it right. Currently (without the fix) both the porter and porter2 stemmers map to the porter stemmer. The english stemmer maps to the porter2 stemmer.
I also did a bit more investigation, and I think the lovins stemmer may have a problem too. The proper output for the lovins stemmer is 'consol', 'hi', 'knight', 'stay'; however, I remember getting something different. I'll test it out and let you know.
After some discussion we came to the conclusion that it would be safer to just remove the reference to the porter2 stemmer from the documentation. Changing the stemmer in elasticsearch might adversely affect users who are currently using it. Anyone who really needs the porter2 stemmer can simply use the english stemmer instead.
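For anyone following along, here is a minimal sketch of index settings that select the english stemmer in place of porter2. It only builds and prints the settings JSON; it does not talk to a cluster. The filter and analyzer names (my_english_stemmer, my_english_analyzer) are hypothetical, and the "language" parameter is an assumption based on the stemmer token filter documentation; some versions may expect "name" instead, so check the docs for your release.

```python
import json


def build_english_stemmer_settings(filter_name="my_english_stemmer",
                                   analyzer_name="my_english_analyzer"):
    """Return index settings that use the english stemmer.

    The filter and analyzer names are hypothetical placeholders.
    The "language" key is an assumption; some elasticsearch versions
    may expect "name" for the same setting.
    """
    return {
        "analysis": {
            "filter": {
                # Stemmer token filter configured with the english stemmer.
                filter_name: {"type": "stemmer", "language": "english"}
            },
            "analyzer": {
                # Custom analyzer: tokenize, lowercase, then stem.
                analyzer_name: {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", filter_name],
                }
            },
        }
    }


if __name__ == "__main__":
    # Print the settings body that could be sent when creating an index.
    print(json.dumps(build_english_stemmer_settings(), indent=2))
```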
That seems like the right move. If you could add a sentence explaining that english is implemented by the porter2 stemmer, that would be nice.
As I am sure you already know, it's important to be clear about the implementation because sometimes you have to do stemming on the ES client side, and you need to be sure that the client stemmer matches the ES stemmer.
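One way to check this in practice is to compare the client stemmer against stems collected from ES for a list of words. The sketch below is only an illustration: find_stem_mismatches and toy_client_stem are hypothetical names, the toy stemmer is a stand-in rather than a real Porter or Porter2 implementation, and the sample word/stem pairs are made-up placeholders, not actual ES output.

```python
from typing import Callable, Dict, List, Tuple


def find_stem_mismatches(
    server_stems: Dict[str, str],
    client_stem: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (word, server_stem, client_stem) for every disagreement.

    server_stems maps each input word to the stem obtained from the
    server side; how those stems are collected is left to the caller.
    """
    mismatches = []
    for word, expected in sorted(server_stems.items()):
        actual = client_stem(word)
        if actual != expected:
            mismatches.append((word, expected, actual))
    return mismatches


def toy_client_stem(word: str) -> str:
    """Stand-in stemmer for the demo: strips one trailing 's' only."""
    if len(word) > 3 and word.endswith("s"):
        return word[:-1]
    return word


if __name__ == "__main__":
    # Placeholder data for illustration, not real elasticsearch output.
    sample = {"knights": "knight", "stayed": "stay"}
    for word, expected, actual in find_stem_mismatches(sample, toy_client_stem):
        print(f"{word}: server={expected} client={actual}")
```

Running a check like this over the words in the repro gist would show quickly whether a client library is using the same algorithm as the server.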
I added links to stemming algorithms. Closing this issue.
Repro: https://gist.github.com/4165996
Examples of words that the Porter and Porter2 stemmers should stem differently
See also: https://groups.google.com/d/topic/elasticsearch/HEW3Q9F4ocM/discussion