apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

HunspellStemFilter returns different values than Hunspell on the command line with the same dictionaries. [LUCENE-7378] #8431

Open asfimport opened 8 years ago

asfimport commented 8 years ago

HunspellStemFilter for the Hungarian language returns different results than the hunspell command for the same dictionary.


Migrated from LUCENE-7378 by Barta Tamás

Parent: #5379

Environment: Apache Solr 5.4.1

Attachments: hu_HU.aff, hu_HU.dic

asfimport commented 8 years ago

Barta Tamás (migrated from JIRA)

Dictionary files

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

For Hungarian, hunspell has a lot of internal, language-specific special-case logic. We don't implement any of that logic, and we don't do decompounding at all.

asfimport commented 8 years ago

Barta Tamás (migrated from JIRA)

Thanks. Do you know of any good stemmer for Hungarian?

asfimport commented 8 years ago

Robert Muir (@rmuir) (migrated from JIRA)

Lucene has two choices: the Hungarian stemmer from Snowball (SnowballFilter) and the light stemmer from Savoy (HungarianLightStemmer).
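
Since the reported environment is Solr, the two stemmers mentioned above could be wired into a schema roughly as follows. This is only a sketch: the field type names `text_hu_snowball` and `text_hu_light` are made up for illustration, and the tokenizer and lowercase filter around each stemmer are an assumed minimal analysis chain, not something specified in this issue.

```xml
<!-- Option 1: Snowball Hungarian stemmer (field type name is hypothetical) -->
<fieldType name="text_hu_snowball" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Hungarian"/>
  </analyzer>
</fieldType>

<!-- Option 2: Savoy's light Hungarian stemmer (field type name is hypothetical) -->
<fieldType name="text_hu_light" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.HungarianLightStemFilterFactory"/>
  </analyzer>
</fieldType>
```

Only one stemming filter should appear in a given analysis chain; which of the two gives better results for a particular corpus would need to be checked on real queries.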

But yeah, confusingly, don't use hunspell for Hungarian. I'm not sure it will really work at all.