Open bongohrtech opened 10 years ago
This is also the case with Apache Lucene (Java):
I believe the right thing to do for Lucene.NET is to leave it as-is: analyzers are expected to behave the same in .NET and Java, and as a by-product that keeps indexes readable by both. It is easy enough to create your own analyzer by copying the code and fixing what needs to be fixed. It might also make sense to notify the Apache Lucene project so they can fix it in a future release.
by itamar
This seems to be a reasonable request, since Portuguese is expected to work this way, and contributing the fix directly to the Snowball project (https://github.com/snowballstem/snowball) would take years to trickle down to Lucene and then Lucene.Net.
Actually, I have already attempted this, and the change itself might work fine. However, this request doesn't include instructions on how to rework the ZIP files that the tests use to verify that it works.
Of course, without also altering the ZIP file (or having instructions on how to alter it), the tests for the Portuguese stemmer fail. Any chance you can add that to this request?
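For illustration only, here is a rough sketch of how one entry in a test ZIP could be swapped out once the expected stemmer output has been regenerated. It is written in Python just to keep it short, and it assumes nothing about the real archive: the entry name passed in and the helper name `replace_zip_entry` are hypothetical, and the actual layout of the Lucene.Net test data would need to be checked first.

```python
import io
import zipfile


def replace_zip_entry(zip_bytes: bytes, entry_name: str, new_data: bytes) -> bytes:
    """Return a new ZIP archive in which one entry's content is replaced.

    ZIP entries cannot be edited in place with the standard library, so the
    archive is copied entry by entry and only the target entry is rewritten.
    """
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        replaced = False
        for info in src.infolist():
            if info.filename == entry_name:
                # Hypothetical target: the expected-output file for the stemmer.
                dst.writestr(info.filename, new_data)
                replaced = True
            else:
                # Copy every other entry unchanged, keeping its metadata.
                dst.writestr(info, src.read(info.filename))
        if not replaced:
            raise KeyError(f"entry not found in archive: {entry_name}")
    return out_buf.getvalue()
```

The new expected output would still have to come from a trusted run of the corrected stemmer; this only covers repacking the archive.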
by nightowl888
In PortugueseStemmer.cs [1], there are a few suffixes which I believe were copied by mistake from SpanishStemmer [2]:
For more details, see the original report on the nltk project:
https://github.com/nltk/nltk/issues/754
[1] https://github.com/apache/lucene.net/blob/master/src/contrib/Snowball/SF/Snowball/Ext/PortugueseStemmer.cs
[2] https://github.com/apache/lucene.net/blob/master/src/contrib/Snowball/SF/Snowball/Ext/SpanishStemmer.cs
JIRA link - [LUCENENET-547] created by he7d3r