ftomassetti / semreview

Text classification taking advantage of the Semantic Web
GNU General Public License v3.0

Use another stemmer #12

Open ftomassetti opened 9 years ago

ftomassetti commented 9 years ago

The stemmer we use comes from a library that seems to be written mainly in C and is not published on Maven (see https://github.com/snowballstem/snowball/issues/11). I think we should consider alternatives. For example, the stemmer used by Lucene should be available on Maven.

ojwb commented 9 years ago

As I said in the ticket you reference, I've no objections to someone putting snowball on Maven. It just isn't something I feel qualified to do.

The Snowball compiler is written in C, but generates pure Java code, so the library you'd use from Java is in fact pure Java. Lucene actually uses Snowball for most of its stemmers.