Closed KonradHoeffner closed 9 years ago
The StanfordNLP lemmatizer can't do it. There is a StanfordNLP-Lucene integration library, but it doesn't support Lucene 4.0. It also relies on POS tags, which are not available for phrases (we would need to lemmatize the text first and then map the phrases onto the lemmatized text).
The Lancaster stemmer can do it, but it only stems words, not phrases. Phrase support could be added, but then Lucene integration would have to be added as well. Leaving this for later, as it is not clear that the effort would be worth it.
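For reference, stemming a phrase can be reduced to stemming each of its words and rejoining them. Below is a minimal sketch of that idea, not the project's code: `toy_stem`, `TOY_SUFFIXES` and `stem_phrase` are hypothetical names, and the suffix rules are a toy stand-in rather than the real Lancaster rule set. It does not cover the Lucene integration at all.

```python
from typing import Callable

# Hypothetical toy suffix rules for illustration only.
# This is NOT the Lancaster rule set, just an aggressive stand-in.
TOY_SUFFIXES = ("ians", "ian", "s")


def toy_stem(word: str) -> str:
    """Lowercase the word and strip the first matching suffix,
    keeping a stem of at least three characters."""
    lowered = word.lower()
    for suffix in TOY_SUFFIXES:
        if lowered.endswith(suffix) and len(lowered) - len(suffix) >= 3:
            return lowered[: -len(suffix)]
    return lowered


def stem_phrase(phrase: str, stem_word: Callable[[str], str] = toy_stem) -> str:
    """Stem a phrase by stemming each whitespace-separated token
    with the given word-level stemmer and rejoining with spaces."""
    return " ".join(stem_word(token) for token in phrase.split())


if __name__ == "__main__":
    print(stem_phrase("Egyptian pyramids"))
```

Any word-level stemmer with a `str -> str` interface could be passed as `stem_word`, for example the `stem` method of NLTK's `LancasterStemmer`, assuming NLTK is available. The same stemmer would have to be applied at index time and at query time for the stems to match.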
Closed as unavoidable for now.
See http://linguistics.stackexchange.com/questions/12547/how-to-map-egyptian-to-egypt Stemming seems to be the right method, but we need a more aggressive stemmer, such as the Lancaster stemmer, and it needs to be integrable into the Lucene index.