KonradHoeffner / cubeqa

CubeQA—Question Answering on Statistical Linked Data
https://aksw.org/Projects/CubeQA.html
GNU General Public License v3.0

Egyptian doesn't get stemmed to egypt #31

Closed KonradHoeffner closed 9 years ago

KonradHoeffner commented 9 years ago

See http://linguistics.stackexchange.com/questions/12547/how-to-map-egyptian-to-egypt. Stemming seems to be the right method, but we need a more aggressive stemmer, such as the Lancaster stemmer, and it has to be integrable into the Lucene index.

KonradHoeffner commented 9 years ago

The StanfordNLP lemmatizer can't do it. There is a StanfordNLP-Lucene integration library, but it doesn't support Lucene 4.0. It also relies on POS tags, which are not available for phrases (we would need to lemmatize beforehand and map the phrases to the result).

The Lancaster stemmer can do it, but it only stems words, not phrases. Phrase support could be added, but then Lucene integration would have to be added as well. Leaving this for later for now, as it is not clear that the effort would be worth it.
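
For illustration only, here is a minimal sketch of what "stemming phrases word by word" could look like. It is not the Lancaster stemmer and not code from this repository: the class name `PhraseStemmer`, the suffix list, and the minimum stem length are all hypothetical and chosen just so that "Egyptian" reduces to "egypt". Wiring something like this into a Lucene analyzer is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of CubeQA or Lucene.
final class PhraseStemmer {

    // Assumption: a tiny hand-picked suffix list, checked in this order.
    // This is NOT the Lancaster (Paice/Husk) rule table.
    private static final String[] SUFFIXES = {"ians", "ian", "ese", "ish", "s"};

    // Assumption: never strip a suffix if fewer than 3 characters would remain.
    private static final int MIN_STEM_LENGTH = 3;

    // Lowercases a single word and strips the first matching suffix, if any.
    public static String stemWord(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= MIN_STEM_LENGTH) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Splits a phrase on whitespace, stems each token, and rejoins with single spaces.
    public static String stemPhrase(String phrase) {
        List<String> stems = new ArrayList<>();
        for (String token : phrase.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                stems.add(stemWord(token));
            }
        }
        return String.join(" ", stems);
    }

    public static void main(String[] args) {
        System.out.println(stemPhrase("Egyptian"));
        System.out.println(stemPhrase("Egyptian government"));
    }
}
```

A rule set this crude overstems readily (for example, "Indian" becomes "ind" rather than "india"), which is the usual trade-off with aggressive stemmers and part of why the effort may not pay off.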

KonradHoeffner commented 9 years ago

Closed as unavoidable for now.