Currently the StanfordLemmatizer [1] stems every token that is not a verb. This
includes nouns in particular, which yields e.g. marketing->market. That is
incorrect: it is derivational, not inflectional, morphology, which contradicts
what is stated in the JavaDoc of StanfordLemmatizer.
The code also states that it aims to copy the behaviour of Stanford's
MorphaAnnotator [2], which stems only words without a POS tag.
StanfordLemmatizer should probably do the same.
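A minimal sketch of the proposed behaviour, for illustration only: stem only when no POS tag is available, otherwise do a POS-aware lemma lookup. The class PosAwareLemmatizer and its two pluggable functions are hypothetical names, not part of DKPro Core or CoreNLP, and this is not the actual StanfordLemmatizer code.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

/**
 * Hypothetical helper illustrating the proposed policy.
 * The stemmer and lemmatizer are injected, so the sketch does not
 * depend on any particular NLP library.
 */
final class PosAwareLemmatizer {
    /** Fallback used only when a token carries no POS tag. */
    private final Function<String, String> stemmer;
    /** POS-aware lookup: (word, posTag) -> lemma. */
    private final BiFunction<String, String, String> lemmatizer;

    PosAwareLemmatizer(Function<String, String> stemmer,
                       BiFunction<String, String, String> lemmatizer) {
        this.stemmer = stemmer;
        this.lemmatizer = lemmatizer;
    }

    /** Returns the lemma if a POS tag is present, otherwise falls back to the stem. */
    String lemmaOf(String word, String posTag) {
        if (posTag == null || posTag.isEmpty()) {
            // No POS information: stemming is the only option.
            return stemmer.apply(word);
        }
        // POS tag available: lemmatize for every category, not only verbs.
        return lemmatizer.apply(word, posTag);
    }
}
```

With CoreNLP on the classpath, the two functions could presumably be backed by Morphology.stem(String) and Morphology.lemma(String, String) from edu.stanford.nlp.process; that wiring is an assumption and is not shown here.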
[1] https://code.google.com/p/dkpro-core-gpl/source/browse/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/main/java/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/StanfordLemmatizer.java
[2] https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/MorphaAnnotator.java#L63
(Moved from DKPro Core GPL tracker, issue 31)
Original issue reported on code.google.com by richard.eckart on 8 Jan 2015 at 10:36