google-code-export / dkpro-core-asl

Automatically exported from code.google.com/p/dkpro-core-asl
StanfordLemmatizer should not stem nouns #575

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Currently the StanfordLemmatizer [1] stems everything that is not a verb. This 
includes nouns, which results in, e.g., marketing->market. That is incorrect: 
the change from "marketing" to "market" is derivational, not inflectional, 
morphology, which contradicts what is stated in the JavaDoc of StanfordLemmatizer.

The code also states that it aims to copy the behaviour of Stanford's 
MorphaAnnotator [2], which stems only words without a POS tag. 
StanfordLemmatizer should probably do the same.
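
To make the proposed behaviour concrete, here is a minimal sketch of the dispatch rule: use POS-aware lemmatization whenever a tag is present, and fall back to stemming only when there is none. The names LemmaDispatch, Morph, lemmaFor and ToyMorph are hypothetical and not taken from DKPro Core or CoreNLP. Morph is only a stand-in for whatever morphology backend is used, and ToyMorph is a deliberately crude stub so the example runs without external dependencies.

```java
public class LemmaDispatch {

    /** Hypothetical stand-in for a morphology backend (assumption, not a real API). */
    interface Morph {
        String lemma(String word, String posTag); // POS-aware, inflectional only
        String stem(String word);                 // POS-agnostic stemming
    }

    /**
     * Proposed rule: lemmatize with the POS tag when one is available,
     * and stem only when the token carries no POS tag at all.
     */
    static String lemmaFor(Morph morph, String word, String posTag) {
        if (posTag != null && !posTag.isEmpty()) {
            return morph.lemma(word, posTag);
        }
        return morph.stem(word);
    }

    /** Toy backend for illustration only; real morphology is far more involved. */
    static final class ToyMorph implements Morph {
        @Override
        public String lemma(String word, String posTag) {
            // Verbs: strip the inflectional "-ing" suffix.
            if (posTag.startsWith("VB") && word.endsWith("ing")) {
                return word.substring(0, word.length() - 3);
            }
            // Plural nouns: strip the inflectional "-s" suffix.
            if (posTag.equals("NNS") && word.endsWith("s")) {
                return word.substring(0, word.length() - 1);
            }
            // Everything else, including singular nouns, is left untouched.
            return word;
        }

        @Override
        public String stem(String word) {
            // Aggressive suffix stripping, regardless of word class.
            if (word.endsWith("ing")) {
                return word.substring(0, word.length() - 3);
            }
            if (word.endsWith("s")) {
                return word.substring(0, word.length() - 1);
            }
            return word;
        }
    }
}
```

With this rule, the noun "marketing" tagged NN stays "marketing", while the same string tagged VBG, or carrying no tag at all, is reduced to "market".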

[1] https://code.google.com/p/dkpro-core-gpl/source/browse/de.tudarmstadt.ukp.dkpro.core-gpl/trunk/de.tudarmstadt.ukp.dkpro.core.stanfordnlp-gpl/src/main/java/de/tudarmstadt/ukp/dkpro/core/stanfordnlp/StanfordLemmatizer.java

[2] https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/pipeline/MorphaAnnotator.java#L63

(Moved from DKPro Core GPL tracker, issue 31)

Original issue reported on code.google.com by richard.eckart on 8 Jan 2015 at 10:36