Since plural forms and grammatical case variants all count as perfect matches in our annotation guidelines, we could apply a stemmer to the data so that inflected forms collapse into one entry and our models become denser.
However, we might need to annotate the new expansions, because stemming could lower the ranking of some pairs when the stemmed form is treated as an abbreviation (e.g. "Vorbefund" -> "Vorbefu", "Vesikuläratmen" -> "Vesikuläratm", "Operation" -> "Operatio").
The CISTEM stemmer seems to improve results over the Porter stemmer and has a Python implementation in NLTK.
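
To get a feel for how many expansions would be affected, something like the following could work. It is only a rough sketch: `group_by_stem` and `unseen_stems` are hypothetical helper names, the word list is made up from the examples above, and the only library call assumed is `Cistem().stem` from `nltk.stem.cistem`. The helpers take any stemming function, so the same check could be run with a different stemmer for comparison.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence


def group_by_stem(words: Sequence[str], stem: Callable[[str], str]) -> Dict[str, List[str]]:
    """Map each stem to the distinct surface forms that collapse onto it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        surface_forms = groups[stem(word)]
        if word not in surface_forms:
            surface_forms.append(word)
    return dict(groups)


def unseen_stems(words: Sequence[str], stem: Callable[[str], str]) -> List[str]:
    """Return stems that do not occur as a word in the input (case-insensitive).

    These are the truncated forms that might be mistaken for abbreviations
    and may therefore need to be annotated.
    """
    vocabulary = {word.lower() for word in words}
    stems = {stem(word) for word in words}
    return sorted(s for s in stems if s.lower() not in vocabulary)


if __name__ == "__main__":
    try:
        # Assumption: NLTK is installed; Cistem is rule-based and needs no extra data.
        from nltk.stem.cistem import Cistem
    except ImportError:
        print("NLTK is not installed; skipping the CISTEM demo.")
    else:
        stemmer = Cistem()
        # Hypothetical sample; replace with the expansions from our data.
        words = ["Vorbefund", "Vorbefunde", "Operation", "Operationen", "Vesikuläratmen"]
        for stem, forms in group_by_stem(words, stemmer.stem).items():
            print(stem, forms)
        print("Stems that may need annotation:", unseen_stems(words, stemmer.stem))
```

The list returned by `unseen_stems` would be a starting point for deciding which stemmed expansions need a second look before we commit to stemming the whole data set.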
Relates to #87.