aaaton / golem

A lemmatizer implemented in Go
MIT License

Multiple options give random result #2

Closed by aaaton 5 years ago

aaaton commented 6 years ago

If a word has multiple possible lemmatizations, the behaviour is undefined.

axamon commented 5 years ago

We could make it output the shortest one, or the first in alphabetical order.
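A minimal sketch of both proposed tie-breaking rules, assuming the candidates arrive as a plain `[]string`. The helper names `shortestLemma` and `firstAlphabeticalLemma` are hypothetical, not part of golem. Length is counted in runes so non-ASCII words are measured in characters, and "alphabetical" here means the byte-wise UTF-8 ordering of `sort.Strings`, which can differ from a language's own collation order.

```go
package main

import (
	"fmt"
	"sort"
	"unicode/utf8"
)

// shortestLemma returns the candidate with the fewest characters.
// Ties are broken alphabetically so the result never depends on input order.
func shortestLemma(candidates []string) string {
	if len(candidates) == 0 {
		return ""
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		lc, lb := utf8.RuneCountInString(c), utf8.RuneCountInString(best)
		if lc < lb || (lc == lb && c < best) {
			best = c
		}
	}
	return best
}

// firstAlphabeticalLemma returns the lexicographically smallest candidate.
// It sorts a copy so the caller's slice is left untouched.
func firstAlphabeticalLemma(candidates []string) string {
	if len(candidates) == 0 {
		return ""
	}
	sorted := append([]string(nil), candidates...)
	sort.Strings(sorted)
	return sorted[0]
}

func main() {
	// Hypothetical candidates for "better": adjective "good" or verb "better".
	candidates := []string{"good", "better"}
	fmt.Println(shortestLemma(candidates))          // prints "good"
	fmt.Println(firstAlphabeticalLemma(candidates)) // prints "better"
}
```

The two rules can disagree, as the example shows; either one removes the unpredictability, which is the property the issue asks for.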

aaaton commented 5 years ago

Both suggestions sound reasonable, just to get rid of the unpredictability.

Another solution would be to return the most likely lemmatization, but that requires at least some TF-IDF statistics, and that might be a little out of scope for this package. I'm also not sure how that would adapt to different language domains.
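To illustrate the "most likely lemmatization" idea, here is a rough sketch that swaps TF-IDF for something simpler: raw corpus frequency. It assumes you already have a hypothetical `map[string]int` of lemma counts from your own corpus; `mostFrequentLemma` is an illustrative name, not a golem function. Candidates missing from the table count as zero, and ties fall back to alphabetical order so the result stays deterministic.

```go
package main

import "fmt"

// mostFrequentLemma picks the candidate with the highest count in freq.
// Missing candidates count as 0 (the map's zero value). Ties, including the
// all-zero case, go to the alphabetically first candidate.
func mostFrequentLemma(candidates []string, freq map[string]int) string {
	if len(candidates) == 0 {
		return ""
	}
	best := candidates[0]
	for _, c := range candidates[1:] {
		if freq[c] > freq[best] || (freq[c] == freq[best] && c < best) {
			best = c
		}
	}
	return best
}

func main() {
	// Hypothetical counts; real numbers would come from your own corpus.
	freq := map[string]int{"good": 120, "better": 15}
	fmt.Println(mostFrequentLemma([]string{"better", "good"}, freq)) // prints "good"
}
```

As the comment above notes, a frequency table is domain-specific: counts gathered from one kind of text may favour a different lemma than counts from another.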

aaaton commented 5 years ago

Solved in v2.0. Golem now always returns the alphabetically first result when there are multiple to choose from.

If you are reading this and want the "correct" lemmatization, I suggest getting all possible results from `golem.Lemmas(word string) []string` and implementing a better guess yourself, based on the context or corpus you are working with.