clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License
8.76k stars, 1.58k forks

lemma bug for some words ending with 's' #221

Closed by Steven-AA 6 years ago

Steven-AA commented 6 years ago

[two images attached; content not recoverable]

Lguyogiro commented 6 years ago

lemma is specifically meant to return the base form of verbs. Since these words aren't verbs, they are treated as unseen verbs and go through the default lemmatization. See https://www.clips.uantwerpen.be/pages/pattern-en#conjugation
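
To make the "unseen verb" behaviour concrete, here is a toy sketch of a lookup-then-fallback lemmatizer. It is not Pattern's code: the lexicon KNOWN_VERBS, the function toy_verb_lemma and the single suffix rule are all made up for illustration. The point is only that a fallback rule written for regular verbs will also strip a trailing 's' from words that were never verbs.

```python
# Hypothetical mini-lexicon of inflected verb forms; not Pattern's real data.
KNOWN_VERBS = {"was": "be", "has": "have", "goes": "go"}


def toy_verb_lemma(word):
    """Return a verb lemma: lexicon lookup first, then a naive fallback rule."""
    word = word.lower()
    if word in KNOWN_VERBS:
        return KNOWN_VERBS[word]
    # Fallback for unseen words: assume a regular third-person form
    # and strip the final "s". Non-verbs get the same treatment.
    if word.endswith("s") and len(word) > 2:
        return word[:-1]
    return word


print(toy_verb_lemma("was"))   # be   (found in the lexicon)
print(toy_verb_lemma("this"))  # thi  (not a verb, hits the fallback rule)
```

A word like "this" is not in the verb lexicon, so the fallback treats it like a regular verb form and cuts off the 's'.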

Steven-AA commented 6 years ago

thanks a lot

BTW, is there an easy way to get the base form of any word?

Lguyogiro commented 6 years ago

I don't think so. You can use lemma to get the base form of a verb, and singularize to get the singular form of a noun (https://www.clips.uantwerpen.be/pages/pattern-en#pluralization). You could also apply the Porter stemmer to all of your words to get their stems, using the stem function (https://www.clips.uantwerpen.be/pages/pattern-vector#wordcount).
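
One way to combine the two suggestions is to choose the function from the word's part-of-speech tag. The sketch below is only a rough idea, assuming you already have Penn Treebank-style tags from some tagger. The helper names base_form and pattern_base_form are made up here; lemma and singularize are the pattern.en functions mentioned above, imported lazily so the dispatch logic does not depend on Pattern being installed.

```python
def base_form(word, pos, verb_lemmatizer, noun_singularizer):
    """Normalize a word using the function that matches its POS tag."""
    if pos.startswith("VB"):       # VB, VBD, VBG, VBN, VBP, VBZ
        return verb_lemmatizer(word)
    if pos in ("NNS", "NNPS"):     # plural nouns only
        return noun_singularizer(word)
    return word                    # everything else is left unchanged


def pattern_base_form(word, pos):
    """Same dispatch, wired to Pattern (assumes pattern.en is importable)."""
    from pattern.en import lemma, singularize
    return base_form(word, pos, lemma, singularize)
```

With this, something like pattern_base_form("cats", "NNS") goes through singularize, while a word tagged as a determiner or adverb is returned as-is instead of being run through the verb rules.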

Steven-AA commented 6 years ago

Thanks.