clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License

Incorrect plural forms of words ending with 'us'. #272

Open dsaw opened 5 years ago

dsaw commented 5 years ago

Some words ending in -us whose plural forms end in -i are pluralized incorrectly.

>>> from pattern.en import pluralize, singularize
>>> word_list = ['focus','cactus','fungus','nucleus','syllabus']
>>> for w in word_list:
...     print(pluralize(w))
...
foci
cactuss
fungi
nucleuss
syllabuss

The last three incorrect results should be cacti, nuclei and syllabi respectively. There are enough words of this sort to form a group of their own. The singularize function also converts these forms incorrectly (the expected result below is 'fungus'):

>>> singularize('fungi')
'fungi'

k4ni5h commented 5 years ago

Including these words in the "us-i*" group of plural_categories will solve this issue.

>>> word_list = ['focus','cactus','fungus','nucleus','syllabus']
>>> [pluralize(a) for a in word_list]
['foci', 'cacti', 'fungi', 'nuclei', 'syllabi']
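
To illustrate why adding words to a category fixes both directions, here is a minimal standalone sketch of a category-driven -us/-i rule. It does not use or reproduce Pattern's code: the set `US_I_WORDS` and the functions `pluralize_sketch` and `singularize_sketch` are hypothetical names invented for this example, and the fallback of appending -s is a deliberate simplification.

```python
# Hypothetical stand-in for one word group; not Pattern's actual data or code.
US_I_WORDS = {"focus", "cactus", "fungus", "nucleus", "syllabus"}


def pluralize_sketch(word):
    """Return the -i plural for words in the group, else naively append -s."""
    if word in US_I_WORDS:
        return word[:-2] + "i"  # strip "us", add "i"
    return word + "s"  # simplistic fallback, for illustration only


def singularize_sketch(word):
    """Undo the -us -> -i rule only when the restored singular is in the group."""
    if word.endswith("i"):
        candidate = word[:-1] + "us"
        if candidate in US_I_WORDS:
            return candidate
    return word  # leave anything else untouched


word_list = ["focus", "cactus", "fungus", "nucleus", "syllabus"]
plurals = [pluralize_sketch(w) for w in word_list]
print(plurals)
print([singularize_sketch(p) for p in plurals])
```

Because both functions consult the same word group, a word missing from it fails in both directions, which would be consistent with the pluralize and singularize results reported above.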