clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License

Issue with Pluralization in German Language for Umlauts #335

Open peymansf2000 opened 10 months ago

peymansf2000 commented 10 months ago

I've found an issue with the German pluralization function: it does not add an umlaut when the plural form requires one (u → ü, a → ä, o → ö). Words that already contain an umlaut in the singular are mostly pluralized correctly, but nouns whose stem vowel changes in the plural come out without the umlaut, and in some cases with the wrong ending as well.

Here are specific examples:

from pattern.de import pluralize  # assuming the German module, pattern.de

word1 = 'Frucht'
pluralize(word1)
#should be 'Früchte'
Out[8]: 'Fruchte'

word2 = 'Apfel'
pluralize(word2)
#should be 'Äpfel'
Out[9]: 'Apfel'

word3 = 'Arzt'
pluralize(word3)
#should be 'Ärzte'
Out[10]: 'Arzte'

word4 = 'Anbaufläche'
pluralize(word4)
# correct
Out[11]: 'Anbauflächen'

word5 = 'Fläche'
pluralize(word5)
# correct
Out[12]: 'Flächen'

word6 = 'Stuhl'
pluralize(word6)
#should be 'Stühle'
Out[13]: 'Stuhle'

word7 = 'Flüssigkeit'
pluralize(word7)
# should be 'Flüssigkeiten'; the 'en' suffix is missing
Out[14]: 'Flüssigkeit'

word8 = 'Röhre'
pluralize(word8)
# correct
Out[15]: 'Röhren'

word9 = 'Höhle'
pluralize(word9)
# correct
Out[16]: 'Höhlen'

word10 = 'Ökonomie'
pluralize(word10)
# correct
Out[17]: 'Ökonomien'

word11 = 'Loch'
pluralize(word11)
#should be 'Löcher'
Out[18]: 'Loche'
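
Until this is handled in the library, one possible workaround is to check a small table of known irregular plurals before falling back to the rule-based function. The sketch below is only an illustration, not how Pattern works internally: `KNOWN_PLURALS` and `pluralize_with_exceptions` are hypothetical names, the table contains only the failing words from this report, and the compound handling is a naive suffix match that can misfire on unrelated words with the same ending.

```python
# Hypothetical lookup table: only the words that failed in the examples above.
KNOWN_PLURALS = {
    "Frucht": "Früchte",
    "Apfel": "Äpfel",
    "Arzt": "Ärzte",
    "Stuhl": "Stühle",
    "Loch": "Löcher",
    "Flüssigkeit": "Flüssigkeiten",
}


def pluralize_with_exceptions(word, fallback=lambda w: w, exceptions=KNOWN_PLURALS):
    """Return the plural from the table if known, else defer to `fallback`.

    `fallback` is any callable taking a singular noun and returning a plural;
    by default the word is returned unchanged.
    """
    # Exact match against the exception table.
    if word in exceptions:
        return exceptions[word]

    # German compounds inflect on their last component
    # ("Zahnarzt" -> "Zahnärzte"), so try a naive suffix match.
    for singular, plural in exceptions.items():
        tail = singular.lower()
        if len(word) > len(tail) and word.endswith(tail):
            return word[: -len(tail)] + plural.lower()

    # Anything else goes to the rule-based function.
    return fallback(word)


print(pluralize_with_exceptions("Frucht"))
print(pluralize_with_exceptions("Zahnarzt"))
```

With Pattern installed, `pluralize` from `pattern.de` could be passed as `fallback`, so that words already handled correctly (such as 'Fläche') still go through the existing rules.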