barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

countries causing errors in 0.8.0 #168

Closed tomkralidis closed 6 months ago

tomkralidis commented 6 months ago

Hi, thanks very much for this valuable tool! After moving to 0.8.0, country names are flagged as unknown words when spellchecking text.

Example using 0.7.3:

>>> from spellchecker import SpellChecker
>>> s = SpellChecker()
>>> s.unknown(['Austria'])
set()

Example using 0.8.0:

>>> from spellchecker import SpellChecker
>>> s = SpellChecker()
>>> s.unknown(['Austria'])
{'austria'}

I am guessing #165 caused this change, but I am not sure exactly where or why.

Is this expected behaviour? Thanks in advance.

barrust commented 6 months ago

Yes, it looks like the country names did not make it into the new dictionaries. You can add them locally, or we could add them to the en_include.txt file so that they are included in future releases of the dictionaries.

tomkralidis commented 6 months ago

Thanks @barrust. PR in #169 for consideration.

barrust commented 6 months ago

Thank you! I will look at it shortly. I actually started a new branch locally to add country names in English, Spanish, French, and Italian along with rebuilding the dictionaries.

I might use the branch I am working on instead, since it has the word splitting, etc. built into the build_dictionary.py script.

barrust commented 6 months ago

Published version 0.8.1 with the updated dictionaries.