barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

countries causing errors in 0.8.0 #168

Closed: tomkralidis closed this issue 10 months ago

tomkralidis commented 10 months ago

Hi, thanks very much for this valuable tool! Upgrading to 0.8.0 causes issues when spell-checking text that contains country names.

Example using 0.7.3:

>>> from spellchecker import SpellChecker
>>> s = SpellChecker()
>>> s.unknown(['Austria'])
set()

Example using 0.8.0:

>>> from spellchecker import SpellChecker
>>> s = SpellChecker()
>>> s.unknown(['Austria'])
{'austria'}

I am guessing #165 introduced these changes, but I'm not sure exactly where or why.

Is this expected behaviour? Thanks in advance.
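The lowercase `'austria'` in the 0.8.0 output fits how `unknown()` behaves with the default case-insensitive settings: each word is lowercased and then looked up in the word-frequency dictionary, and the words that are missing are returned. The toy model below is not pyspellchecker's actual code; `toy_unknown` and the two sample dictionaries are hypothetical stand-ins for the 0.7.3 and 0.8.0 data, used only to reproduce both results from the examples above.

```python
from typing import Iterable, Mapping, Set


def toy_unknown(words: Iterable[str], dictionary: Mapping[str, int],
                case_sensitive: bool = False) -> Set[str]:
    """Return the normalized words that are not keys of `dictionary`."""
    # Lowercase first unless case-sensitive, then keep only words missing from the dictionary.
    normalized = [w if case_sensitive else w.lower() for w in words]
    return {w for w in normalized if w not in dictionary}


# Hypothetical frequency tables (counts are made up for illustration).
dict_073 = {"austria": 1, "the": 1}
dict_080 = {"the": 1}  # country name missing after the dictionary rebuild

print(toy_unknown(["Austria"], dict_073))  # set()
print(toy_unknown(["Austria"], dict_080))  # {'austria'}
```

In other words, the change in output is a data change (the word is missing from the new dictionary), not a change in how `unknown()` treats capitalized words.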

barrust commented 10 months ago

Yes, it looks like the country names did not make it into the new dictionaries. You can add them locally, or we could add them to the en_include.txt file so that they are included in future releases of the dictionaries.
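For a purely local fix, pyspellchecker's `word_frequency.load_words()` (for example `s.word_frequency.load_words(['Austria'])`) adds words to a running `SpellChecker` without touching the shipped dictionaries. The include-file route works at build time instead. As a rough sketch of that idea, the code below merges an include list into a word-frequency table so the listed words are always present after a rebuild. The helper name `merge_include_list`, the one-word-per-line file format, and the `min_count` floor are all assumptions; the real build_dictionary.py may do this differently.

```python
import io
from collections import Counter
from typing import TextIO


def merge_include_list(freqs: Counter, include_file: TextIO, min_count: int = 1) -> Counter:
    """Ensure every word in the include file is present with at least `min_count`."""
    # Hypothetical format: one word per line; blank lines are skipped.
    for line in include_file:
        word = line.strip().lower()
        if not word:
            continue
        # Keep an existing (higher) count; otherwise seed the word with the minimum.
        freqs[word] = max(freqs[word], min_count)
    return freqs


freqs = Counter({"the": 50000, "france": 300})
include = io.StringIO("Austria\nFrance\n\nItaly\n")
merge_include_list(freqs, include, min_count=5)
print(freqs["austria"], freqs["france"], freqs["italy"])  # 5 300 5
```

Using `max` means words that already appear in the corpus keep their observed frequency, so the include list only fills gaps rather than distorting existing counts.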

tomkralidis commented 10 months ago

Thanks @barrust. I've opened PR #169 for consideration.

barrust commented 10 months ago

Thank you! I will look at it shortly. I actually started a new branch locally to add country names in English, Spanish, French, and Italian, along with rebuilding the dictionaries.

I might use the one I am working on instead, as it builds word splitting, etc., into the build_dictionary.py script.
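Splitting matters because the checker looks up individual words: presumably an entry such as "United Kingdom" has to become the tokens `united` and `kingdom` before it can match anything passed to `unknown()`. The sketch below shows one way to do that; `split_entries` and its rule of breaking on whitespace and hyphens are hypothetical and not taken from build_dictionary.py.

```python
import re
from typing import Iterable, List


def split_entries(entries: Iterable[str]) -> List[str]:
    """Split multi-word entries (e.g. country names) into unique lowercase tokens."""
    seen = set()
    tokens = []
    for entry in entries:
        # Hypothetical rule: break on runs of whitespace and hyphens.
        for token in re.split(r"[\s\-]+", entry.strip().lower()):
            if token and token not in seen:
                seen.add(token)
                tokens.append(token)
    return tokens


print(split_entries(["Austria", "United Kingdom", "Guinea-Bissau", "United States"]))
# ['austria', 'united', 'kingdom', 'guinea', 'bissau', 'states']
```

Order is preserved and duplicates (such as the second `united`) are dropped, so the output can be fed straight into an include list.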

barrust commented 10 months ago

Published version 0.8.1 with the updated dictionaries.