barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

Incorrect word "thiss" in English dictionary #148

Closed: stephencawood closed this issue 1 year ago

stephencawood commented 1 year ago

The entry "thiss" appears in the English dictionary. It's a company (or product) name, not an English word.

Line 62278 in en.json: "thiss": 79,

BTW - Is this the sort of thing that's open to community contributions?

barrust commented 1 year ago

Yes, I am hopeful that the community, if they find this package useful, will help maintain and improve the library. So pull requests are very much appreciated.

As for this particular word, that is a good catch. In the scripts/data folder of the repository there are several files that force words to be included (*_include.txt) or excluded (*_exclude.txt) when the dictionaries are built. Adding the word to the exclude file will ensure that it is removed during the next build of the dictionaries.
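To illustrate the idea, here is a minimal sketch of how an exclude list could be applied to a word-frequency dictionary. It is not the code from scripts/build_dictionary.py: the function name `apply_exclusions` is hypothetical, and the one-word-per-line layout of the exclude file is an assumption. The two frequencies are the ones quoted in this issue.

```python
import json


def apply_exclusions(word_frequency, exclude_lines):
    """Return a copy of word_frequency without the excluded words.

    Hypothetical helper: assumes the exclude file holds one word per line.
    """
    # Normalise each line and skip blank ones
    excluded = {line.strip().lower() for line in exclude_lines if line.strip()}
    return {word: count for word, count in word_frequency.items() if word not in excluded}


# Entries as they appear in en.json according to this issue
word_frequency = {"thisis": 779, "thiss": 79}

# Stand-in for the lines read from an exclude file (assumed format)
exclude_lines = ["thiss\n", "\n"]

cleaned = apply_exclusions(word_frequency, exclude_lines)
print(json.dumps(cleaned, indent=4))
```

Only "thiss" is dropped here; every other entry keeps its original count.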

Hopefully this helps. If there is a systematic pattern to check for, the scripts/build_dictionary.py file can be used to find and remove the incorrect entries that match it.

stephencawood commented 1 year ago

Thanks for the reply. That whole section needs work actually: "thisa": 59, "thisfor": 63, "thisis": 779, "thisisit": 92, "thisismy": 89, "thisjob": 52, "thisjust": 67, "thisl": 61, "thiss": 79,
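Many of these entries look like known words run together, which suggests one possible systematic check. The sketch below is only an illustration under stated assumptions, not something taken from the repository: `is_run_together` is a hypothetical helper, and `known_words` is a tiny hand-picked list standing in for a trusted word list. The frequencies are the ones listed above.

```python
def is_run_together(word, known_words):
    """Return True if word can be split into two or more known words."""
    n = len(word)
    # reachable[i] is True when word[:i] can be split into known words
    reachable = [False] * (n + 1)
    reachable[0] = True
    for end in range(1, n + 1):
        for start in range(end):
            if start == 0 and end == n:
                continue  # the whole word as a single piece does not count
            if reachable[start] and word[start:end] in known_words:
                reachable[end] = True
                break
    return reachable[n]


# Entries reported in this issue, with their counts from en.json
reported = {
    "thisa": 59, "thisfor": 63, "thisis": 779, "thisisit": 92, "thisismy": 89,
    "thisjob": 52, "thisjust": 67, "thisl": 61, "thiss": 79,
}

# Hypothetical stand-in for a trusted word list
known_words = {"this", "is", "it", "my", "job", "just", "for", "a"}

flagged = [word for word in reported if is_run_together(word, known_words)]
unflagged = [word for word in reported if word not in flagged]

print("run-together candidates:", flagged)
print("needs another check:", unflagged)
```

With this word list, "thisl" and "thiss" are not flagged, so a check like this would not catch everything. It can also flag legitimate compounds (for example, a word that happens to split into two shorter words), so its output is better treated as a review list than as an automatic exclude list.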

barrust commented 1 year ago

Closed by PR #149.

Thanks!