barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

Common misspellings are not included #124

Closed by aMiss-aWry 2 years ago

aMiss-aWry commented 2 years ago

I understand that the word-frequency method struggles with common misspellings, since the source data is likely contaminated with common typos. Is there any way to add a 'blacklist' of common misspellings? One could easily be sourced from: https://en.wikipedia.org/wiki/Wikipedia:Lists_of_common_misspellings/For_machines

For example, neither "taht" nor "tiem" is flagged as a misspelling by pyspellchecker.

aMiss-aWry commented 2 years ago

I found a fairly straightforward workaround: turn Wikipedia's list of common misspellings into a dictionary and check each word against it after pyspellchecker has run. It would be nice if this were incorporated into pyspellchecker itself, though.
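A minimal sketch of that kind of workaround, not the exact code used here. It assumes the Wikipedia page has been saved as plain text with one `misspelling->correction` entry per line (multiple corrections separated by commas); the function names and the sample entries are hypothetical and only illustrate the shape of the data.

```python
def parse_misspellings(text):
    """Parse lines of the assumed form 'wrong->right1, right2' into a dict."""
    table = {}
    for line in text.splitlines():
        wrong, sep, right = line.partition("->")
        wrong = wrong.strip().lower()
        # Skip blank lines and anything without the '->' separator.
        if not sep or not wrong:
            continue
        table[wrong] = [c.strip() for c in right.split(",") if c.strip()]
    return table


def find_common_misspellings(words, table):
    """Return {word: suggested corrections} for words found in the table."""
    return {w: table[w.lower()] for w in words if w.lower() in table}


# Illustrative entries only; the real list would be loaded from the saved page.
SAMPLE = "taht->that\ntiem->time, Tim\n"

if __name__ == "__main__":
    table = parse_misspellings(SAMPLE)
    # Run this second pass on the words the spell checker did NOT flag.
    print(find_common_misspellings(["taht", "is", "tiem"], table))
```

The lookup is meant as a second pass over words that pyspellchecker's `unknown()` did not flag. As far as I know, `spell.word_frequency.remove_words(...)` could also be fed the keys of this table to drop them from the loaded dictionary at runtime, but treat that as something to verify against the current docs.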

barrust commented 2 years ago

There is a way to fix these issues in future builds of the dictionary: words added to scripts/data/{lang}_exclude.txt are removed from the next build of the dictionaries.

As always, PRs or code to generate the list of common typos to add to this file are welcome. Thanks!
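One possible starting point for generating that list, as a rough sketch only. It assumes the exclude file holds one lowercase word per line and that the Wikipedia list uses `misspelling->correction` lines; both are assumptions to check against the repository and the page, and the function names are hypothetical.

```python
def misspellings_from_wikipedia(text):
    """Collect the left-hand side of every 'wrong->right' line."""
    words = set()
    for line in text.splitlines():
        wrong, sep, _ = line.partition("->")
        wrong = wrong.strip().lower()
        # Skip blank lines, lines without '->', and multi-word entries.
        if sep and wrong and " " not in wrong:
            words.add(wrong)
    return words


def merge_exclude_list(existing_text, new_words):
    """Merge new words into the existing exclude list, sorted and de-duplicated."""
    current = {w.strip().lower() for w in existing_text.splitlines() if w.strip()}
    return "\n".join(sorted(current | set(new_words))) + "\n"


if __name__ == "__main__":
    # Illustrative input; real use would read the saved page and the exclude file.
    wiki_text = "taht->that\ntiem->time\n"
    print(merge_exclude_list("zzz\n", misspellings_from_wikipedia(wiki_text)), end="")
```

Some entries on the Wikipedia list may be valid words in other contexts, so the merged output is probably worth reviewing by hand before it goes into a PR.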