barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

More words for en_exclude.txt #154

Closed LachlanAndrew closed 9 months ago

LachlanAndrew commented 1 year ago

There are still many words in en.json.gz that are not English words. I've added a few thousand to en_exclude.txt in my fork, and am trying to create a pull request. I'm not sure quite how to do this, so I apologise if I mess it up.
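For readers unfamiliar with how an exclusion list like this might be applied, here is a minimal sketch. It assumes en.json.gz is a gzipped JSON object mapping each word to a count and that en_exclude.txt holds one word per line; the function names are hypothetical and are not taken from the pyspellchecker code base.

```python
import gzip
import json
from pathlib import Path


def load_word_frequencies(path):
    """Read a gzipped JSON object mapping each word to its count (assumed layout)."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def load_exclusions(path):
    """Read one word per line (assumed layout), skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def drop_excluded(frequencies, excluded):
    """Return a copy of the frequency map without any excluded word."""
    return {
        word: count
        for word, count in frequencies.items()
        if word.lower() not in excluded
    }
```

With the two files on disk, something like `drop_excluded(load_word_frequencies("en.json.gz"), load_exclusions("en_exclude.txt"))` would give the cleaned word list; the project's own build scripts may do this differently.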

barrust commented 1 year ago

This PR looks great. Help with the dictionary is always welcome! I noticed a function was added. Did you mean to add that function (ranked_candidates)?

LachlanAndrew commented 1 year ago

Yes and no. I was only trying to request a pull of 816cc2d, but I'm not familiar enough with github and accidentally requested a pull of everything in that branch... However, I was planning to submit ranked_candidates separately.

For now, does GitHub allow you to pull just 816cc2d (and possibly 1eea85c), or should I send a new request (or learn how to fix this one)?

I was also thinking of grouping the words in en_exclude.txt into missing spaces, typing errors, spelling errors, words from other languages and OCR errors. That should make it easier to remove words that get put in by accident. If you would prefer me to do that before you pull, I'm happy to.
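As a rough illustration of the grouping idea, the sketch below reads an exclusion file in which each group starts with a `# category` header line. That header convention is purely an assumption made for this example, as is the function name; en_exclude.txt is not known to use it.

```python
from collections import defaultdict


def parse_grouped_exclusions(text):
    """Split exclusion text into {category: [words]} using '# category' headers.

    The '#' header convention is hypothetical, chosen only for illustration.
    """
    groups = defaultdict(list)
    current = "uncategorised"  # bucket for words that appear before any header
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between groups
        if line.startswith("#"):
            current = line.lstrip("#").strip() or "uncategorised"
            continue
        groups[current].append(line)
    return dict(groups)


def all_excluded_words(groups):
    """Flatten the groups back into the plain set of excluded words."""
    return {word for words in groups.values() for word in words}
```

Keeping a flattening helper alongside the parser means the grouped file could still feed the same exclusion step as an ungrouped one, while a mistaken entry can be traced back to the category it was filed under.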

barrust commented 1 year ago

GitHub doesn't allow me to easily select part of a PR to accept, or at least I haven't found a way yet. If ranked_candidates is ready, I can look into that part at the same time. I just wanted to be sure!

As for sorting en_exclude.txt, I don't know if that is necessary, but thank you for the offer! I'm just not sure it would serve a useful purpose.

barrust commented 9 months ago

Closing