LachlanAndrew closed 11 months ago
This PR looks great. Help with the dictionary is always helpful! I noticed a function was added. Did you mean to add that function (ranked_candidates)?
Yes and no. I was only trying to request a pull of 816cc2d, but I'm not familiar enough with github and accidentally requested a pull of everything in that branch... However, I was planning to submit ranked_candidates separately.
For now, does GitHub allow you to pull just 816cc2d (and possibly 1eea85c), or should I send a new request (or learn how to fix this one)?
I was also thinking of grouping the words in en_exclude.txt into missing spaces, typing errors, spelling errors, words from other languages and OCR errors. That should make it easier to remove words that get put in by accident. If you would prefer me to do that before you pull, I'm happy to.
GitHub doesn't allow me to easily select part of a PR to accept, or I haven't found it yet. If ranked_candidates is ready, I can look into that part at the same time. I just wanted to be sure!
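One possible way around this on the contributor's side is to put only the wanted commit on a fresh branch with `git cherry-pick` and open a new PR from that branch. The sketch below is only an illustration under stated assumptions: it builds a throwaway repository so it can run anywhere, and the branch names (`everything`, `exclude-only`), the file `extra.py`, and the sample word are all made up. In the real fork you would cherry-pick the actual hashes (816cc2d, and possibly 1eea85c) onto a branch created from the upstream default branch.

```shell
#!/bin/sh
# Demo in a throwaway repository; every branch and file name here is hypothetical.
set -e
demo_dir=$(mktemp -d)
cd "$demo_dir"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"
git config commit.gpgsign false

# Base commit, standing in for the upstream default branch.
echo "base" > en_exclude.txt
git add en_exclude.txt
git commit -q -m "base"
base_branch=$(git rev-parse --abbrev-ref HEAD)

# A branch holding two unrelated commits, like the branch behind this PR.
git checkout -q -b everything
echo "xyzzy" >> en_exclude.txt
git commit -q -am "exclude list additions"
wanted=$(git rev-parse HEAD)   # plays the role of 816cc2d
echo "def ranked_candidates(): pass" > extra.py
git add extra.py
git commit -q -m "unrelated function"

# Fresh branch from the base that picks up only the wanted commit.
git checkout -q -b exclude-only "$base_branch"
git cherry-pick "$wanted" >/dev/null

# Show the resulting history: the base commit plus the one picked commit.
git log --oneline
```

Pushing a branch like `exclude-only` to the fork and opening a PR from it would then contain just the dictionary change, leaving `ranked_candidates` for a separate request.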
As for sorting the en_exclude, I don't know if that is necessary, but thank you for the offer! I just don't know if it would have any useful purpose.
Closing
There are still many words in en.json.gz that are not English words. I've added a few thousand to en_exclude.txt in my fork, and am trying to create a pull request. I'm not sure quite how to do this, so I apologise if I mess it up.