barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

Update dictionaries #78

Closed (barrust closed 3 years ago)

barrust commented 3 years ago

Update dictionaries by documenting a script to automate the task. This should make it easier to update the dictionaries in the future.

Resolves #75 - Document how dictionaries are created, etc.
Resolves #65 - Fix for cancion != canción
Resolves #56 - Allow contractions, etc. in en dictionary

codecov-io commented 3 years ago

Codecov Report

Merging #78 (6e7c849) into master (a1da70c) will decrease coverage by 0.00%. The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master      #78      +/-   ##
==========================================
- Coverage   99.25%   99.25%   -0.01%     
==========================================
  Files           4        4              
  Lines         269      268       -1     
==========================================
- Hits          267      266       -1     
  Misses          2        2              
Impacted Files            Coverage Δ
spellchecker/info.py      100.00% <100.00%> (ø)
spellchecker/utils.py     100.00% <100.00%> (ø)

gehtho commented 3 years ago

@barrust, thanks for this project. By accident I found a typo in en.json, the dictionary of known English words. The entry is "occurence": 15. The correct spelling with a double r is also present, a few lines below, so the fix would simply be to remove the misspelled line. Since this PR is already open, do you need a separate issue report for this? Thanks, Thomas
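
The fix described above amounts to deleting one key from the word-frequency JSON. A minimal sketch, assuming the dictionary is a flat JSON object mapping words to counts (as the `"occurence": 15` line suggests). The helper name `remove_word` is hypothetical, and every count in the sample other than 15 is made up for illustration:

```python
import json


def remove_word(freqs, word):
    """Return a copy of the word-frequency mapping without `word`."""
    return {w: c for w, c in freqs.items() if w != word}


# Hypothetical excerpt; the real counts in en.json will differ.
sample = json.loads('{"occurence": 15, "occurrence": 1200, "occur": 900}')

# Drop the misspelled entry and keep everything else untouched.
cleaned = remove_word(sample, "occurence")
print(json.dumps(cleaned, sort_keys=True))
```

Removing the key by hand fixes the current file only; a regenerated dictionary could bring the entry back unless it is also filtered at build time.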

barrust commented 3 years ago

No, we can add it here. Is this in the latest version of the dictionary? Also, is there an easy rule we can add to catch more issues of this type? If not, I can add the word to the en_exclude.txt file to make sure it isn't reintroduced in the future.
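
Both ideas in this reply can be sketched in a few lines. This is not the project's actual build script: the exclude-file format (one word per line, `#` comments allowed) is an assumption, the function names are hypothetical, and the "missing doubled letter" rule is only one possible heuristic, which would need manual review because it can flag legitimate words.

```python
def load_exclusions(lines):
    """Parse an exclude list: one word per line (assumed format).

    Blank lines and lines starting with '#' are skipped.
    """
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def apply_exclusions(freqs, exclusions):
    """Return a copy of the word-frequency mapping without excluded words."""
    return {w: c for w, c in freqs.items() if w not in exclusions}


def suspicious_entries(freqs, ratio=50):
    """Flag rare words that become a far more frequent word when one
    letter is doubled (e.g. 'occurence' -> 'occurrence').

    `ratio` is an arbitrary threshold chosen for this sketch.
    """
    flagged = {}
    for word, count in freqs.items():
        for i, ch in enumerate(word):
            candidate = word[:i] + ch + word[i:]  # double the letter at index i
            other = freqs.get(candidate, 0)
            if other > 0 and other >= ratio * count:
                flagged[word] = candidate
                break
    return flagged


# Hypothetical data; real counts will differ.
freqs = {"occurence": 15, "occurrence": 1200, "occur": 900}
exclusions = load_exclusions(["# known misspellings\n", "occurence\n", "\n"])

print(suspicious_entries(freqs))
print(apply_exclusions(freqs, exclusions))
```

The heuristic only suggests candidates; the exclude list is what keeps a confirmed misspelling out of later dictionary builds.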

cmaureir commented 3 years ago

Thanks @barrust for the update! :tada: