barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

Addition of support for a distance of 1 #17

Closed mrjamesriley closed 6 years ago

mrjamesriley commented 6 years ago

For our purposes, the spellchecker was too 'generous' in the suggestions it offered for words, due to the edit distance of two. Words which we wouldn't deem to be typos were frequently identified as such, or the candidates offered were too far 'removed' from what we'd expected. Performance was also a fair bit slower than the real-time use we were aiming for.

When set to an edit distance of 1, we get the performance and tighter 'accuracy' that works well for us. This pull request therefore allows the distance to be set during SpellChecker initialisation, with the default kept at 2 for backwards compatibility.
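
To illustrate the idea, here is a minimal, self-contained sketch of a checker whose edit distance is chosen at initialisation and defaults to 2. `TinySpellChecker`, its word list and its method names are hypothetical stand-ins for illustration, not the code in this pull request or the library's actual implementation.

```python
import string


class TinySpellChecker:
    """Hypothetical stand-in used only to illustrate a configurable distance."""

    def __init__(self, words, distance=2):
        # Default of 2 mirrors the backwards-compatible default described above.
        if distance not in (1, 2):
            raise ValueError("distance must be 1 or 2")
        self._words = {w.lower() for w in words}
        self._distance = distance

    @staticmethod
    def _edits1(word):
        """All strings one delete, transpose, replace or insert away from word."""
        letters = string.ascii_lowercase
        splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
        deletes = [l + r[1:] for l, r in splits if r]
        transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
        replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
        inserts = [l + c + r for l, r in splits for c in letters]
        return set(deletes + transposes + replaces + inserts)

    def candidates(self, word):
        """Known words within the configured distance, else the word itself."""
        word = word.lower()
        if word in self._words:
            return {word}
        one = self._edits1(word)
        known = one & self._words
        if known or self._distance == 1:
            return known or {word}
        # Only reached when distance is 2 and nothing was found at distance 1.
        two = {e2 for e1 in one for e2 in self._edits1(e1)}
        return (two & self._words) or {word}
```

With a word list containing "spelling", a checker built with `distance=1` leaves "spelng" (two edits away) uncorrected, while the default distance of 2 still offers "spelling".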

A quick benchmark shows the performance gain from an edit distance of 1, where that distance is appropriate of course:

word: 'mrjamesriley'

Looking for correction with distance of two (5 times):
15.7 seconds

Looking for correction with distance of one (5 times):
0.01 seconds
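
A rough way to see where the difference comes from is to count how many candidate strings each distance generates. The sketch below assumes a Norvig-style edit generator over lowercase ASCII letters; the function names are hypothetical, it does no dictionary lookups, and its timings will not match the figures above.

```python
import string
import time

LETTERS = string.ascii_lowercase


def edits1(word):
    """All strings one delete, transpose, replace or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in LETTERS]
    inserts = [l + c + r for l, r in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """All strings reachable by applying edits1 twice."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def time_call(func, word, repeats=1):
    """Return (elapsed seconds, number of candidates) for func(word)."""
    start = time.perf_counter()
    for _ in range(repeats):
        result = func(word)
    return time.perf_counter() - start, len(result)


if __name__ == "__main__":
    word = "mrjamesriley"
    for label, func in (("one", edits1), ("two", edits2)):
        elapsed, count = time_call(func, word)
        print(f"distance of {label}: {count} candidates in {elapsed:.3f} seconds")
```

The distance-two set is built by expanding every distance-one string again, so its size grows roughly with the square of the distance-one count, which is consistent with the gap reported in the benchmark.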

coveralls commented 6 years ago

Coverage increased (+0.01%) to 98.425% when pulling db4edafb841328099dce1a0b16d99f7328f177b0 on mrjamesriley:master into 2bdb3059a9978df1562692b6a9859e67e5c715ed on barrust:master.