barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License

Damerau-Levenshtein metric #83

Closed PeterPirog closed 3 years ago

PeterPirog commented 3 years ago

Is it possible to optionally add the Damerau-Levenshtein metric?

barrust commented 3 years ago

What do you mean by adding the Damerau-Levenshtein metric? Technically, the library makes the following changes, up to an edit distance of 2:

An edit distance of 2 means, in this case, up to two of these actions.
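For readers following along, here is a minimal sketch of what "up to two of these actions" can look like. It assumes a Norvig-style candidate generator with four single-character actions (delete, transpose, replace, insert) and a plain a-z alphabet. The names `edits1` and `edits2` are hypothetical and this is not the library's actual code.

```python
import string

LETTERS = string.ascii_lowercase  # assumption: plain a-z alphabet


def edits1(word):
    """Return every string exactly one action away from `word` (illustrative only)."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    # Drop one character.
    deletes = [left + right[1:] for left, right in splits if right]
    # Swap two adjacent characters.
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    # Substitute one character with any letter.
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    # Add one letter at any position.
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word):
    """Return every string reachable by applying two single actions in a row."""
    return {second for first in edits1(word) for second in edits1(first)}
```

In this sketch, distance 2 is simply the distance-1 step applied to each of its own results, so the candidate set grows quickly with word length.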

PeterPirog commented 3 years ago

As I understand it, the basic operations (distance 1) for the Levenshtein metric are insertion, deletion, and replacement, while Damerau-Levenshtein additionally counts transposition as a basic operation. In some practical cases I have observed that transposing two letters is a common mistake:

weird or werid
strength or strenght

The examples above have distance 1 in the Damerau-Levenshtein metric and distance 2 in the Levenshtein metric, so it would be nice to have an option to switch between the two.
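To make the difference concrete, here is a small sketch that computes both metrics for the two word pairs. The function names are hypothetical, and the second function implements the restricted (optimal string alignment) variant of Damerau-Levenshtein, which is enough for single adjacent swaps like these. Neither function comes from pyspellchecker.

```python
def levenshtein(a, b):
    """Levenshtein distance: insertions, deletions, and replacements only."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # replacement or match
        prev = curr
    return prev[-1]


def osa_distance(a, b):
    """Restricted Damerau-Levenshtein: also allows swapping adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # replacement or match
            # Adjacent transposition, e.g. "ri" <-> "ir".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


if __name__ == "__main__":
    for wrong, right in [("werid", "weird"), ("strenght", "strength")]:
        print(wrong, right, levenshtein(wrong, right), osa_distance(wrong, right))
```

Under these definitions, both pairs come out at 2 for the first function and 1 for the second.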

barrust commented 3 years ago

If you look at the code here you can see that we are already treating transposes as edit distance of 1. Are you not seeing transposition as an edit distance of 1?

PeterPirog commented 3 years ago

Thank you for the info. Maybe I did something wrong in my code.

barrust commented 3 years ago

Sounds good. If you are seeing it not do transpositions as distance 1, please let me know! If there is nothing else, I will close this ticket out in the next few days.

barrust commented 3 years ago

Using this test case, it looks like the transposition is working as expected.

from spellchecker import SpellChecker

spell = SpellChecker(distance=1)
misspelled = spell.unknown(["trhee"])
print(misspelled)  # show it isn't correct
for word in misspelled:
    print(spell.edit_distance_1(word))
    print(spell.correction(word))

Please reopen this issue if you have an example of it not working as expected. Thanks!