barrust / pyspellchecker

Pure Python Spell Checking http://pyspellchecker.readthedocs.io/en/latest/
MIT License
714 stars 164 forks source link

Use Trie for spelling check #39

Closed diegoepaezm closed 5 years ago

diegoepaezm commented 5 years ago

The spelling checker takes a lot of time checking the spelling of words. Especially if a word is not in the dictionary it takes even longer, which can make spelling check of several words a really long process. Besides the current approach only allows only search up to 2 in distance of levenshtein. Thus it would be convenient to:

  1. Use dynamic programming to calculate edit distances in general, to allow any distance (distances greater than 2 will only make sense when using a trie approach, see point 2).
  2. Use a trie as a data structure for the dictionary to enable faster lookups (instead of computing all words within 1 or 2 edit distances).
barrust commented 5 years ago

@diegoepaezm, I would love to see an example of this that allows for all the different types of misspellings! A quick and dirty set of example code with mini tests would be great!

DiegoEPaez commented 5 years ago

Hmm I have an example in Java. But you can consult the wikipedia page to find out how to use dynamic programming to calculate the levenshtein distance between two strings:

https://en.wikipedia.org/wiki/Levenshtein_distance

And I guess an example with a Trie would be like this one: https://github.com/umbertogriffo/Trie