Open FeepingCreature opened 5 years ago
rename trie to tree. it's a tree.
On reflection, the histogram part is kind of unnecessary to the algorithm. I'll remove that probably.
Removed. It's even a lot faster to build now! (Maybe three or four times faster.)
It's a trie now.
It's another 25% faster! I think I'll stop here, the code is getting a bit ugly.
great, that would fix #9
Massively speed up the
correct
function.correct
is slow because it callsedit1
twice recursive. Especially for long words, this can lead to low millions of words to generate and test.Instead: build a trie of letter histograms for every word in the dictionary. Then, add a functionfindSimilarWords
to find words whose histogram is similar to the one provided plus some changes. Use this to find candidate words for an edit distance of 2 (histogram distance of 4; two replaces). Then do a bidirectional comparison to find the words that actually match.Instead: build a recursive
treetrie of all the words in the dictionary. Then, add a functionfindSimilarWords
that walks the tree using a provided word as a guide, making changes during iteration, and aborting when the current set of changes leads to a dead end.Tree initialization is lazily done once
correct
is called. It takesabout two secondsless than a secondmaybe 400ms. After that,correct
is effectively instant (~10ms on en_US).Further ideas for improvements: a proper trie data structure could be used for performance.Done!