Open Entomy opened 4 years ago
Damerau-Levenshtein edit-distance is probably the best compromise between accuracy and performance for general purpose use (other edit-distance algorithms should still be implemented). However there's two pitfalls when finding a good implementation for this. As a result, lifting an existing implementation isn't likely.
So pretty par-for-the-norm with Comp. Sci. stuff.
These may be useful as far as a good implementation goes.
Hyyrö, H. A Bit-Vector Algorithm for Computing Levenshtein and Damerau Edit Distances. Nord. J. Comput. 2002.
Balhaf, K. Alsmirat, M. et al. Accelerating Levenshtein and Damerau edit distance algorithms using GPU with unified memory. ICICS. 2017.
Damerau-Levenshtein request is here Stringier/Metrics#2
I mistakenly marked this as a good issue for contributors. Implementation of edit-distance algorithms is a good first issue. However, choosing which one provides the default FuzzyEquals behavior is my decision, and will be based on analysis of several alternatives overall performance curves and capabilities.
Currently,
FuzzyEquals
makes use of the Levenshtein edit-distance algorithm. This counts substitutions, insertions, and deletions, but not transpositions.FuzzyEquals()
should take into consideration all four edits.