Turnerj / Quickenshtein

Making the quickest and most memory efficient implementation of Levenshtein Distance with SIMD and Threading support
MIT License

Improving the Diagonal Calculation #21

Open Turnerj opened 4 years ago

Turnerj commented 4 years ago

A few points:

Preliminary calculations on performance for a single-threaded improvement (take these with a grain of salt; none of them calculate the distance correctly yet):

**No target lookups and no shuffles/permutations, aka Process Column** (at best we would still need one of these calls per vector, but we would avoid a lot of duplicates)

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------|------|-------|--------|-------|-------|-------|-----------|
| Quickenshtein | 23.67 ms | 0.125 ms | 0.117 ms | - | - | - | - |

**No source lookups, aka Process Row** (at best we would still need one of these calls per vector, but we would avoid a lot of duplicates)

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------|------|-------|--------|-------|-------|-------|-----------|
| Quickenshtein | 24.50 ms | 0.143 ms | 0.134 ms | - | - | - | 40 B |

**No source or target lookups (INVALID)**

| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|--------|------|-------|--------|-------|-------|-------|-----------|
| Quickenshtein | 21.20 ms | 0.138 ms | 0.130 ms | - | - | - | 40 B |

For comparison, it currently takes about 26.8 ms to calculate with the regular diagonal calculation. This means that, at best, single-threaded could see roughly a 12% improvement; however, it is likely to be closer to 8% for the "correct" solution.
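(Those percentages follow from the table values above: (26.8 − 23.67) / 26.8 ≈ 12% for the best case, and (26.8 − 24.50) / 26.8 ≈ 8.6% for the Process Row case.)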

No idea about how the threading code would perform though.

RobSchoenaker commented 2 years ago

Not sure if I am talking rubbish, but you may want to consider the single-row version (https://rosettacode.org/wiki/Levenshtein_distance#C.2B.2B). It's discussed here: https://www.youtube.com/watch?v=Cu7Tl7FGigQ
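For anyone following along, that single-row formulation looks roughly like this. It's just an illustrative sketch (the class and method names are made up; it's not the exact Rosetta Code or Quickenshtein implementation):

```csharp
using System;

static class SingleRowLevenshtein
{
	public static int Calculate(string source, string target)
	{
		// One reusable row of the DP matrix plus a single "diagonal" temp value.
		var row = new int[target.Length + 1];
		for (var j = 0; j <= target.Length; j++)
			row[j] = j;

		for (var i = 1; i <= source.Length; i++)
		{
			var diagonal = row[0]; // cell (i-1, j-1) before it is overwritten
			row[0] = i;
			for (var j = 1; j <= target.Length; j++)
			{
				var above = row[j]; // cell (i-1, j)
				var cost = source[i - 1] == target[j - 1] ? 0 : 1;
				row[j] = Math.Min(Math.Min(above + 1, row[j - 1] + 1), diagonal + cost);
				diagonal = above;
			}
		}

		return row[target.Length];
	}
}
```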

Turnerj commented 2 years ago

Hey @RobSchoenaker - thanks for the link; however, Quickenshtein already uses that optimization and a bunch more. If you're curious about some of the exotic optimizations, you can check out this video of me talking about it at dotNETConf last year (https://www.youtube.com/watch?v=JiOYajl2Mds) or any of my blog posts on the subject.

This particular issue is referring to a diagonal calculation approach which allows us to take advantage of SIMD (watch my video or that last blog post if you want to know a bit about SIMD). Basically, calculating diagonally removes the dependency issue when calculating multiple cells at once. We do require 2x the memory that the traditional single-row approach needs, but for long strings we absolutely crush it in speed.
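To make that concrete, here is a scalar sketch of the anti-diagonal idea. It's illustrative only (the names are made up and it isn't Quickenshtein's actual SIMD code), but it shows why cells on the same diagonal can be computed independently:

```csharp
using System;

static class DiagonalLevenshtein
{
	// Scalar illustration of the anti-diagonal traversal: every cell on
	// diagonal k (where i + j == k) depends only on diagonals k-1 and k-2,
	// so the inner loop has no dependency between iterations - that is the
	// part a SIMD version can vectorise.
	public static int Calculate(string source, string target)
	{
		int m = source.Length, n = target.Length;
		if (m == 0) return n;
		if (n == 0) return m;

		// Diagonal buffers instead of a single row, each indexed by the row i
		// of the cell on that diagonal.
		var prevPrev = new int[m + 1]; // diagonal k - 2
		var prev = new int[m + 1];     // diagonal k - 1
		var curr = new int[m + 1];     // diagonal k

		prevPrev[0] = 0; // cell (0, 0)
		prev[0] = 1;     // cell (0, 1)
		prev[1] = 1;     // cell (1, 0)

		for (var k = 2; k <= m + n; k++)
		{
			// Boundary cells of the DP matrix on this diagonal.
			if (k <= n) curr[0] = k; // cell (0, k)
			if (k <= m) curr[k] = k; // cell (k, 0)

			var iMin = Math.Max(1, k - n);
			var iMax = Math.Min(m, k - 1);
			for (var i = iMin; i <= iMax; i++) // iterations are independent
			{
				var j = k - i;
				var cost = source[i - 1] == target[j - 1] ? 0 : 1;
				curr[i] = Math.Min(Math.Min(prev[i - 1], prev[i]) + 1, prevPrev[i - 1] + cost);
			}

			// Rotate the buffers for the next diagonal.
			var recycled = prevPrev;
			prevPrev = prev;
			prev = curr;
			curr = recycled;
		}

		return prev[m]; // cell (m, n) lives on the last diagonal computed
	}
}
```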

I do plan to have a few more blog posts about Levenshtein Distance in the future too as there are definitely some cool things I'd like to check out with it.

RobSchoenaker commented 2 years ago

@Turnerj thx for the quick response. I see my comment did not make any sense :-) Will check out the dotNETConf things!

Turnerj commented 2 years ago

No problem mate! Hopefully you find some of the links I've given interesting.