Open Turnerj opened 4 years ago
Not sure if I am talking rubbish, but you may want to consider the single row version https://rosettacode.org/wiki/Levenshtein_distance#C.2B.2B It's discussed here: https://www.youtube.com/watch?v=Cu7Tl7FGigQ
Hey @RobSchoenaker - thanks for the link however Quickenshtein already uses that optimization and a bunch more. If you're curious about some of the exotic optimizations, you can check out this video of me talking about it at dotNETConf last year (https://www.youtube.com/watch?v=JiOYajl2Mds) or any of my blog posts on the subject:
This particular issue is referring to a diagonal calculation approach which allows us to take advantage of SIMD (watch my video or that last blog post if you want to know a bit about SIMD). Basically diagonally calculating allows us to remove the dependency issue when calculating multiple cells at once. We do require 2x the memory that the traditional single row needs but for long strings, we absolutely crush it in speed.
I do plan to have a few more blog posts about Levenshtein Distance in the future too as there are definitely some cool things I'd like to check out with it.
@Turnerj thx for the quick response. I see my comment did not make any sense :-) Will check out the dotNETConf things!
No problem mate! Hopefully you find some of the links I've given interesting.
A few points:
Preliminary calculations on performance for single threaded improvement (take these with a grain of salt, they all don't calculate correctly):
For comparison, it takes about 26.8 ms to calculate it currently with the regular diagonal calculation. This means at best, single threaded could have a 12% improvement however it is likely to be closer to 8% for the "correct" solution.
No idea about how the threading code would perform though.