Closed Skarlit closed 8 years ago
if (pure_levenshtein == 0 && i >= block && j >= block && swap1 == 1 && swap2 == 1){
transp = d[(j - block * 2) * sl + i - block * 2] + cost + block -1;
if (transp < min) min = transp;
block = 0;
} else if (block == 1) {
subs = d[(j-1)*sl + i - 1] + cost;
if (subs < min) min = subs;
}
When i = block and j = block, d[(j - block * 2) * sl + i - block * 2] = d[ - block * sl - block] in particular, when i = j = block = 1, it gives huge negative number.
Can you add test to spec/files/damerau_levenshtein_test.txt which illustrates the problem?
Sorry for the late reply. I am not able to come up with an example currently because the bug happened randomly but consistently when I scraped a table with 170k entries with two calls to the methods under a loop. The huge negative numbers occur randomly with different entries everytime (when you test those entries individually it works fine). We logged out the indices and they occasionally turned negative. After the fix, everything works fine. I will follow up in the future if I can reproduce the problem with small amount of data.
Thanks Ran, 'random' problems are the hardest to test, but also very important to test. If you can send me a file that generates such a problem, I will try to figure out how to make a test with smallest amount of data.
I am able to reproduce the issue using Ruby version 2.1.2 with the following code snippet:
require 'damerau-levenshtein'
100000.times do
distance = DamerauLevenshtein.distance('aaaa', 'aaaa', 1, 2)
puts "Negative Distance: #{distance}" if distance < 0
end
Oddly enough, it doesn't return negative distances in Ruby version 1.9.2.
The following data will have one random instance of large negative number under ruby 1.9.2. Bug.zip
Added to master and created 1.1.1 version with this change
see https://wiki.csiro.au/display/taxamatch/The+MDLD+(Modified+Damerau-Levenshtein+Distance)+Algorithm