The minimum edit distance code in the repository is not quite correct as end might be bounded by k+1 and thus the condition j>=end-1 doesn't mean it's the last column. And we should save the information in this column too.
for (long k=0; k < nb + mb - 1; k++) {
long start = (k < nb) ? 0 : k - nb + 1;
long end = std::min(k+1, mb);
parlay::parallel_for(start, end, [&] (long j) {
auto i = k - j;
int io = i*bsize; int jo = j*bsize;
if (j < end-1) db[j] = c[jo + bsize - 1];
do_block(ra.begin() + io, rb.begin() + io,
s1.begin() + io, std::min(bsize, n - io),
c.begin() + jo, c.begin() + jo,
s2.begin() + jo, std::min(bsize, m - jo),
(j == 0) ? io : da[j-1]);}, 1);
std::swap(ra, rb);
std::swap(da, db);
}
To prove it, we can first set bsize=1 to make the bug easier to trigger, then set s1={0, 1} and s2={0, 0}. The edit distance should be 2 (considering the original code does not allow substitution), but it returns 1.
Hi developers,
The minimum edit distance code in the repository is not quite correct as
end
might be bounded byk+1
and thus the conditionj>=end-1
doesn't mean it's the last column. And we should save the information in this column too.To prove it, we can first set
bsize=1
to make the bug easier to trigger, then set s1={0, 1} and s2={0, 0}. The edit distance should be 2 (considering the original code does not allow substitution), but it returns 1.The second commit of this PR allows substitutions, which is common practice according to Wikipedia: https://en.wikipedia.org/wiki/Edit_distance