cmuparlay / parlaylib

A Toolkit for Programming Parallel Algorithms on Shared-Memory Multicore Machines
MIT License
321 stars 60 forks source link

Fix a bug and allow substitution in edit distance #32

Closed Akatsukis closed 1 year ago

Akatsukis commented 1 year ago

Hi developers,

The minimum edit distance code in the repository is not quite correct as end might be bounded by k+1 and thus the condition j>=end-1 doesn't mean it's the last column. And we should save the information in this column too.

  for (long k=0; k < nb + mb - 1; k++) {
    long start = (k < nb) ? 0 : k - nb + 1;
    long end = std::min(k+1, mb);
    parlay::parallel_for(start, end, [&] (long j) {
      auto i = k - j;
      int io = i*bsize; int jo = j*bsize;
      if (j < end-1) db[j] = c[jo + bsize - 1];
      do_block(ra.begin() + io, rb.begin() + io,
               s1.begin() + io, std::min(bsize, n - io),
               c.begin() + jo, c.begin() + jo,
               s2.begin() + jo, std::min(bsize, m - jo),
               (j == 0) ? io : da[j-1]);}, 1);
    std::swap(ra, rb);
    std::swap(da, db);
  }

To prove it, we can first set bsize=1 to make the bug easier to trigger, then set s1={0, 1} and s2={0, 0}. The edit distance should be 2 (considering the original code does not allow substitution), but it returns 1.

The second commit of this PR allows substitutions, which is common practice according to Wikipedia: https://en.wikipedia.org/wiki/Edit_distance