fujimotos / polyleven

Fast Levenshtein Distance Library for Python 3
https://ceptord.net
MIT License

Potential performance improvement #4

Closed by Nick-Mazuk 2 years ago

Nick-Mazuk commented 3 years ago

I believe we can speed up polyleven quite a bit in the cases where it uses the mbleven algorithm. In my testing on a Rust version I wrote, adding early short-circuits gave a 6x performance improvement in some of the slower cases. It all comes down to this while loop inside the mbleven_ascii function.

while (MBLEVEN_MATRIX[pos]) {
    m = MBLEVEN_MATRIX[pos++];
    i = j = c = 0;
    while (i < len1 && j < len2) {
        if (s1[i] != s2[j]) {
            c++;
            if (!m) break;
            if (m & 1) i++;
            if (m & 2) j++;
            m >>= 2;
        } else {
            i++;
            j++;
        }
    }
    c += (len1 - i) + (len2 - j);
    r = MIN(r, c);
    // add improvements here
}

At the end of each pass through the outer while loop, we can short-circuit if r is already 0 or 1, since no remaining model can produce a smaller distance.

Hence, you can just add

if (r < 2) {
    return r;
}

This should apply to both the ASCII and strbuf cases.
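
To make the effect concrete, here is a minimal, self-contained sketch of the idea. It is not the polyleven source: the table `MODELS_K2_SAME_LEN`, the function `bounded_distance`, and its `early_exit` and `models_tried` parameters are hypothetical names introduced for illustration, and the table only covers equal-length inputs with at most two edits. The inner loop mirrors the snippet above. The early return should be safe because each `c` is the cost of some valid edit script, so it never undercounts, and identical strings give `c == 0` for every model; a result of 1 therefore means the strings differ and 1 is already optimal.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical model table (an assumption, not copied from polyleven):
 * valid for equal-length inputs and at most k = 2 edits.
 * Each model packs edit operations two bits at a time, lowest bits first:
 *   1 = skip a char in s1, 2 = skip a char in s2, 3 = skip one in both.
 * A zero entry terminates the list. */
static const uint8_t MODELS_K2_SAME_LEN[] = {
    15, /* substitute, substitute */
    9,  /* skip in s1, then skip in s2 */
    6,  /* skip in s2, then skip in s1 */
    0
};

/* Returns the edit distance if it is <= 2, otherwise 3 (meaning "more
 * than 2"). Assumes strlen(s1) == strlen(s2). When early_exit is nonzero,
 * the model loop stops as soon as the best result so far is 0 or 1.
 * *models_tried reports how many models were evaluated. */
static int64_t bounded_distance(const char *s1, const char *s2,
                                int early_exit, int *models_tried)
{
    const int64_t len1 = (int64_t)strlen(s1);
    const int64_t len2 = (int64_t)strlen(s2);
    int64_t i, j, c;
    int64_t r = 3; /* k + 1: "no model fit yet" */
    int pos = 0;
    uint8_t m;

    *models_tried = 0;
    while (MODELS_K2_SAME_LEN[pos]) {
        m = MODELS_K2_SAME_LEN[pos++];
        (*models_tried)++;
        i = j = c = 0;
        while (i < len1 && j < len2) {
            if (s1[i] != s2[j]) {
                c++;
                if (!m) break;      /* model has no operations left */
                if (m & 1) i++;
                if (m & 2) j++;
                m >>= 2;            /* move on to the next operation */
            } else {
                i++;
                j++;
            }
        }
        c += (len1 - i) + (len2 - j); /* leftover chars count as edits */
        r = MIN(r, c);
        if (early_exit && r < 2) {
            return r;               /* 0 or 1 cannot be improved */
        }
    }
    return r;
}
```

For example, with "abcd" and "abxd" the first model (two substitutions) already yields 1, so with `early_exit` set the remaining two models are skipped. Without it, all three are evaluated and the result is the same.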

fujimotos commented 2 years ago

@Nick-Mazuk That's great. Can you post a PR so that I can merge your contribution?

Once done, I will add you as an author to the LICENSE file:

https://github.com/fujimotos/polyleven/blob/master/LICENSE

Nick-Mazuk commented 2 years ago

@fujimotos, PR created.

fujimotos commented 2 years ago

This has been merged into master.