fujimotos / polyleven

Fast Levenshtein Distance Library for Python 3
https://ceptord.net
MIT License

improve performance of mbleven #5

Status: Closed (Nick-Mazuk closed 2 years ago)

Nick-Mazuk commented 2 years ago

Short circuit early if the strings are identical or the number of edits is low.

See https://github.com/fujimotos/polyleven/issues/4 for more details.
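The C patch itself is not reproduced in this thread, so the following is only a minimal Python sketch of the idea, not the merged code. It assumes an mbleven-style search: for a small threshold `k`, try every fixed sequence of `k` edit operations and keep the cheapest one. The names `bounded_levenshtein` and `_cost_under_model` are hypothetical, and the sketch enumerates the operation sequences with `itertools.product` where a C implementation would more likely use a precomputed table. The two short circuits are marked in the comments.

```python
from itertools import product


def _cost_under_model(s1, s2, model):
    """Cost of turning s1 into s2 when mismatches are resolved by `model`.

    `model` is a sequence of operations: 'r' (replace), 'i' (insert),
    'd' (delete). Equal characters are always matched for free.
    """
    i = j = cost = used = 0
    while i < len(s1) and j < len(s2):
        if s1[i] == s2[j]:
            i += 1
            j += 1
            continue
        if used == len(model):
            # No operations left: this model cannot stay within the budget.
            return len(model) + 1
        op = model[used]
        used += 1
        cost += 1
        if op in "rd":
            i += 1  # consume a character of s1
        if op in "ri":
            j += 1  # consume a character of s2
    # Whatever is left over on either side must be deleted or inserted.
    return cost + (len(s1) - i) + (len(s2) - j)


def bounded_levenshtein(s1, s2, k=2):
    """Return the edit distance if it is <= k, otherwise k + 1."""
    # Short circuit 1: identical strings need no search at all.
    if s1 == s2:
        return 0
    # A length gap larger than k already exceeds the threshold.
    if abs(len(s1) - len(s2)) > k:
        return k + 1
    best = k + 1
    for model in product("rid", repeat=k):
        best = min(best, _cost_under_model(s1, s2, model))
        # Short circuit 2: the strings differ, so the distance is at least 1.
        # A cost of 1 cannot be improved, so stop trying further models.
        if best == 1:
            break
    return best


print(bounded_levenshtein("kitten", "kitten"))      # 0
print(bounded_levenshtein("kitten", "sitten"))      # 1
print(bounded_levenshtein("kitten", "sitting", 3))  # 3
```

In this sketch the identity check saves the whole model loop for equal inputs, and the `best == 1` check stops after the first model that succeeds with a single edit. Whether the merged patch uses exactly these conditions is an assumption.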

Reviewers: please double-check and test this code carefully. This is the first time I've written in C. I'm also unsure how to run the tests so I've not run them.

fujimotos commented 2 years ago

> Short circuit early if the strings are identical or the number of edits is low.

So I pulled @Nick-Mazuk's patch this evening and ran the basic regression tests against it. Everything was green:

test_ascii (__main__.TestPattern) ... ok
test_ascii_with_k (__main__.TestPattern) ... ok
test_long (__main__.TestPattern) ... ok
test_special (__main__.TestPattern) ... ok
test_unicode (__main__.TestPattern) ... ok
test_unicode_with_k (__main__.TestPattern) ... ok

> Reviewers: please double-check and test this code carefully. This is the first time I've written in C. I'm also unsure how to run the tests so I've not run them.

I will test this patch further this weekend (I want to try it with more exotic cases). Please wait for me.
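The thread does not say which extra cases were tried. One common way to probe a patch like this is a randomized comparison against a slow textbook implementation, sketched below. This is an illustration only, not the test that was actually run. `reference_levenshtein` and `find_mismatch` are hypothetical names, and the sketch assumes the function under test takes `(a, b, k)` and returns `k + 1` once the distance exceeds `k`.

```python
import random


def reference_levenshtein(a, b):
    """Textbook Wagner-Fischer dynamic programme, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match for free
            ))
        prev = cur
    return prev[-1]


def find_mismatch(candidate, trials=2000, seed=0,
                  alphabet="ab\u3042\U0001F600"):
    """Return the first (a, b, k, got, want) where `candidate` is wrong.

    Returns None if all trials agree with the reference. The default
    alphabet mixes ASCII, a BMP character and a non-BMP character so that
    different string widths are exercised.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        a = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        b = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        k = rng.randint(0, 3)
        want = min(reference_levenshtein(a, b), k + 1)
        got = candidate(a, b, k)
        if got != want:
            return (a, b, k, got, want)
    return None


# Self-check of the harness: a capped reference must pass,
# and a deliberately broken candidate must be caught.
assert find_mismatch(lambda a, b, k: min(reference_levenshtein(a, b), k + 1)) is None
assert find_mismatch(lambda a, b, k: 0) is not None
```

With the extension module built, the function under test could be passed in as `find_mismatch(lambda a, b, k: polyleven.levenshtein(a, b, k))`, assuming it accepts the threshold as a third positional argument.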

fujimotos commented 2 years ago

Merged via 28d90d501a675d83550a8d59a12827856e9ed07d

fujimotos commented 2 years ago

@Nick-Mazuk OK. I can confirm that your patch is correct.

I merged your improvement into master in 28d90d5, and also added you as an author in LICENSE.txt:

https://github.com/fujimotos/polyleven/blob/master/LICENSE

Feel free to ask me if anything is unclear. (Note: I can also list your email in LICENSE.txt if you have one; I couldn't find your email address in your Git commit.)

Nick-Mazuk commented 2 years ago

Thanks @fujimotos! It's fine with me that my email isn't in LICENSE.txt as I'd prefer to keep it private.