amjith / fuzzyfinder

Fuzzy Finder implemented in Python
BSD 3-Clause "New" or "Revised" License
370 stars 30 forks source link

Remove unnecessary `?` in regex #3

Closed johshoff closed 8 years ago

johshoff commented 8 years ago

.* will already match nothing, so it's not necessary to make it optional.

It doesn't seem to make a difference in timing. That's not very surprising, since they are probably compiled into the same internal representation. It saves one byte, though! It's also one less thing to wonder about in the code.

johshoff commented 8 years ago

My test setup was bad, so I didn't catch this error. The problem is that the question mark returns a different group even though it matches the same elements. For exampled.*j.*m.*i will return the group 'django_mi', while d.*?j.*?m.*?i will return the group 'django_migrati'. This throws off the ranking algorithm. I'm sure you knew this already, but I certainly learned something :)

On the positive side of my test setup being bad; the timing is actually radically different. A 5x speed improvement without the ? on my test data. Assuming relatively few elements in the list would match, it would be possible to do a second pass with the more expensive regex in case of a match, but probably not worth the readability impact.

amjith commented 8 years ago

It was one of the improvements listed in the original (blog post)[http://blog.amjith.com/fuzzyfinder-in-10-lines-of-python].

It was suggested by @drocco007. If we don't do non-greedy matching then regex defaults to finding the largest matching portion of the string. Which definitely throws off the ranking algorithm.

But I'm glad you took the time to study the code and try to optimize it. :)