beyondgrep / ack2

**ack 2 is no longer being maintained. ack 3 is the latest version.**
https://github.com/beyondgrep/ack3/
Other
1.48k stars 140 forks source link

Regexes that match the empty string send ack into an infinite loop #542

Closed epa closed 5 years ago

epa commented 9 years ago

You can grind ack to a halt with

% ack '.*'

Although .* is not slow for the regexp engine to match, it slows down ack a lot. Is there something ack could do more intelligently?

petdance commented 9 years ago

Why would you want to ack for .* in the first place?

epa commented 9 years ago

Obviously you wouldn't, but it is the simplest example that illustrates the slowdown. I stumbled across this when I had intended to find [.][*] but accidentally typed [.]*. I imagine that fixing the slowdown for .* would also have a beneficial effect for other regexps that are not quite as pathological, but nonetheless slow. But if that's not likely to be the case, feel free to disregard the issue.

petdance commented 9 years ago

I'd like to find a realistic example that is worth pursuing. I don't know why it locks up on .* but it's hard for me to get interested in it in the face of other, real bugs that are out there.

Still, let's leave this open.

hoelzro commented 9 years ago

Did a bit of digging; it's not just slow; it ends up in an infinite loop that consumes all memory in sight. The problem is in print_line_with_options; the coloration logic increases the length of the string, causing the pattern to match something new all the time.

epa commented 9 years ago

Ouch, colorizing the string changes what the regexp matches? Maybe there is a test case for that bug all by itself without using .*.

hoelzro commented 9 years ago

@epa I think .* is special, because it matches both the entire line, but also the empty string.

petdance commented 9 years ago

And it's not just .*, but stuff like b* which I could easily see people doing by mistake.

epa commented 9 years ago

Maybe the first thing ack should do is test the given regexp against the empty string, and if it matches die with an error? That doesn't rule out fixing the underlying code in print_line_with_options of course. But surely nobody deliberately searches for the empty string.

petdance commented 9 years ago

That seems like a good check. If it matches the empty string, then the user has entered something incorrectly. But what would we describe to the user in the error message?

Imagine the user is looking for something like "a_b_c*", for whatever they think that that will do. How do we describe the problem?

epa commented 9 years ago

The regular expression X matches the empty string, so it cannot be used to search for text.

There's probably no message you can write which will explain the problem clearly to someone who doesn't understand regular expressions.

pdl commented 9 years ago

"Maybe the first thing ack should do is test the given regexp against the empty string"

This will fail for (?<=.).* and probably some genuinely realistic cases including zero-width assertions.

Perhaps a better solution would be to determine whether $& is zero-length (by looking at the start and end position matched via the other punctuation variables - $+[0] - $-[0], I believe - to avoid the penalty of $&) - if it is of zero length then do not colourise.

hoelzro commented 9 years ago

@pdl That would be my chosen approach, and a quick test I tried got it out of the highlighting loop. There was another one it was stuck in though; I'm going to investigate further when I have some more time.

petdance commented 9 years ago

Agreed that checking for $& being zero-length is probably a win.

I'd like to see some genuinely realistic cases with zero-width assertions before we say that "We shouldn't forbid them because they might get used."

epa commented 9 years ago

@pdl are you saying there are regexps which do match the empty string but are nonetheless useful to search with?

hoelzro commented 9 years ago

Ok, the bug is fixed as of 80402b4.

petdance commented 9 years ago

Thanks for fixing this, @hoelzro. Can you please put a comment on the check that explains why it's there?

hoelzro commented 9 years ago

@petdance Done!

hoelzro commented 9 years ago

Can we close this?

Gunni commented 9 years ago

Not tested with the latest commit but for me this happened with

ack ''
pplu commented 8 years ago

I've found the same thing with ack $ file. It would look like the same bug, but I just wanted to advise you guys of more ways to trigger the bug