Closed epa closed 5 years ago
Why would you want to ack for .*
in the first place?
Obviously you wouldn't, but it is the simplest example that illustrates the slowdown. I stumbled across this when I had intended to find [.][*]
but accidentally typed [.]*
. I imagine that fixing the slowdown for .*
would also have a beneficial effect for other regexps that are not quite as pathological, but nonetheless slow. But if that's not likely to be the case, feel free to disregard the issue.
I'd like to find a realistic example that is worth pursuing. I don't know why it locks up on .*
but it's hard for me to get interested in it in the face of other, real bugs that are out there.
Still, let's leave this open.
Did a bit of digging; it's not just slow; it ends up in an infinite loop that consumes all memory in sight. The problem is in print_line_with_options
; the coloration logic increases the length of the string, causing the pattern to match something new all the time.
Ouch, colorizing the string changes what the regexp matches? Maybe there is a test case for that bug all by itself without using .*
.
@epa I think .*
is special, because it matches both the entire line, but also the empty string.
And it's not just .*
, but stuff like b*
which I could easily see people doing by mistake.
Maybe the first thing ack should do is test the given regexp against the empty string, and if it matches die with an error? That doesn't rule out fixing the underlying code in print_line_with_options
of course. But surely nobody deliberately searches for the empty string.
That seems like a good check. If it matches the empty string, then the user has entered something incorrectly. But what would we describe to the user in the error message?
Imagine the user is looking for something like "a_b_c*", for whatever they think that that will do. How do we describe the problem?
The regular expression X matches the empty string, so it cannot be used to search for text.
There's probably no message you can write which will explain the problem clearly to someone who doesn't understand regular expressions.
"Maybe the first thing ack should do is test the given regexp against the empty string"
This will fail for (?<=.).*
and probably some genuinely realistic cases including zero-width assertions.
Perhaps a better solution would be to determine whether $&
is zero-length (by looking at the start and end position matched via the other punctuation variables - $+[0] - $-[0]
, I believe - to avoid the penalty of $&
) - if it is of zero length then do not colourise.
@pdl That would be my chosen approach, and a quick test I tried got it out of the highlighting loop. There was another one it was stuck in though; I'm going to investigate further when I have some more time.
Agreed that checking for $&
being zero-length is probably a win.
I'd like to see some genuinely realistic cases with zero-width assertions before we say that "We shouldn't forbid them because they might get used."
@pdl are you saying there are regexps which do match the empty string but are nonetheless useful to search with?
Ok, the bug is fixed as of 80402b4.
Thanks for fixing this, @hoelzro. Can you please put a comment on the check that explains why it's there?
@petdance Done!
Can we close this?
Not tested with the latest commit but for me this happened with
ack ''
I've found the same thing with ack $ file
. It would look like the same bug, but I just wanted to advise you guys of more ways to trigger the bug
You can grind ack to a halt with
% ack '.*'
Although
.*
is not slow for the regexp engine to match, it slows down ack a lot. Is there something ack could do more intelligently?