beyondgrep / ack2

**ack 2 is no longer being maintained. ack 3 is the latest version.**
https://github.com/beyondgrep/ack3/
Other
1.48k stars 140 forks source link

Unexpected matches for input data from a patch file #661

Closed elfring closed 5 years ago

elfring commented 6 years ago

I have tried the command “printf -- '-\tif\n'|ack '^-\s+[^i]'” out with the version “2.22-1.1”. I get no match which I expect in this use case. But I wonder why matches are displayed when if lines are passed from a patch file instead. Should unwanted characters be also excluded by such a small regular expression here?

petdance commented 6 years ago

I'm sorry, I don't understand the question. What behavior are you seeing that you think is incorrect?

elfring commented 6 years ago

Are you used to the software situation that patch files can indicate source code adjustments for if statements? Should they be not found by the regular expression “^-\s+[^i]” when a concrete file name was passed to the program “ack”?

petdance commented 6 years ago

If I understand correctly, you're trying ack '^-\s+[^i]' file.txt on some file and not getting the matches that you're expecting. Is that correct?

Can you please attach the file that ack is not matching?

elfring commented 6 years ago

Is that correct?

The answer depends on interpretation of the software situation. - I get more data displayed than what I expect so far for one of my patch files (for example) with the mentioned search approach.

petdance commented 6 years ago

Can you please provide

hoelzro commented 6 years ago

Here's a link to the patch file itself: https://patchwork.kernel.org/patch/10228471/raw/

I assume that this command reproduces was @elfring is talking about:

[12:51:56] rob@eridanus /tmp $ sed -ne 323p v2-media-Use-common-error-handling-code-in-20-functions.patch | ack '^-\s+[^i]'
-       if (ret < 0) {

It seems odd that line 323 of that patch file matches that regex, but it actually should match. There are three characters before if in that line: the dash and two tab characters. The regex matches the leading dash, and then it tries to match the two tabs against \s+. This succeeds, but then [^i] tries to match the leading "i" in "if", and fails. So the regex engine backtracks, only applying \s+ to a single tab. Then it tries to match the second tab against [^i], which succeeds, and thus the line matches. You can see this in action here: https://asciinema.org/a/yX2YEG3TOwZpABA2XdGJ1dCic

To fix this, you could make your \s match possessive by using \s++ instead of \s+, so that it doesn't backtrack:

[13:02:24] rob@eridanus /tmp/example $ sed -ne 323p v2-media-Use-common-error-handling-code-in-20-functions.patch | ack '^-\s++[^i]'   
elfring commented 6 years ago

So the regex engine backtracks, only applying \s+ to a single tab.

I admit that I have been too unused to this software behaviour. Can the match colouring be improved (for white-space characters) anyhow?

…, you could make your \s match possessive …

Thanks for your advice.

I am going to take possessive quantifiers (or atomic grouping) better into account.

petdance commented 6 years ago

Can the match colouring be improved (for white-space characters) anyhow?

What specifically would you like to see?

elfring commented 6 years ago
petdance commented 6 years ago

A command example like “printf -- 'xxif\n'|ack '^x+[^i]'” can show a visually more pleasing result.

unpleasing

How could that be made more visually pleasing?

petdance commented 6 years ago

Do you observe also that the background colour is not changed for the matched places when they are “accidentally” tab characters?

No, I don't see that. Please be more specific.

Another data processing option would be to achieve the desired exclusion of “unwanted” text lines by other means.

What are the "unwanted" text lines you're talking about?

elfring commented 6 years ago

How could that be made more visually pleasing?

It seems that you interpreted my feedback in the wrong direction.

No, I don't see that.

I see that indentation is performed without a different background colour.

Which test result do you get for the command example “printf -- '\t\tif\n'|ack '^\s+[^i]'”?

What are the "unwanted" text lines you're talking about?

I am occasionally trying to filter patch code so that I can write better commit messages with determined facts.

hoelzro commented 6 years ago

@petdance I'm guessing what @elfring wants is something like this:

highlighted-ws

It looks like some terminals like urxvt (which is what I'm using) and xterm don't apply background color to tab characters - I generated that screenshot by hacking ack to expand tabs into eight spaces.

elfring commented 6 years ago