Closed elfring closed 5 years ago
I'm sorry, I don't understand the question. What behavior are you seeing that you think is incorrect?
Are you used to the software situation that patch files can indicate source code adjustments for if statements?
Should they be not found by the regular expression “^-\s+[^i]
” when a concrete file name was passed to the program “ack”?
If I understand correctly, you're trying ack '^-\s+[^i]' file.txt
on some file and not getting the matches that you're expecting. Is that correct?
Can you please attach the file that ack is not matching?
Is that correct?
The answer depends on interpretation of the software situation. - I get more data displayed than what I expect so far for one of my patch files (for example) with the mentioned search approach.
Can you please provide
Here's a link to the patch file itself: https://patchwork.kernel.org/patch/10228471/raw/
I assume that this command reproduces was @elfring is talking about:
[12:51:56] rob@eridanus /tmp $ sed -ne 323p v2-media-Use-common-error-handling-code-in-20-functions.patch | ack '^-\s+[^i]'
- if (ret < 0) {
It seems odd that line 323 of that patch file matches that regex, but it actually should match. There are three characters before if
in that line: the dash and two tab characters. The regex matches the leading dash, and then it tries to match the two tabs against \s+
. This succeeds, but then [^i]
tries to match the leading "i" in "if", and fails. So the regex engine backtracks, only applying \s+
to a single tab. Then it tries to match the second tab against [^i]
, which succeeds, and thus the line matches. You can see this in action here: https://asciinema.org/a/yX2YEG3TOwZpABA2XdGJ1dCic
To fix this, you could make your \s
match possessive by using \s++
instead of \s+
, so that it doesn't backtrack:
[13:02:24] rob@eridanus /tmp/example $ sed -ne 323p v2-media-Use-common-error-handling-code-in-20-functions.patch | ack '^-\s++[^i]'
So the regex engine backtracks, only applying
\s+
to a single tab.
I admit that I have been too unused to this software behaviour. Can the match colouring be improved (for white-space characters) anyhow?
…, you could make your
\s
match possessive …
Thanks for your advice.
I am going to take possessive quantifiers (or atomic grouping) better into account.
Can the match colouring be improved (for white-space characters) anyhow?
What specifically would you like to see?
A command example like “printf -- 'xxif\n'|ack '^x+[^i]'
” can show a visually more pleasing result.
Do you observe also that the background colour is not changed for the matched places when they are “accidentally” tab characters?
A command example like “printf -- 'xxif\n'|ack '^x+[^i]'” can show a visually more pleasing result.
How could that be made more visually pleasing?
Do you observe also that the background colour is not changed for the matched places when they are “accidentally” tab characters?
No, I don't see that. Please be more specific.
Another data processing option would be to achieve the desired exclusion of “unwanted” text lines by other means.
What are the "unwanted" text lines you're talking about?
How could that be made more visually pleasing?
It seems that you interpreted my feedback in the wrong direction.
No, I don't see that.
I see that indentation is performed without a different background colour.
Which test result do you get for the command example “printf -- '\t\tif\n'|ack '^\s+[^i]'
”?
What are the "unwanted" text lines you're talking about?
I am occasionally trying to filter patch code so that I can write better commit messages with determined facts.
@petdance I'm guessing what @elfring wants is something like this:
It looks like some terminals like urxvt (which is what I'm using) and xterm don't apply background color to tab characters - I generated that screenshot by hacking ack to expand tabs into eight spaces.
I have tried the command “
printf -- '-\tif\n'|ack '^-\s+[^i]'
” out with the version “2.22-1.1”. I get no match which I expect in this use case. But I wonder why matches are displayed when if lines are passed from a patch file instead. Should unwanted characters be also excluded by such a small regular expression here?