beyondgrep / ack3

ack is a grep-like search tool optimized for source code.
https://beyondgrep.com/
Other
703 stars 66 forks source link

Problem with '--color-match' when using '*' or '?' (asterisk or question mark) #342

Open biburepo opened 3 years ago

biburepo commented 3 years ago

Take this simple file "test.txt" :

1 file found?
2 files found?
24 files found?
No files found.

And this simple query :

ack 's ' --color-match='bold rgb500 on_rgb550' 'test.txt'

which gives me line 2/3/4 from "test.txt" with "s " nicely colored bold-red on light-yellow.

Next query :

ack '(s )+' --color-match='bold rgb500 on_rgb550' 'test.txt'

the same, OK.

But :

ack '(s )*' --color-match='bold rgb500 on_rgb550' 'test.txt'

gives me 4 lines (as expected) BUT NO COLORS

Same with :

ack '(s )?' --color-match='bold rgb500 on_rgb550' 'test.txt'

NOTE With egrep all queries work as expected. (setting : export GREP_COLORS='ms=1;38;5;196;48;5;227')

egrep 's ' test.txt
egrep '(s )+' test.txt
egrep '(s )*' test.txt
egrep '(s )?' test.txt

BTW RipGrep (rg) works as expected. same as egrep. However Silver Searcher (ag) has even more complications, but this is not the place to dive deeper in that area.

n1vux commented 3 years ago

I get the same results, including on branch dev HEAD.

Additionally, ack '(?:s ){0,1}' --color-match='bold rgb500 on_rgb550' 'test.txt' does not colorize but ack '(?:s ){1,1}' --color-match='bold rgb500 on_rgb550' 'test.txt' does colorize, which i would infer means its the fact of optionality and not the syntax of the optionality nor the capturing of the parens that is interfering with highlighting/colorizing.

I'd have expected Greedy-optional ?+ or *+ to have colorized/highlighted, but no it doesn't.

Also, it's not --color-match specific; normal default highlighting fails with pattern that can match 0 chars even when it matches more. Compare ack '(?:s)? ' --color-match='bold rgb500 on_rgb550' 'test.txt' # colors ack '(?:s)?' --color-match='bold rgb500 on_rgb550' 'test.txt'# no colors

Hence, I infer (hypothesize) that what is killing the color/highlight is that the pattern can match 0 characters, so 0 are what are colorized?

Perhaps the highlighting code's use of the Pattern needs to be forced greedy if it ends with an optional quantifier that is not explicitly greedy or non-greedy quantifier-modifier. Ugh, that'd probably require walking the compiled RE AST to do correctly.

n1vux commented 3 years ago

This is arguably (almost) correct behavior. an RE that can match any line by matching 0 or more characters arguably should not only display the entirety of every file but highlight exactly 0 characters on each.

What would be more correct is if did highlight the longest match of an explicitly greedy RE that could still match 0 chars. Failing to do that is still a bug even if the original example is correct behavior.

n1vux commented 3 years ago

alternative hypothesis refuted -

It is not the case that a null sequence has been (almost correctly) VT100 highlighted; there is not an empty highlighted string ^[[30;43m^[[0m sequence in the output.

petdance commented 3 years ago

Let's simplify this a bit:

$ ack 's*' test.txt
1 file found?
2 files found?
24 files found?
No files found.

None of the results lines are colored. We would expect that on the first line where there is no s to match. The others are more puzzling.

jakebman commented 1 year ago

Is it possible the zero-width first match is triggering last if $match_length <= 0;

$ ack 's*' test.txt /dev/null --column
test.txt
1:1:1 file found?
2:1:2 files found?
3:1:24 files found?
4:1:No files found.