Closed xrat closed 1 month ago
Will take a look at this. It works fine with the $ anchor, but ^ is handled differently internally.
Just a quick follow-up note: I'm only theorizing here, but it appears that grep outputs all lines with grep -E '(^|x)', so the ^ acts like any other empty pattern that matches all input lines. It doesn't do anything special, because grep -E -o '(^|x)' only outputs matches of x, nothing else, which means that its internal machinery isn't using ^ at all in this case. Perhaps this is some GNU/BSD grep peculiarity? Will check it out.
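For anyone who wants to reproduce the observation above, here is a minimal sketch. The two-line sample input and the helper name sample are made up for illustration, and the behavior described in the comments assumes GNU grep; other grep implementations may differ.

```shell
# Hypothetical sample input: two lines, only the first contains an "x".
sample() { printf 'axb\nfoo\n'; }

# Without -o, the empty ^ alternative matches at the start of every line,
# so the count covers both input lines.
sample | grep -E -c '(^|x)'

# With -o, GNU grep prints only the non-empty parts of a match,
# so the empty match at ^ produces no output and only "x" is shown.
sample | grep -E -o '(^|x)'
```

The GNU grep man page describes -o as printing only the matched (non-empty) parts of a line, which would explain why the ^ alternative seems to vanish there without being ignored by the matcher.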
Please note that I was just providing a minimal example. My use case is (^|x)y.
I am also affected by this. IIRC, I was trying to match '(^|\s)x', which GNU grep handles without issue.
There may be more efficient ways to architect that expression, but I guess if ugrep is intended to be a drop-in replacement for GNU grep, it seems like it should support these types of expressions.
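As a sketch of one such alternative: in POSIX extended regular expressions, a group with an anchor alternative can be distributed, so (^|A)x selects the same lines as ^x|Ax. The sample input and the helper name lines_in below are hypothetical, [[:space:]] stands in for \s to stay portable, and this was only checked against grep -E semantics, not against the ugrep version that reported the error.

```shell
# Hypothetical sample input: "x" at line start, after a space,
# after a letter, and not present at all.
lines_in() { printf 'x1\na x2\nax3\nb\n'; }

# Grouped form, as used in the thread.
lines_in | grep -E -c '(^|[[:space:]])x'

# Distributed form without an empty-matching alternative inside a group.
lines_in | grep -E -c '^x|[[:space:]]x'
```

Both commands should report the same count, since only the lines x1 and a x2 have an x at the start of the line or directly after whitespace.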
This limitation of the ^ anchor is no longer present in the upcoming ugrep update:
$ ugrep -c '(^|\s)y' enwik8 --stats
23930
Searched 1 file in 0.087 seconds: 1 matching (100%)
GNU grep is 10x slower in this case:
$ /usr/bin/time ggrep -c -E '(^|\s)y' enwik8
23930
0.84 real 0.82 user 0.01 sys
See also #426
I tried to replace my GNU grep with ugrep and found that the pattern (^|x), which I happen to use at times, causes the error "empty expression":