Closed atarom closed 1 month ago
Found the problem in the buffer window shifting in v7, after checking your example (thanks for sharing!). It's not the regex engine, but the alignment of the shifted window is off in the buffer internally. This was changed to accommodate hex line contexts, including after buffer shifting. Will release an update soon.
test2.txt.zip Hi, thank you very much for solving the previous bug so quickly. Today, with new data, I got this similar error with the current version 7.01
$ ugrep -v '^b5m:urlOrto=' test2.txt | ugrep '^drid'
drid.es/vgn-ext-templating/v/index.jsp?vgnextchannel=9e4c43db40317010VgnVCM100000dc0ca8c0RCRD&vgnextoid=924b7b05a8350410VgnVCM2000000c205a0aRCRD
$ ugrep --version
ugrep 7.0.1 x86_64-pc-linux-gnu +avx2; -P:pcre2jit; -z:zlib,bzip2,lzma,lz4,zstd,7z,tar/pax/cpio/zip
License: BSD-3-Clause; ugrep user manual: <https://ugrep.com>
Written by Robert van Engelen and others: <https://github.com/Genivia/ugrep>
Ugrep utilizes the RE/flex regex library: <https://github.com/Genivia/RE-flex>
and I don't have any line that start with "drid"
GNU grep hits ^drid
too:
$ ugrep -v '^b5m:urlOrto=' test2.txt | ggrep '^drid'
drid.es/vgn-ext-templating/v/index.jsp?vgnextchannel=9e4c43db40317010VgnVCM100000dc0ca8c0RCRD&vgnextoid=924b7b05a8350410VgnVCM2000000c205a0aRCRD
Also with PCRE2 matching the same happens:
$ ugrep -vP '^b5m:urlOrto=' test2.txt | ggrep -E '^drid'
drid.es/vgn-ext-templating/v/index.jsp?vgnextchannel=9e4c43db40317010VgnVCM100000dc0ca8c0RCRD&vgnextoid=924b7b05a8350410VgnVCM2000000c205a0aRCRD
but it does not return a match with ugrep 6.5, so something seems still off with the -v
and ^
combination in the first regex with v7. I was sure was fixed after more testing on our end to verify the updated logic on the position of the begin-of-line assertion. Apparently not.
Looks like a problem with the buffer-shift-event-handler logic, which depends on the buffer's shifted location at the begin-of-line. Because we keep a bit more data in the buffer to accommodate context hexlines, the handler may not always get a fix on the correct line to use, This affects option -v
and likely also options -ABC
when files are large.
The solution would be to revert back to the old logic, except when we have hex context lines because we don't use -v
and -ABC
in that case anyway.
I've tested the change over back to the old logic and added the hex context lines update mentioned above. Both work like a charm. To make sure, I tested with your examples but also with different internal buffer sizes and diff'ed the results of them to spot any deviations that can be attributed to problems with the buffer shift logic and associated event handlers or the hex context lines logic for --hexdump
.
Will release an update very soon.
test.txt Hi, thanks for this excellent tool, I have been using it for some time without problems, but today when updating to version 7 I encountered this problem with some files like the attached one
In previous versions I haven't had that problem, I have tried egrep and it works correctly with the same file
I only have one machine and my knowledge is very limited, can someone tell me if with version 7 and with the attached file, get the same wrong result as me?
Cheers