Genivia / ugrep

NEW ugrep 6.5: a more powerful, ultra fast, user-friendly, compatible grep. Includes a TUI, Google-like Boolean search with AND/OR/NOT, fuzzy search, hexdumps, searches (nested) archives (zip, 7z, tar, pax, cpio), compressed files (gz, Z, bz2, lzma, xz, lz4, zstd, brotli), pdfs, docs, and more
https://ugrep.com
BSD 3-Clause "New" or "Revised" License

-A may not show the context line for this pattern #383

Closed: firasuke closed this issue 4 months ago

firasuke commented 4 months ago

Hey there,

I've been using ugrep for a while now without issues, but I stumbled upon the following issue when running:

ugrep -rnw . -e ".*SOME_PATTERN.*" -A1

Either the -A option is not being parsed or it is not working: ugrep doesn't show any context after the matching line, whereas grep does.
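For reference, the expected -A behavior can be sketched in a few lines of Python. This is a simplified model, not grep's or ugrep's code: `after_context` is a hypothetical helper, a plain substring test stands in for the regex, and grep's `--` group separators are left out. Each matching line is emitted with `:` and resets a counter of N lines to emit afterwards with `-`.

```python
from typing import Callable, List, Tuple


def after_context(lines: List[str], is_match: Callable[[str], bool],
                  n: int) -> List[Tuple[int, str, str]]:
    """Mimic grep -n -A n on a list of lines.

    Returns (line_number, separator, text) tuples, where ':' marks a
    matching line and '-' marks an after-context line. The '--' group
    separators grep prints between non-adjacent groups are omitted.
    """
    out = []
    remaining = 0  # context lines still owed to the most recent match
    for num, text in enumerate(lines, start=1):
        if is_match(text):
            out.append((num, ":", text))
            remaining = n  # every match restarts the context window
        elif remaining > 0:
            out.append((num, "-", text))
            remaining -= 1
    return out


lines = ["foo", "x SOME_PATTERN y", "next line", "other"]
for num, sep, text in after_context(lines, lambda s: "SOME_PATTERN" in s, 1):
    print(f"{num}{sep}{text}")
```

With N = 1 this prints `2:x SOME_PATTERN y` followed by `3-next line`; that second line is the after context that was missing from ugrep's output.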

Any ideas what could be causing this issue?

genivia-inc commented 4 months ago

I don't think this problem is caused by any combination of options, because none of the other options affect the context. If anything matters, it is the pattern itself, which -w also modifies to match words using anchors.
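Roughly, -w keeps a match only if it is not glued to word characters on either side. Below is a Python approximation using lookaround anchors; `word_regex` is a hypothetical helper, and ugrep's own rewrite of the pattern may differ. In this approximation a pattern like .*SOME_PATTERN.* still matches whole lines under -w, because the .* parts can stretch to the line boundaries.

```python
import re


def word_regex(pattern: str) -> re.Pattern:
    """Approximate -w: the match may not be preceded or followed by a
    word character (letter, digit or underscore)."""
    return re.compile(r"(?<!\w)(?:" + pattern + r")(?!\w)")


print(bool(word_regex("PATTERN").search("SOME_PATTERN")))  # False: '_' is a word character
print(bool(word_regex("PATTERN").search("a PATTERN b")))   # True
print(bool(word_regex(".*SOME_PATTERN.*").search("x SOME_PATTERN y")))  # True
```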

Since we don't have your test case, you need to help us out here a bit. Can you be more specific? Are after contexts always missing, or only in some cases, and if so, when?

firasuke commented 4 months ago

Here's the pattern I am using.

With grep: [screenshot]

With ugrep: [screenshot]

genivia-inc commented 4 months ago

It looks like there is an off-by-one problem: the match position counter is off by one, so the context position no longer aligns. I also see the issue with .*, but not with other patterns, even when they match the same initial part. To show the difference, note that the second search shows the context:

[screenshot]

With a bit of tracing, I found that the difference between these cases is observable only in the parts that do not match, such as the after-context line. The internal pointer/position is shifted by one, so the context is no longer the same.

So it flew under the radar: pattern matching is correct, but the after contexts are not when this pattern is used. The off-by-one problem is triggered by an initial .* in a pattern and affects the after context (in most cases it appears to affect the after context following the last match rather than between matches, but I'm not 100% sure).
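To make the "position moved by one" explanation concrete, here is a toy model in Python. It is not ugrep's code: `context_after` is a hypothetical helper that finds the end of the matched line starting from the position just past the match, then returns the next n lines. Shifting that position one past the newline makes the helper treat the following line as part of the match, so the after context of the last match disappears.

```python
import re
from typing import List


def context_after(text: str, match_end: int, n: int) -> List[str]:
    """Return up to n lines following the line that contains match_end.

    match_end is taken to be the position just past the match; the
    matched line is assumed to end at the first newline at or after it.
    """
    line_end = text.find("\n", match_end)
    if line_end == -1:
        return []  # no newline left: nothing follows the matched line
    return text[line_end + 1:].splitlines()[:n]


text = "x SOME_PATTERN y\nnext line\n"
m = re.search(r".*SOME_PATTERN.*", text)  # '.' stops before the newline
print(context_after(text, m.end(), 1))      # ['next line']
print(context_after(text, m.end() + 1, 1))  # []: the shifted position skips the newline
```

In this toy model the shift only matters when the match runs right up to the newline (as a trailing .* does); a match ending mid-line absorbs the extra one. That is a property of the model, not a statement about where the bug sat in ugrep.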

The fix is actually very simple. The off-by-one happens in the pattern matcher, in an optimization that scans ahead when nothing can possibly match. The fixed version is correct:
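For readers unfamiliar with scan-ahead: one common form of this optimization is a literal prefilter. The Python sketch below assumes a pattern of the shape .*LITERAL.* and is not ugrep's matcher, which is considerably more sophisticated; `scan_ahead` and `matching_lines` are hypothetical names. It jumps straight to the next occurrence of the required literal and then rewinds to the start of that line, since a leading .* may match everything before the literal. That rewind is exactly the kind of boundary computation where an off-by-one can slip in.

```python
from typing import Iterator, Tuple


def scan_ahead(text: str, literal: str, pos: int) -> int:
    """Return the start of the line containing the next occurrence of
    literal at or after pos, or -1 if there is none."""
    hit = text.find(literal, pos)
    if hit == -1:
        return -1  # nothing can possibly match in the rest of the text
    # rfind returns the previous newline's index (-1 if there is none),
    # so the line starts one character later. Losing or doubling this +1
    # would shift every position computed from the result.
    return text.rfind("\n", 0, hit) + 1


def matching_lines(text: str, literal: str) -> Iterator[Tuple[int, str]]:
    """Yield (offset, line) for each line matching .*literal.*"""
    pos = 0
    while (start := scan_ahead(text, literal, pos)) != -1:
        end = text.find("\n", start)
        if end == -1:
            end = len(text)  # last line has no trailing newline
        yield start, text[start:end]
        pos = end + 1  # resume at the first character of the next line


text = "alpha\nx SOME_PATTERN y\nnext line\n"
print(list(matching_lines(text, "SOME_PATTERN")))  # [(6, 'x SOME_PATTERN y')]
```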

[screenshot]

Sorry for the trouble this may have caused. Glad you found this and shared it with us! An update will be released soon, after more testing to verify everything.

firasuke commented 4 months ago

Yeah, no worries. Thanks for your time and effort on this project.

Keep up the good work!