alvin55531 closed this 1 week ago
Please use the latest release of ripgrep.
Please minimize the rg command.
Please minimize the regex.
(This looks more like a PCRE2 regex question than a ripgrep question. You might have better luck asking in a regex help forum.)
I have made changes, which I discuss further down in this comment, but the following paragraph is the central part of the issue:
The regex works as expected, and what ripgrep outputs isn't necessarily wrong per se; I'm just wondering why the way ripgrep treats empty lines changes depending on the input file. Why does it seem like the --only-matching flag gets dropped when I change the input file? Is it a memory problem, or is it the way ripgrep reads files?
I updated to ripgrep 14.1.1. I simplified the rg command to the following:
rg --pcre2 --multiline --multiline-dotall --only-matching '(?=.*Testword4)(?=.*Testword5)[\s]*?[\S][^\n]*?$'
It checks whether Testword4 and Testword5 both exist anywhere in the string, and then matches up to the end of the first non-empty line.
I also simplified the test file (lines 1 and 2 are empty, so Testword5 is on line 3):
Testword5
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
This is a test sentence.
Testword4
Actual results:
test.md
1:
2:
3:Testword5
Expected:
test.md
3:Testword5
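As a rough cross-check of what a backtracking engine does with this pattern, here is a minimal sketch that uses Python's re module as a stand-in for PCRE2. Assumptions: re.M and re.S are taken to mirror ripgrep's line-anchored $ and --multiline-dotall, and the test file is reconstructed from the line numbers in the output above (two empty lines before Testword5). The per-line splitting at the end is a hypothetical model of the expected --only-matching output, not ripgrep's implementation.

```python
import re

# Hypothetical reconstruction of test.md: two empty lines, Testword5 on line 3,
# 14 filler sentences, and Testword4 on line 18.
text = "\n\nTestword5\n" + "This is a test sentence.\n" * 14 + "Testword4\n"

# Python's re stands in for PCRE2: re.M makes $ match at line ends,
# re.S lets . match newlines (like --multiline-dotall).
pattern = re.compile(
    r"(?=.*Testword4)(?=.*Testword5)[\s]*?[\S][^\n]*?$", re.M | re.S
)

m = pattern.search(text)
start_line = text.count("\n", 0, m.start()) + 1  # 1-based line where the match starts
end_line = text.count("\n", 0, m.end()) + 1      # 1-based line where the match ends
print(repr(m.group()), start_line, end_line)

# Model of the expected --only-matching output: one "line:fragment" entry per
# line the match touches, skipping lines whose fragment is empty.
expected = [
    f"{start_line + i}:{fragment}"
    for i, fragment in enumerate(m.group().split("\n"))
    if fragment
]
print("\n".join(expected))
```

In this model the match is '\n\nTestword5', starting on line 1 and ending on line 3, so it does touch the two empty lines; dropping the empty fragments leaves only 3:Testword5.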
If I change the test file like this (keeping the two empty lines at the top):
Testword5
This is a test sentence.
This is a test sentence.
This is a test sentence.
Testword4
Now it outputs the expected results:
test.md
3:Testword5
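One visible difference between the two test files is how far Testword4 sits from the start of the file, where the match begins (the 1: in the first output). This sketch computes that distance for both files and compares it with a 128-byte limit. The 128 figure is borrowed from the MAX_LOOK_AHEAD question later in the thread and is an assumption here, not a confirmed ripgrep constant; build_test_file is a hypothetical helper.

```python
# Hypothetical look-ahead limit, borrowed from the MAX_LOOK_AHEAD question;
# not verified against ripgrep's source.
LOOK_AHEAD_LIMIT = 128

def build_test_file(sentences: int) -> str:
    """Reconstruct test.md: two empty lines, Testword5, filler, Testword4."""
    return (
        "\n\nTestword5\n"
        + "This is a test sentence.\n" * sentences
        + "Testword4\n"
    )

results = {}
for sentences in (14, 3):
    text = build_test_file(sentences)
    # Offset just past Testword4, measured from offset 0 (the first empty line).
    # The file is pure ASCII, so characters and bytes coincide.
    end_of_keyword = text.index("Testword4") + len("Testword4")
    within_limit = end_of_keyword <= LOOK_AHEAD_LIMIT
    results[sentences] = (end_of_keyword, within_limit)
    print(sentences, end_of_keyword, within_limit)
```

With 14 filler sentences Testword4 ends 371 bytes in, well past 128; with 3 it ends at 96. That lines up with which file misbehaves, but it is only a correlation, not proof of the cause.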
This is probably a bug due to
and
I don't think this will be fixed any time soon. I believe there are other extant issues already open with the same underlying bug. So I'm going to call this a duplicate of #2528.
Thanks for the minimization! It will make for a nice test case for whenever this bug gets fixed.
Thank you for the explanation!
I am not familiar with Rust nor search engine implementations, so I have a few more questions.
Does MAX_LOOK_AHEAD mean ripgrep will give up look-ahead matching if the current character is 128 bytes away from the beginning of the match?
rg --pcre2 --multiline --multiline-dotall --only-matching '(?=.*Testword4)(?=.*Testword5).*?Testword4'
Output (the expected behavior, where empty lines are not printed, following the --only-matching flag):
test.md
3:Testword5
4:This is a test sentence.
5:This is a test sentence.
6:This is a test sentence.
7:This is a test sentence.
8:This is a test sentence.
9:This is a test sentence.
10:This is a test sentence.
11:This is a test sentence.
12:This is a test sentence.
13:This is a test sentence.
14:This is a test sentence.
15:This is a test sentence.
16:This is a test sentence.
17:This is a test sentence.
18:Testword4
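For comparison, here is a minimal sketch that runs this follow-up regex with Python's re as a stand-in for PCRE2 (an assumption; re.S plays the role of --multiline-dotall) against a test file reconstructed from the printed line numbers. In this model the match also starts at offset 0, on the first empty line, and ends on line 18, yet rg printed only lines 3 to 18 for it. So where the match starts does not by itself explain why one pattern prints 1: and 2: and the other does not.

```python
import re

# Hypothetical reconstruction of test.md (two empty lines, Testword5 on line 3,
# 14 filler sentences, Testword4 on line 18).
text = "\n\nTestword5\n" + "This is a test sentence.\n" * 14 + "Testword4\n"

# re.S stands in for --multiline-dotall so that .*? can cross newlines.
pattern = re.compile(r"(?=.*Testword4)(?=.*Testword5).*?Testword4", re.S)

m = pattern.search(text)
start_line = text.count("\n", 0, m.start()) + 1  # 1-based line where the match starts
end_line = text.count("\n", 0, m.end()) + 1      # 1-based line where the match ends
print(m.start(), start_line, end_line)
```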
What version of ripgrep are you using?
ripgrep 13.0.0
How did you install ripgrep?
apt repository
What operating system are you using ripgrep on?
Windows 10 with WSL 2.3.24.0, and Android with Termux
Describe your bug.
I have a regex which checks whether a certain set of keywords exists within a single block (delimited by END). When I put the regex and file contents into (with the PCRE2 engine), it matches from the very end of line 1 to the end of line 3. With --only-matching, zero characters are matched on line 1, so nothing should be output for it. Line 2 only matches line boundaries, so it should also output nothing. The only line that should be output is line 3 (unless I misunderstood ripgrep's behavior).
When I use --count --include-zero, it shows no matches (test.md:0), which I believe means ripgrep is not matching anything at all, yet it is still outputting lines. But it's not outputting every line in the file; it's outputting the precise lines that are involved in the match (as if ripgrep were in "always print whole lines" mode).
I have tried the following flags (none of these had any effect):
--mmap and --no-mmap
--line-buffered and --block-buffered
(?ims) and not using it
--regex-size-limit 1G
-uuu
What are the steps to reproduce the behavior?
I have a test markdown file (called test.md) with the following contents:
This is the search command:
What is the actual behavior?
What is the expected behavior?
However, if I make either of the following adjustments, it suddenly works as expected (I have no clue why):
Shortening the contents of the block being matched
Grouping the keywords together