Closed addisoncrump closed 1 year ago
Regardless of this difference, setting both bits seems pointless; I would consider it an error.
I agree. The documentation states explicitly that firstline applies to unanchored patterns, so either it should be diagnosed as an error, or it should be ignored for anchored patterns. I favour ignoring it because patterns can become auto-anchored even without having the anchored flag set.
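To illustrate why firstline adds nothing for an anchored pattern, here is a rough Python model of the documented rule (the match must start at or before the first newline). This is only a sketch: it uses Python's `re` module as a stand-in rather than PCRE2, it assumes `\n` is the only newline sequence, and the names `first_newline_limit` and `match_firstline` are hypothetical helpers, not library functions.

```python
import re


def first_newline_limit(subject, start=0):
    """Highest offset at which a match may begin under the modelled firstline rule."""
    nl = subject.find("\n", start)
    return len(subject) if nl == -1 else nl


def match_firstline(pattern, subject, anchored=False):
    """Return the offset where the match starts under the modelled rule, or None."""
    regex = re.compile(pattern)
    limit = first_newline_limit(subject)
    if anchored:
        # An anchored match can only begin at offset 0, which is never
        # past the first newline, so the firstline limit changes nothing.
        m = regex.match(subject, 0)
        return m.start() if m else None
    # Unanchored: try each start offset up to and including the first newline.
    for pos in range(0, limit + 1):
        if regex.match(subject, pos):
            return pos
    return None
```

Under this model, `match_firstline(r"\n", "abc\n")` finds a match starting at the first newline itself, while `match_firstline(r"b", "a\nb")` finds nothing because `b` only occurs after the first newline. With `anchored=True` the limit is never consulted, which is the sense in which the option is redundant there.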
So, what should change? Should I enforce that anchored and firstline are an invalid combination in the fuzzer, or does the behaviour of the library need to change?
The interpreter needs to change. I have not yet looked at the code but I suspect it might just be an off-by-one bug. Then we can update the documentation to point out that firstline is pointless for an anchored pattern. But I won't get to this till tomorrow or possibly later today as I have to go out now.
Investigating pcre2_match() has turned up a sort of off-by-one bug, though it is somewhat more complicated because it is in the start-of-match optimizations (if you turn off optimization, there is no bug). I believe I now know how to fix this (also in dfa_match), but I'm going to sleep on it and see if I still think the same tomorrow. :-)
My instinct not to rush ahead proved right. What I was doing yesterday was nonsense. The correct fix is much simpler and I have now committed it (52cc4ff).
Found with #322.
The regex
\n
does not match the same strings with and without JIT when both the anchored and firstline options are enabled. The documentation suggests that the JIT implementation is correct here (emphasis mine):
Not sure where to add the testcase here. This behaviour only appears when both anchored and firstline are set.