PCRE2Project / pcre2

PCRE2 development is now based here.

Inconsistency with variable lookbehinds including `\z` #358

Open addisoncrump opened 11 months ago

addisoncrump commented 11 months ago

This will almost certainly never appear in a real regex, but does appear in fuzzer results. Resolving this will allow more problematic test cases to be uncovered.

  re> /(?<=a?b\z)/auto_callout
data> abc
--->abc
 +0 ^       (?<=
 +0  ^      (?<=
 +4  ^      a?
 +6  ^      b
 +0   ^     (?<=
 +4   ^     a?
 +6   ^     b
 +7   ^     \z
 +9   ^     )
+10   ^     End of pattern
 0: 
data> abc\=no_jit
--->abc
 +0 ^       (?<=
 +0  ^      (?<=
 +4  ^      a?
 +6  ^      b
 +7  ^^     \z
 +0   ^     (?<=
 +4   ^     a?
 +6   ^     b
 +7   ^     \z
 +4   ^     a?
 +6   ^     b
 +7   ^     \z
 +0    ^    (?<=
 +4    ^    a?
 +6    ^    b
 +7    ^    \z
 +4    ^    a?
 +6    ^    b
No match

PhilipHazel commented 11 months ago

This does look like a JIT issue (ignoring auto-callout, which is a red herring). Perl behaves the same as the interpreter, that is, it gives no match.
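
For reference, the reasoning behind the expected "No match" can be sketched as a brute-force model. This is not PCRE2 code: it is a minimal Python sketch that assumes the lookbehind body is `a?b` followed by `\z`, and the helper name `lookbehind_positions` is made up for illustration. The lookbehind can succeed at a position only if that position is the very end of the subject (because of `\z`) and some substring ending there matches `a?b`.

```python
import re

def lookbehind_positions(subject):
    """Return the offsets at which (?<=a?b\\z) would be expected to succeed.

    Hypothetical helper: models the assertion by brute force rather than
    calling PCRE2.
    """
    body = re.compile(r"a?b")   # lookbehind body without the \z
    n = len(subject)
    hits = []
    for p in range(n + 1):
        if p != n:
            # \z is only true at the very end of the subject
            continue
        # try every possible start s for text ending exactly at p
        if any(body.fullmatch(subject, s, p) for s in range(p + 1)):
            hits.append(p)
    return hits

print(lookbehind_positions("abc"))  # no position satisfies the assertion
print(lookbehind_positions("ab"))   # only the end of the subject qualifies
```

Under this model "abc" yields no position at all, which agrees with the interpreter and Perl results above, while a subject such as "ab" would match the empty string at its end.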

addisoncrump commented 11 months ago

Additional, seemingly related cases:

  re> /(?<=a?$)/anchored
data> (?<=a?$)
 0: 
data> (?<=a?$)\=no_jit
No match
  re> /(?<*a?$)/anchored
data> (?<*a?$)
 0: 
data> (?<*a?$)\=no_jit
No match

Doesn't appear with anchored unset.

carenas commented 5 months ago

> Doesn't appear with anchored unset.

Without anchored, the interpreter correctly matches the empty string at the end of the subject, but JIT still misbehaves: it reports an empty match at offset 0. This is shown below, using a slightly modified pcre2test that makes the matched offset visible:

  re> /(?<=a?$)/jit
data> (?<=a?$)
 0: @0
data> (?<=a?$)\=no_jit
 0: @8
data> 

As mentioned before, the results from the interpreter and Perl are in agreement.
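
The expected offsets for `(?<=a?$)` can be checked with a small model. This is a minimal Python sketch, not PCRE2 code: the helpers `dollar_holds` and `first_match_offset` are hypothetical, and `$` is assumed to have its default meaning (true at the end of the subject or just before a final newline). Since `a?` can match the empty string, the lookbehind succeeds exactly where `$` holds, and `anchored` restricts the match to offset 0.

```python
import re

def dollar_holds(subject, p):
    """Default (non-multiline) $: end of subject, or before a final newline."""
    n = len(subject)
    return p == n or (p == n - 1 and subject.endswith("\n"))

def first_match_offset(subject, anchored=False):
    """First offset where (?<=a?$) would be expected to match, or None.

    Hypothetical helper: brute-force model of the assertion.
    """
    body = re.compile(r"a?")    # lookbehind body without the $
    candidates = [0] if anchored else range(len(subject) + 1)
    for p in candidates:
        if not dollar_holds(subject, p):
            continue
        # some text ending exactly at p must match the body
        if any(body.fullmatch(subject, s, p) for s in range(p + 1)):
            return p
    return None

subject = "(?<=a?$)"            # the 8-character subject used above
print(first_match_offset(subject))                  # 8
print(first_match_offset(subject, anchored=True))   # None
```

The model gives offset 8 when unanchored and no match when anchored, which is what the interpreter reports in both transcripts.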