PCRE2Project / pcre2

PCRE2 development is now based here.

Inconsistent (?0) behaviour in the presence of endanchored #331

Closed addisoncrump closed 1 year ago

addisoncrump commented 1 year ago

Discovered by #322.

I tried to minify and root cause this further, but I got a bit stuck as I've never used recursive patterns. Consider the following output:

$ ./pcre2test -jit
PCRE2 version 10.43-DEV 2023-04-14 (8-bit)
  re> /|a(?0)/endanchored
data> aaaa
 0: aaaa
data> aaaa\=no_jit
 0: a

It seems that JIT and non-JIT do not agree on what is matched. I suspect that they are taking different branches in the recursive pattern: the JIT is greedily taking the right branch, whereas the non-JIT is only taking the right branch at the last character. I suspect that the non-JIT treats the pattern's end anchor as part of the pattern (and thus only the final character is valid), whereas the JIT does not and reads the whole input before hitting the end. Notably, this does not appear when using a $ at the end instead of the endanchored flag.
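To make the two outcomes easier to reason about, here is a toy Python model of the pattern `|a(?0)` with an end-of-subject check applied to the overall match. It is only a sketch and not PCRE2 code: the names `ends` and `search` are hypothetical, and the `atomic` switch (whether the matcher may backtrack into the recursion after the end check fails) is an assumption about one mechanism that happens to reproduce both outputs above, not a confirmed root cause.

```python
def ends(subject, pos, atomic):
    """Yield candidate end positions for |a(?0) starting at pos, in priority order."""
    # First alternative: the empty string.
    yield pos
    # Second alternative: a literal 'a' followed by a recursion into the whole pattern.
    if pos < len(subject) and subject[pos] == "a":
        inner = ends(subject, pos + 1, atomic)
        if atomic:
            # Assumption: the recursion commits to its first result (no backtracking into it).
            yield next(inner)
        else:
            # The recursion can be re-entered to try its remaining alternatives.
            yield from inner


def search(subject, atomic):
    """Return the first match that ends exactly at the end of the subject, or None."""
    for start in range(len(subject) + 1):
        for end in ends(subject, start, atomic):
            if end == len(subject):  # models the endanchored requirement
                return subject[start:end]
    return None


print(search("aaaa", atomic=False))
print(search("aaaa", atomic=True))
```

With `atomic=False` the model finds a match starting at offset 0, like the JIT run; with `atomic=True` every earlier start fails and only the final `a` matches, like the `no_jit` run.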

PhilipHazel commented 1 year ago

I think JIT is right here, so I am investigating the interpreter. Unfortunately, we can't compare with Perl because it doesn't have an "endanchored" option and, as you say, there is no difference if $ is used. I also tried /(\z|a(?1)/ but all three give the same result.