PCRE2Project / pcre2


Inconsistency with ACCEPT #353

Closed addisoncrump closed 11 months ago

addisoncrump commented 12 months ago

ACCEPT appears to behave inconsistently:

PCRE2 version 10.43-DEV 2023-04-14 (8-bit)
  re> /aa(*ACCEPT)aa/endanchored,auto_callout
data> aaa
--->aaa
 +0 ^       a
 +1 ^^      a
 +2 ^ ^     (*ACCEPT)
 +0   ^     a
 +1   ^^    a
No match
data> aaa\=no_jit
--->aaa
 +0 ^       a
 +1 ^^      a
 +2 ^ ^     (*ACCEPT)
 +0  ^      a
 +1  ^^     a
 +2  ^ ^    (*ACCEPT)
 0: aa

It seems that the JIT does not walk back after the ACCEPT.

zherczeg commented 11 months ago

This is another contradictory case. What happens when the unstoppable cannonball meets the unbreakable wall? There are two options: 1) ACCEPT > endanchored, which is probably easier in both the JIT and the interpreter; 2) endanchored > ACCEPT, which means turning (*ACCEPT) into \Z(*ACCEPT).

Both can be implemented, but I would go with the simpler one. This is not a case that is worth the effort.

PhilipHazel commented 11 months ago

The interpreter treats this as follows: it first creates a match because of ACCEPT, but then the endanchored test fails, so it becomes a no match. This causes a bumpalong. Starting at the second "a", the match succeeds. There is something very odd in the JIT because it gives a different result for "aaaa".

/aa(*ACCEPT)aa/endanchored,jitverify
    aaa
No match (JIT)
    aaa\=no_jit
 0: aa
    aaaa
 0: aa (JIT)
    aaaa\=no_jit
 0: aa
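
The interpreter behaviour described above (ACCEPT ends the match, the endanchored check rejects it, then a bumpalong) can be sketched as a small stand-alone model. This is not PCRE2 code: `match_at` and `search_endanchored` are hypothetical helpers that hard-code the single pattern /aa(*ACCEPT)aa/, and they assume that a failed endanchored check simply moves on to the next start offset.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one match attempt for /aa(*ACCEPT)aa/ at a fixed
 * start offset. (*ACCEPT) ends the match right after the first "aa", so
 * the trailing "aa" of the pattern is never tried. */
static int match_at(const char *subject, size_t len, size_t start, size_t *end)
{
    if (start + 2 > len)
        return 0;                       /* not enough characters left */
    if (subject[start] != 'a' || subject[start + 1] != 'a')
        return 0;                       /* the leading "aa" does not match */
    *end = start + 2;                   /* (*ACCEPT): match ends here */
    return 1;
}

/* Assumed "endanchored is stronger than ACCEPT" policy: a match that does
 * not end at the end of the subject is discarded, and the search bumps
 * along to the next start offset. Returns 1 and fills in the match bounds
 * on success, 0 on no match. */
static int search_endanchored(const char *subject, size_t *mstart, size_t *mend)
{
    size_t len = strlen(subject);

    for (size_t start = 0; start <= len; start++) {
        size_t end;
        if (match_at(subject, len, start, &end) && end == len) {
            *mstart = start;
            *mend = end;
            return 1;
        }
    }
    return 0;
}
```

Under this model, `search_endanchored("aaa", ...)` rejects the attempt at offset 0 (it ends at 2, not 3) and succeeds at offset 1, while "aaaa" succeeds at offset 2. Both agree with the `no_jit` results shown above.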

zherczeg commented 11 months ago

This is the second option. The JIT simply does not handle this case at all: it backtracks into the ACCEPT, which results in an undefined state. Since this case does not exist in Perl, we should define what to do: is ACCEPT stronger or weaker than endanchored? I would prefer not to overdo it; it is a contradictory case anyway.

PhilipHazel commented 11 months ago

PCRE2_ANCHORED cannot be overridden by anything within the pattern. I think PCRE2_ENDANCHORED should be treated the same way, that is, I think ACCEPT should be weaker than endanchored.

zherczeg commented 11 months ago

Fixed in 1c09efe6b0008a3b463299efe7501bc3140806f3. It turned out there was already some code for this; it just was not working properly.