Another recursion inconsistency corner case

addisoncrump commented 11 months ago

  re> /a(?0)z||(?0)++/endanchored
data> abcd
 0: 
data> abcd\=no_jit
Failed: error -52: nested recursion at the same subject position

Auto-callout annotated behaviour:

```
  re> /a(?0)z||(?0)++/endanchored,auto_callout
data> abcd
--->abcd
 +0 ^      a
 +1 ^^     (?0)
 +0 ^^     a
 +7 ^^     |
 +5 ^^     z
 +8 ^^     (?0)++
 +0 ^^     a
 +7 ^^     |
+14 ^^     End of pattern
 +5 ^^     z
 +7 ^      |
 +8 ^      (?0)++
 +0 ^      a
 +1 ^^     (?0)
 +0 ^^     a
 +7 ^^     |
 +5 ^^     z
 +8 ^^     (?0)++
 +0 ^^     a
 +7 ^^     |
+14 ^^     End of pattern
 +5 ^^     z
 +7 ^      |
+14 ^      End of pattern
 +0 ^      a
 +7 ^      |
 +8 ^      (?0)++
 +0 ^      a
 +7 ^      |
+14 ^      End of pattern
 +0 ^      a
 +7 ^      |
 +8 ^      (?0)++
 +0 ^      a
 +7 ^      |
+14 ^      End of pattern
 +0 ^      a
 +7 ^      |
 +8 ^      (?0)++
 +0 ^      a
 +7 ^      |
+14 ^      End of pattern
 +0 ^      a
 +7 ^      |
 0: 
data> abcd\=no_jit
--->abcd
 +0 ^      a
 +1 ^^     (?0)
 +0 ^^     a
 +7 ^^     |
 +5 ^^     z
 +8 ^^     (?0)++
 +0 ^^     a
 +7 ^^     |
+14 ^^     End of pattern
 +5 ^^     z
 +7 ^      |
 +8 ^      (?0)++
 +0 ^      a
 +1 ^^     (?0)
 +0 ^^     a
 +7 ^^     |
 +5 ^^     z
 +8 ^^     (?0)++
Failed: error -52: nested recursion at the same subject position
```

Unlike many of the examples in #334, however, this one appears to have inconsistent behaviour for the (?1) equivalent:

  re> /(a(?1)z||(?1)++)$/
data> abcd
 0: 
 1: 
data> abcd\=no_jit
Failed: error -52: nested recursion at the same subject position

Perl agrees with the interpreter that this is "infinite recursion":

$ perl -Mre -e 'print "match\n" if shift =~ /(a(?1)z||(?1)++)$/' abc
Infinite recursion in regex at -e line 1.

I can make an exception for this error code in the fuzzer's check for equivalence, but it would make us unable to detect future inconsistent recursion issues.

zherczeg commented 11 months ago

diff --git a/src/pcre2_match.c b/src/pcre2_match.c
index b2e1f23..c551a92 100644
--- a/src/pcre2_match.c
+++ b/src/pcre2_match.c
@@ -5441,8 +5441,8 @@ fprintf(stderr, "++ %2ld op=%3d %s\n", Fecode - mb->start_code, *Fecode,
         P = (heapframe *)((char *)N - frame_size);
         if (N->group_frame_type == (GF_RECURSE | number))
           {
-          if (Feptr == P->eptr && mb->last_used_ptr == P->recurse_last_used)
-            return PCRE2_ERROR_RECURSELOOP;
+//          if (Feptr == P->eptr && mb->last_used_ptr == P->recurse_last_used)
+//            return PCRE2_ERROR_RECURSELOOP;
           break;
           }
         offset = P->last_group_offset;

With that change, the interpreter returns the same result as JIT. Internally, the fuzzer could rerun such patterns with those two lines disabled; if the results then agree, the mismatch comes only from the interpreter's extra safety check.
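A minimal sketch of how that rerun decision might look on the fuzzer side. The names `cmp_verdict`, `first_pass`, `second_pass` and `SKETCH_ERROR_RECURSELOOP` are hypothetical and do not come from the PCRE2 fuzzer; the sketch also compares only return codes, whereas a real equivalence check would presumably compare the captured offsets as well.

```c
/* The thread reports this code as "error -52"; pcre2.h names it
   PCRE2_ERROR_RECURSELOOP. It is redefined locally (hypothetical name)
   so that the sketch compiles without the PCRE2 headers. */
#define SKETCH_ERROR_RECURSELOOP (-52)

typedef enum {
  CMP_EQUAL,     /* results agree, nothing to report */
  CMP_MISMATCH,  /* genuine JIT/interpreter inconsistency */
  CMP_RERUN      /* rerun the interpreter with the check disabled */
} cmp_verdict;

/* First comparison, made with the recursion-loop check still active. */
static cmp_verdict first_pass(int jit_rc, int interp_rc)
{
  if (jit_rc == interp_rc) return CMP_EQUAL;
  /* Only the interpreter's extra safety check earns a second run. */
  if (interp_rc == SKETCH_ERROR_RECURSELOOP) return CMP_RERUN;
  return CMP_MISMATCH;
}

/* Second comparison, made after rerunning the interpreter with the
   recursion-loop check disabled. */
static cmp_verdict second_pass(int jit_rc, int rerun_rc)
{
  return (jit_rc == rerun_rc) ? CMP_EQUAL : CMP_MISMATCH;
}
```

With this split, a `-52` from the interpreter is never silently accepted: it only triggers a second run, and any remaining difference is still reported as a mismatch.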

zherczeg commented 11 months ago

For a genuinely infinite recursion (e.g. /(?0)b/) you get PCRE2_ERROR_MATCHLIMIT / PCRE2_ERROR_JIT_STACKLIMIT when those two lines are disabled (although the interpreter is a bit slow to get there).

zherczeg commented 11 months ago

A suggestion: instead of adding an option, you could use an environment variable (see getenv) to enable / disable those two lines in the fuzzer. This way you only need to apply a minor patch to the codebase, and in these cases you can rerun the pattern with that environment variable set.
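One way the environment-variable gate could be sketched, under stated assumptions: the variable name `PCRE2_FUZZ_NO_RECURSELOOP_CHECK` and the helper `recurseloop_check_enabled` are made up for illustration and are not defined anywhere in PCRE2.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical variable name; nothing in PCRE2 defines it. */
#define SKETCH_ENV_NAME "PCRE2_FUZZ_NO_RECURSELOOP_CHECK"

/* Returns true when the recursion-loop check should run. The check stays
   enabled unless the variable is set to a non-empty value other than "0". */
static bool recurseloop_check_enabled(void)
{
  const char *v = getenv(SKETCH_ENV_NAME);
  if (v == NULL || v[0] == '\0') return true;  /* unset or empty */
  return v[0] == '0' && v[1] == '\0';          /* "0" keeps the check on */
}

/* In a fuzzer-only patch, the condition from the diff could then become,
   for example:

     if (Feptr == P->eptr && mb->last_used_ptr == P->recurse_last_used &&
         recurseloop_check_enabled())
       return PCRE2_ERROR_RECURSELOOP;

   The cheap pointer comparisons come first so that getenv() is consulted
   only when the check would otherwise fire. */
```

The value is read on every call rather than cached, so the fuzzer can change the variable and rerun the pattern within the same process.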