PCRE2Project / pcre2

PCRE2 development is now based here.
Other
919 stars 191 forks source link

Another recursion inconsistency corner case #367

Closed addisoncrump closed 9 months ago

addisoncrump commented 11 months ago
  re> /a(?0)z||(?0)++/endanchored
data> abcd
 0: 
data> abcd\=no_jit
Failed: error -52: nested recursion at the same subject position
auto-callout annotated behaviour ``` re> /a(?0)z||(?0)++/endanchored,auto_callout data> abcd --->abcd +0 ^ a +1 ^^ (?0) +0 ^^ a +7 ^^ | +5 ^^ z +8 ^^ (?0)++ +0 ^^ a +7 ^^ | +14 ^^ End of pattern +5 ^^ z +7 ^ | +8 ^ (?0)++ +0 ^ a +1 ^^ (?0) +0 ^^ a +7 ^^ | +5 ^^ z +8 ^^ (?0)++ +0 ^^ a +7 ^^ | +14 ^^ End of pattern +5 ^^ z +7 ^ | +14 ^ End of pattern +0 ^ a +7 ^ | +8 ^ (?0)++ +0 ^ a +7 ^ | +14 ^ End of pattern +0 ^ a +7 ^ | +8 ^ (?0)++ +0 ^ a +7 ^ | +14 ^ End of pattern +0 ^ a +7 ^ | +8 ^ (?0)++ +0 ^ a +7 ^ | +14 ^ End of pattern +0 ^ a +7 ^ | 0: data> abcd\=no_jit --->abcd +0 ^ a +1 ^^ (?0) +0 ^^ a +7 ^^ | +5 ^^ z +8 ^^ (?0)++ +0 ^^ a +7 ^^ | +14 ^^ End of pattern +5 ^^ z +7 ^ | +8 ^ (?0)++ +0 ^ a +1 ^^ (?0) +0 ^^ a +7 ^^ | +5 ^^ z +8 ^^ (?0)++ Failed: error -52: nested recursion at the same subject position ```

Unlike many of the examples in #334, however, this one appears to have inconsistent behaviour for the (?1) equivalent:

  re> /(a(?1)z||(?1)++)$/
data> abcd
 0: 
 1: 
data> abcd\=no_jit
Failed: error -52: nested recursion at the same subject position

Perl agrees with interpreter that it is "infinite recursion":

$ perl -Mre -e 'print "match\n" if shift =~ /(a(?1)z||(?1)++)$/' abc
Infinite recursion in regex at -e line 1.

I can make an exception for this error code in the fuzzer's check for equivalence, but it would make us unable to detect future inconsistent recursion issues.

zherczeg commented 11 months ago
diff --git a/src/pcre2_match.c b/src/pcre2_match.c
index b2e1f23..c551a92 100644
--- a/src/pcre2_match.c
+++ b/src/pcre2_match.c
@@ -5441,8 +5441,8 @@ fprintf(stderr, "++ %2ld op=%3d %s\n", Fecode - mb->start_code, *Fecode,
         P = (heapframe *)((char *)N - frame_size);
         if (N->group_frame_type == (GF_RECURSE | number))
           {
-          if (Feptr == P->eptr && mb->last_used_ptr == P->recurse_last_used)
-            return PCRE2_ERROR_RECURSELOOP;
+//          if (Feptr == P->eptr && mb->last_used_ptr == P->recurse_last_used)
+//            return PCRE2_ERROR_RECURSELOOP;
           break;
           }
         offset = P->last_group_offset;

You get the same result. Internally you could try to rerun these patterns with disabling those two lines, and if they are the same, this is just the interpreter extra safety line.

zherczeg commented 11 months ago

For a real infinite pattern (e.g. /(?0)b/) you get PCRE2_ERROR_MATCHLIMIT / PCRE2_ERROR_JIT_STACKLIMIT when those two lines are disabled (although it is a bit slow for the interpreter).

zherczeg commented 11 months ago

A suggestion: instead of adding an option, you could use an environment variable (see getenv) to enable / disable that two lines in the fuzzer. This way you need to apply a minor patch to the codebase, and in these cases, you can rerun the pattern with setting that environment variable.

PhilipHazel commented 10 months ago

There are probably always going to be inconsistencies for cases where resource-limiting checks kick in.

addisoncrump commented 10 months ago

Would it be appropriate to surround this check with a ifdef for fuzzing mode enabled?

addisoncrump commented 10 months ago

I have implemented this here and can revert: https://github.com/PCRE2Project/pcre2/pull/322/commits/7e34f7e40360e6a84c6227787660a14bcd4a5819

PhilipHazel commented 10 months ago

I'm sort of neutral about this. Without this test, as @zherczeg pointed out, some tests in the interpreter can take a long time to fail. Does the fuzzer reduce the default value for the match limit?

addisoncrump commented 10 months ago

It does, yes.

PhilipHazel commented 10 months ago

That's good. However, after a bit more thought, I have another reservation, which may just be over caution. The --enable-fuzz-support config option causes pcre2fuzzcheck to be built. I am not sure that it should also cause changes to the actual library. My normal working build includes fuzz support so that I can test fuzz reports without a rebuild, but I would not want it to have the code alteration. Hmm. A run-time check is definitely a bad idea. Perhaps we also need --disable-interpreter-recurseloop-check? This should probably also apply to pcre2_dfa_match() which has the same logic as pcre2_match().

zherczeg commented 10 months ago

Whatever we do, infinite loop patterns are not real use cases. They have no practical value. We can add various early checks for them (lot of effort, slow things down), but people will always be able to construct a pattern, which runs forever, and not caught by any checks. So in the end match limit is the only reliable way to stop the matching at some point.

PhilipHazel commented 10 months ago

I agree, but I think it is worth including the simple checks that are in the interpreters because they catch the easy cases rather than using a lot of resources before failing.

addisoncrump commented 10 months ago

Okay, so should this be a separate PR to allow the configure to disable the infinite recursion checks?

PhilipHazel commented 10 months ago

I'm just wondering if we could get away with a run-time check instead. The current test can remain - but if some new runtime option is set, do not give the error. This would not impact on performance because in normal use the error would happen. The option could be PCRE2_DISABLE_RECURSELOOP_CHECK and would apply only to the interpreters. I think this is also less work than modifying the autotools and CMake configs, and all the associated documentation. And you never know, somebody might find a use for it for real. I'm happy to implement this if you'd rather I did it.

addisoncrump commented 10 months ago

I don't mind implementing it, I'm just not sure what would be most appropriate. That said, if setting a configure option, shouldn't an ifdef/ifndef be used anyway?

PhilipHazel commented 10 months ago

I'm suggesting a run-time option, NOT a configure option. That avoids messing with config and saves having to re-compile just to test stuff. It would involve edits to pcre2.h.in, pcre2_match(), pcre2_dfa_match(), and pcre2test.c, all of which should be quite small. Then just a short mention in pcre2api.3, and for completeness, listing the new option in doc/pcre2_match.3 and pcre2_dfa_match.3. I don't think we need an ifdef.

PhilipHazel commented 10 months ago

I have just committed the implementation of PCRE2_DISABLE_RECURSELOOP_CHECK as an option for pcre2_match(). The original test above behaves the same as JIT when this option is set. In the end, I did not add it to pcre2_dfa_match() because that function uses recursive function calls to implement regex recursion and turning off the check gives a stack overflow crash.

PhilipHazel commented 9 months ago

As this particular issue can now be avoided by setting PCRE2_DISABLE_RECURSELOOP_CHECK, I'm going to close this issue.