Open renatahodovan opened 5 months ago
The patch fixes a 4 years old issue reported by ClusterFuzz. This issue is possibly responsible (among others) for the poor fuzzing results.
Minimal qjs repro of the infinite loop that tries to match the empty string non-greedily:
var r = new RegExp("()*?a");
r.exec(",");
Execute: ./qjs test.js
I think it is better to fix the non greedy matching first. This patch only hides the real problem.
The commit 36911f0d
should fix the issue. Do you have other cases where the backtracking limitation workaround is necessary ?
@bellard Thanks for the fix, it really fixed the recursion in the previous test case! However, I have another repro:
var r = new RegExp("(.+)+a");
r.exec("VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV");
This example is not about infinite recursion. It is just demonstrate valid regexp behavior for given pattern, for which regexp execution time growing very fast with growing length of input string
You are right (I think), but I believe we also agree that such inputs are either useless or malicious. Therefore, if we limit the number of recursive calls they trigger, we won't be restricting the actual capabilities of the tool, but we can prevent malicious loops. (I killed the above code after 6 minutes, and who knows how long it would have run. And the input isn't even that long...)
Oh, it seems that the comment I replied to has disappeared in the meantime.
I see no problem with your example because it really requires a large number of backtracking (try it in another JS engine). It is the same problem as having a JS function which takes a long time to execute.
However, there is a real problem that should be addressed: the interrupt handler registered with JS_SetInterruptHandler() is not called during regexp execution which prevents the user from easily interrupting their execution.
@bellard Is there any update on the correct method for registering an interrupt handler during regex execution? Alternatively, could the above patch serve as a temporary solution to address the frequent timeout issues during fuzzing, which are causing early exits and poor performance?
@bellard: I think I found a solution: we can create a fuzzer-friendly build, as suggested by the libFuzzer documentation, using the FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION
macro. This way, the backtracking limit will only be applied for fuzzing and will not affect the normal mode operation of the JS engine, while also preventing the fuzz session from continuously timing out.
Gentle ping
@saghul @bellard what do you think of the new fuzzer-specific approach?
The change LGTM (perhaps the name could use some bikesheding) but I have no commit rights here :-)
@saghul Thanks for the LGTM! The question about the name of the environment variable is an "easy" one, since it is a constant set by OSS-Fuzz and it is also the officially recommended name by LLVM for creating fuzzer friendly build from the target. :-)
The regex engine was prone to excessive backtracking, leading to timeouts and infinite loops, particularly with patterns involving nested quantifiers.
This commit introduces a backtracking counter and a limit of 1000 backtracking steps. When this limit is exceeded, the regex engine aborts to prevent excessive backtracking.