bellard / quickjs

Public repository of the QuickJS Javascript Engine.
https://bellard.org/quickjs
Other
8.51k stars 892 forks source link

Fix excessive backtracking in regex engine by introducing backtrackig limit #311

Open renatahodovan opened 5 months ago

renatahodovan commented 5 months ago

The regex engine was prone to excessive backtracking, leading to timeouts and infinite loops, particularly with patterns involving nested quantifiers.

This commit introduces a backtracking counter and a limit of 1000 backtracking steps. When this limit is exceeded, the regex engine aborts to prevent excessive backtracking.

renatahodovan commented 5 months ago

The patch fixes a 4 years old issue reported by ClusterFuzz. This issue is possibly responsible (among others) for the poor fuzzing results.

renatahodovan commented 5 months ago

Minimal qjs repro of the infinite loop that tries to match the empty string non-greedily:

var r = new RegExp("()*?a");
r.exec(",");

Execute: ./qjs test.js

bellard commented 5 months ago

I think it is better to fix the non greedy matching first. This patch only hides the real problem.

bellard commented 5 months ago

The commit 36911f0d should fix the issue. Do you have other cases where the backtracking limitation workaround is necessary ?

renatahodovan commented 5 months ago

@bellard Thanks for the fix, it really fixed the recursion in the previous test case! However, I have another repro:

var r = new RegExp("(.+)+a");
r.exec("VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV");
renatahodovan commented 5 months ago

This example is not about infinite recursion. It is just demonstrate valid regexp behavior for given pattern, for which regexp execution time growing very fast with growing length of input string

You are right (I think), but I believe we also agree that such inputs are either useless or malicious. Therefore, if we limit the number of recursive calls they trigger, we won't be restricting the actual capabilities of the tool, but we can prevent malicious loops. (I killed the above code after 6 minutes, and who knows how long it would have run. And the input isn't even that long...)

renatahodovan commented 5 months ago

Oh, it seems that the comment I replied to has disappeared in the meantime.

bellard commented 5 months ago

I see no problem with your example because it really requires a large number of backtracking (try it in another JS engine). It is the same problem as having a JS function which takes a long time to execute.

However, there is a real problem that should be addressed: the interrupt handler registered with JS_SetInterruptHandler() is not called during regexp execution which prevents the user from easily interrupting their execution.

renatahodovan commented 5 months ago

@bellard Is there any update on the correct method for registering an interrupt handler during regex execution? Alternatively, could the above patch serve as a temporary solution to address the frequent timeout issues during fuzzing, which are causing early exits and poor performance?

renatahodovan commented 5 months ago

@bellard: I think I found a solution: we can create a fuzzer-friendly build, as suggested by the libFuzzer documentation, using the FUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION macro. This way, the backtracking limit will only be applied for fuzzing and will not affect the normal mode operation of the JS engine, while also preventing the fuzz session from continuously timing out.

renatahodovan commented 4 months ago

Gentle ping

renatahodovan commented 3 months ago

@saghul @bellard what do you think of the new fuzzer-specific approach?

saghul commented 3 months ago

The change LGTM (perhaps the name could use some bikesheding) but I have no commit rights here :-)

renatahodovan commented 3 months ago

@saghul Thanks for the LGTM! The question about the name of the environment variable is an "easy" one, since it is a constant set by OSS-Fuzz and it is also the officially recommended name by LLVM for creating fuzzer friendly build from the target. :-)