Open 0x7Fancy opened 10 months ago
sorry, the commit https://github.com/AFLplusplus/Grammar-Mutator/commit/6eae7d14f579b4d3d1196fc1a288c61044a1afe1, this is a wrong patch, '+' syntax will cause 'random_recursive_mutation' to be invalid, please roll back to https://github.com/AFLplusplus/Grammar-Mutator/commit/ff4e5a265daf5d88c4a636fb6a2c22b1d733db09;
for antlr4, the '+' syntax uses breadth expansion, but 'random_recursive_mutation' relies on depth expansion (tree structure) operation; in actual operation, 'random_recursive_mutation' always returns that the recursive node cannot be found, so each round random_recursive_mutation
will produce the same content as the original data.
in #17 issue, I encountered a problem. An error occurred when minimizing the poc, which caused the problem. finally, I submitted the patch by mistake.
Back to the original question #17, I gave the reason in this reply https://github.com/AFLplusplus/Grammar-Mutator/issues/17#issuecomment-1893484660, there are several places in the grammar that break the rules of LL(1)/LL(*), (non-terminal symbols can be deduced from ε), which makes antlr4 very easy to enter a backtracking loop.
for my original question, my test case is:
grammar test_multi;
entry: stmt1 '\n' EOF
;
stmt1: 'I ' stmt2 'like C++'
;
stmt2: NODE
| NODE NODE
| NODE NODE NODE
| NODE NODE NODE NODE
| NODE NODE NODE NODE NODE
| NODE stmt2
;
NODE : 'very '
;
similarly, I broke the LL(1)/LL(*) rules (the candidate values of non-terminal symbols (first(α)) intersect), which also caused antlr4 to fall into grammar parsing backtracking.
I fixed the syntax file as follows:
grammar test_multi_patch;
entry: stmt1 '\n' EOF
;
stmt1: 'I ' stmt2 'like C++'
;
stmt2:
| NODE stmt2
;
NODE : 'very '
;
testing it:
so is a fix to the grammar mutator needed?
nope, the commit https://github.com/AFLplusplus/Grammar-Mutator/commit/ff4e5a265daf5d88c4a636fb6a2c22b1d733db09 is perfect, Grammar-Mutator will works fine.
as users, what we need to do is to write the correct json syntax as correctly as possible (perhaps according to LL(1) rules), checking and testing more, and then fuzzing in the production environment
nope, the commit ff4e5a2 is perfect, Grammar-Mutator will works fine.
as users, what we need to do is to write the correct json syntax as correctly as possible (perhaps according to LL(1) rules), checking and testing more, and then fuzzing in the production environment
@0x7Fancy do you mean we need to revert you PR that was merged recently?
yes, revert that PR. (or let me overwrite it later
based on the above issues, we create simpler test cases,
test.json
:tanslate to
test.g4
:and input
40960_very.txt
:running with
antlr4-parse
:from the perspective of antlr4, we can use the
+
syntax to describetest.g4
, and ignore this prefix matching, as followstest.g4
:running again with
antlr4-parse
:so I made a patch to implement the above ideas, please refer to https://github.com/0x7Fancy/Grammar-Mutator/commit/6eae7d14f579b4d3d1196fc1a288c61044a1afe1;
I have only implemented the optimization of head recursion and tail recursion here, which is simple and easy to understand. for intermediate recursion, I think it can be rewritten as head/tail recursion in json
of course, this is just a mitigation measure. When the mutation generates a sufficiently complex syntax tree, it may still cause antlr4 to get stuck in syntax parsing.
Originally posted by @0x7Fancy in https://github.com/AFLplusplus/Grammar-Mutator/issues/17#issuecomment-1880701080