optimized syntax '+' cause 'random_recursive_mutation' error

0x7Fancy commented 10 months ago

          going further, I found a way to mitigate;

based on the above issues, we create simpler test cases, test.json:

{
    "<entry>": [["I ", "<stmt1>", "like C++\n"]],
    "<stmt1>": [["<NODE>", "<stmt1>"], []],
    "<NODE>": [["very "]]
}

tanslate to test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | NODE stmt1
     ;
NODE : 'very '
     ;

and input 40960_very.txt:

I very very ...(*40956)... very very like C++

running with antlr4-parse: Screen Shot 2024-01-08 at 17 56 39

from the perspective of antlr4, we can use the + syntax to describe test.g4, and ignore this prefix matching, as follows test.g4:

grammar test;
entry: 'I ' stmt1 'like C++\n' EOF
     ;
stmt1: 
     | (NODE)+
     ;
NODE : 'very '
     ;

running again with antlr4-parse: Screen Shot 2024-01-08 at 17 59 40

so I made a patch to implement the above ideas, please refer to https://github.com/0x7Fancy/Grammar-Mutator/commit/6eae7d14f579b4d3d1196fc1a288c61044a1afe1;

I have only implemented the optimization of head recursion and tail recursion here, which is simple and easy to understand. for intermediate recursion, I think it can be rewritten as head/tail recursion in json

of course, this is just a mitigation measure. When the mutation generates a sufficiently complex syntax tree, it may still cause antlr4 to get stuck in syntax parsing.

Originally posted by @0x7Fancy in https://github.com/AFLplusplus/Grammar-Mutator/issues/17#issuecomment-1880701080

0x7Fancy commented 10 months ago

sorry, the commit https://github.com/AFLplusplus/Grammar-Mutator/commit/6eae7d14f579b4d3d1196fc1a288c61044a1afe1, this is a wrong patch, '+' syntax will cause 'random_recursive_mutation' to be invalid, please roll back to https://github.com/AFLplusplus/Grammar-Mutator/commit/ff4e5a265daf5d88c4a636fb6a2c22b1d733db09;

for antlr4, the '+' syntax uses breadth expansion, but 'random_recursive_mutation' relies on depth expansion (tree structure) operation; in actual operation, 'random_recursive_mutation' always returns that the recursive node cannot be found, so each round random_recursive_mutation will produce the same content as the original data.

in #17 issue, I encountered a problem. An error occurred when minimizing the poc, which caused the problem. finally, I submitted the patch by mistake.

Back to the original question #17, I gave the reason in this reply https://github.com/AFLplusplus/Grammar-Mutator/issues/17#issuecomment-1893484660, there are several places in the grammar that break the rules of LL(1)/LL(*), (non-terminal symbols can be deduced from ε), which makes antlr4 very easy to enter a backtracking loop.

for my original question, my test case is:

grammar test_multi;
entry: stmt1 '\n' EOF
     ;
stmt1: 'I ' stmt2 'like C++'
     ;
stmt2: NODE
     | NODE NODE
     | NODE NODE NODE
     | NODE NODE NODE NODE
     | NODE NODE NODE NODE NODE
     | NODE stmt2
     ;
NODE : 'very '
     ;

similarly, I broke the LL(1)/LL(*) rules (the candidate values of non-terminal symbols (first(α)) intersect), which also caused antlr4 to fall into grammar parsing backtracking.

I fixed the syntax file as follows:

grammar test_multi_patch;
entry: stmt1 '\n' EOF
     ;
stmt1: 'I ' stmt2 'like C++'
     ;
stmt2: 
     | NODE stmt2
     ;
NODE : 'very '
     ;

testing it: Screen Shot 2024-01-18 at 12 38 02

vanhauser-thc commented 10 months ago

so is a fix to the grammar mutator needed?

0x7Fancy commented 10 months ago

nope, the commit https://github.com/AFLplusplus/Grammar-Mutator/commit/ff4e5a265daf5d88c4a636fb6a2c22b1d733db09 is perfect, Grammar-Mutator will works fine.

as users, what we need to do is to write the correct json syntax as correctly as possible (perhaps according to LL(1) rules), checking and testing more, and then fuzzing in the production environment

h1994st commented 10 months ago

nope, the commit ff4e5a2 is perfect, Grammar-Mutator will works fine.

as users, what we need to do is to write the correct json syntax as correctly as possible (perhaps according to LL(1) rules), checking and testing more, and then fuzzing in the production environment

@0x7Fancy do you mean we need to revert you PR that was merged recently?

0x7Fancy commented 10 months ago

yes, revert that PR. (or let me overwrite it later

AFLplusplus / Grammar-Mutator

optimized syntax '+' cause 'random_recursive_mutation' error #42