Open 0x7Fancy opened 10 months ago
If you have time to work on this, that would be great! Feel free to submit a PR and add your test cases.
I can take a look at this later when I am available. Probably not in the next few weeks :(
Okay, I'll try my best to provide a PR.
I tried to solve the problem, but I found that Grammar-Mutator relies on the AST to work. If I add lexical rules, they are not transparent to Grammar-Mutator.
In the above example, we focus on the input data `10(12)`, parsed with `Grammar.g4` and `Grammar_patch.g4` respectively. We can see that only `Grammar.g4`, which contains parser rules only, yields a complete AST structure (in line with Grammar-Mutator's expectations).
I think this is a trade-off in the Grammar-Mutator design. It loses part of the mutation data, but the program design is more concise, clear and direct.
So, for now, Grammar-Mutator is fine as it is.
In my experimental environment, I found that the JSON-to-g4 conversion emits only parser rules, which causes some syntax errors. These parsing errors may lead to the loss of a large amount of mutated data.
I made a minimal case, `lex.json`:

Grammar-Mutator builds it with `make`, and the generated `Grammar.g4` is:

We prepared the input data `seed1` / `seed2`, and used `antlr4-parse` to test them:

Why is `10(10)` parsed incorrectly? Because ANTLR4 works in two stages: lexer and parser. During the lexer stage, `node_NUMBER:10` is recognized as a token, and in the parser stage the result is `node_NUMBER (node_NUMBER)`, so an error occurs.

In an ANTLR4 grammar, lexer rules begin with an uppercase letter and parser rules begin with a lowercase letter, so we should state the lexical rules explicitly for ANTLR4. The patched `Grammar_patch.g4`:

Testing again:
Maybe we can improve the JSON-to-g4 generation code so that it distinguishes between lexer and parser rules?
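To make the suggestion concrete, here is a minimal sketch of one way such a split could work. It is not Grammar-Mutator's actual generator. It assumes the JSON grammar maps `<name>` nonterminals to lists of alternatives, and it uses a simple heuristic: a rule whose alternatives contain only literal strings becomes a lexer rule (uppercase name), and everything else stays a parser rule. The helper names, the uppercase naming scheme and the `SAMPLE` grammar are all hypothetical; only the `node_` prefix mirrors the `node_NUMBER` name seen above.

```python
def is_nonterminal(symbol: str) -> bool:
    """Assumed convention: nonterminals are written as <name> in the JSON."""
    return len(symbol) > 2 and symbol.startswith("<") and symbol.endswith(">")


def split_rules(grammar: dict) -> tuple[list[str], list[str]]:
    """Heuristic split: rules built only from literals become lexer rules."""
    lexer, parser = [], []
    for name, alternatives in grammar.items():
        only_literals = all(
            alt and all(not is_nonterminal(sym) for sym in alt)
            for alt in alternatives
        )
        (lexer if only_literals else parser).append(name)
    return lexer, parser


def g4_name(name: str, is_lexer: bool) -> str:
    """ANTLR4 lexer rules start uppercase, parser rules start lowercase."""
    bare = name.strip("<>")
    return bare.upper() if is_lexer else "node_" + bare


def render_g4(grammar: dict, grammar_name: str = "Grammar") -> str:
    """Emit a combined grammar with explicit lexer rules (sketch only)."""
    lexer_rules = set(split_rules(grammar)[0])

    def ref(symbol: str) -> str:
        if is_nonterminal(symbol):
            return g4_name(symbol, symbol in lexer_rules)
        # Quote literals, escaping backslashes and single quotes.
        escaped = symbol.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"

    lines = [f"grammar {grammar_name};"]
    for name, alternatives in grammar.items():
        body = " | ".join(" ".join(ref(s) for s in alt) for alt in alternatives)
        lines.append(f"{g4_name(name, name in lexer_rules)} : {body} ;")
    return "\n".join(lines)


# Hypothetical grammar for illustration; this is NOT the lex.json from the issue.
SAMPLE = {
    "<start>": [["<call>"]],
    "<call>": [["<NUMBER>", "(", "<NUMBER>", ")"]],
    "<NUMBER>": [["10"], ["12"]],
}

if __name__ == "__main__":
    print(render_g4(SAMPLE))
```

For the sample grammar, `<NUMBER>` is emitted as the lexer rule `NUMBER`, while `<start>` and `<call>` remain parser rules. As noted in the comments above, Grammar-Mutator builds its mutations on the parser AST, so collapsing rules into tokens this way trades mutation granularity for correct parsing; the heuristic would need review before being used in the real generator.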