Closed trust1995 closed 3 years ago
Hi @trust1995
Thanks for reporting the issue.
The problem comes from your grammar, which uses left recursion. It seems ANTLR4 cannot assign the alternative number correctly for left-recursive rules: it gives the recursive expansions of A an invalid alternative number, 0, which results in rule_id = MAXUINT in antlr4_shim. I don't currently know why this happens. You can file an issue with the ANTLR4 community or check its source code :)
If you convert the grammar to right recursion, antlr4_shim works well:
entry: A
A: B | B A
B: "MYTOKEN"
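To illustrate why the right-recursive form is unambiguous about alternatives, here is a minimal hand-written sketch. It is not ANTLR4-generated code and not part of antlr4_shim; the function name parse_A and the zero-based alternative numbering are assumptions made for illustration only. It recognizes the grammar above and records which alternative each A node takes, outermost first.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical hand-written recognizer for the right-recursive grammar
//   A: B | B A
//   B: "MYTOKEN"
// It appends the zero-based alternative chosen at each A node (outermost
// first) to `alts`. It returns false if the tokens do not match; in that
// case `alts` may hold entries for the A nodes visited before the failure.
bool parse_A(const std::vector<std::string> &tokens, std::size_t pos,
             std::vector<unsigned int> &alts) {
    // Both alternatives start with B, so the current token must be MYTOKEN.
    if (pos >= tokens.size() || tokens[pos] != "MYTOKEN") {
        return false;
    }
    if (pos + 1 == tokens.size()) {
        alts.push_back(0);  // alternative 0: A -> B (last token)
        return true;
    }
    alts.push_back(1);      // alternative 1: A -> B A (more tokens follow)
    return parse_A(tokens, pos + 1, alts);
}
```

For the input "MYTOKEN MYTOKEN MYTOKEN" this records the alternatives 1, 1, 0, so every A node has a well-defined alternative, which is what the shim needs in order to build a valid rule_id.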
Thanks a lot for the reply, things like that really should be written somewhere :)
@trust1995
Yeah, I am updating the README in the grammar directory to remind future users. Thanks!
In the input-parsing shim, nodes are created using:
node = node_create_with_rule_id(non_terminal_node->getRuleIndex(), non_terminal_node->getAltNumber() - 1);
However, in my tests, getAltNumber() on the antlr4::ParserRuleContext node returns 0 for the OUTER nodes of a recursive rule. Therefore, all nodes up to the inner one will have an invalid rule_id. For example, for this G4 grammar:
A: B | A B
B: "MYTOKEN"
entry: A
The input "MYTOKEN MYTOKEN MYTOKEN" will be parsed as:
entry -> A -> A -> A -> B
++++++|+++| -> B
++++++| -> B
The last (innermost) A will have rule_id = 0, while the previous ones have rule_id = MAXUINT. In this specific case that happens not to break the fuzzer's behavior, but it is a major issue when a grammar has several recursive expansions.
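The MAXUINT value can be reproduced in isolation. The sketch below is not the shim's actual code: the helper names alt_index_unchecked and alt_index_checked are hypothetical, and it assumes the alternative number is an unsigned value (std::size_t here) that ends up stored in an unsigned int. It shows how subtracting 1 from an alternative number of 0 wraps around, and one possible way to guard against it.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical stand-in for the "getAltNumber() - 1" computation.
// alt_number mimics the value returned by getAltNumber(); 0 is treated
// here as "no alternative was set" (an assumption for this sketch).
unsigned int alt_index_unchecked(std::size_t alt_number) {
    // For alt_number == 0 the unsigned subtraction wraps around, and the
    // conversion to unsigned int yields the largest unsigned int value.
    return static_cast<unsigned int>(alt_number - 1);
}

// Guarded variant: refuses to produce an index when no alternative was set.
// Returns true and writes the zero-based index to `out` on success;
// returns false and leaves `out` untouched otherwise.
bool alt_index_checked(std::size_t alt_number, unsigned int &out) {
    if (alt_number == 0) {
        return false;
    }
    out = static_cast<unsigned int>(alt_number - 1);
    return true;
}
```

With this sketch, alt_index_unchecked(0) equals std::numeric_limits<unsigned int>::max(), which matches the invalid rule_id seen on the outer A nodes, while alt_index_checked(0, out) reports the problem instead of silently producing a bogus index.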