Open nickion opened 1 year ago
The `postfixExpression` rule comes almost verbatim from the C Language Specification. I think the Spec committee favors the EBNF `argumentExpressionList?` over allowing `argumentExpressionList` to derive the empty string in order to avoid issues like Kleene operators on the empty string and recursion. I don't have an issue with refactoring rules to derive empty, but it is another step to record when I automate the process of scraping this grammar when a new version of the spec comes out. The bigger issue is that the expression rules are not in optimized Antlr syntax. The grammar is slow because of the chained-rule implementation for operator precedence.
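To make the last point concrete, here is a minimal sketch contrasting the two styles. It is a toy grammar, not an excerpt from the C grammar: the grammar name and the rules `chainedAdditive`, `chainedMultiplicative`, `expr` and `atom` are hypothetical names chosen for illustration. The first pair of rules uses one rule per precedence level, as spec-derived grammars tend to do; `expr` uses ANTLR4's direct left recursion, where precedence follows the order of the alternatives.

```antlr
// Hypothetical toy grammar for illustration; not taken from the C grammar.
grammar PrecedenceSketch;

// Chained style: one rule per precedence level.
chainedAdditive
    : chainedMultiplicative (('+' | '-') chainedMultiplicative)*
    ;

chainedMultiplicative
    : atom (('*' | '/' | '%') atom)*
    ;

// ANTLR4 left-recursive style: a single rule, with precedence given by
// alternative order (earlier alternatives bind more tightly).
expr
    : expr ('*' | '/' | '%') expr
    | expr ('+' | '-') expr
    | atom
    ;

atom
    : Identifier
    | Constant
    | '(' expr ')'
    ;

Identifier : [a-zA-Z_] [a-zA-Z_0-9]* ;
Constant   : [0-9]+ ;
Whitespace : [ \t\r\n]+ -> skip ;
```

With only two levels the difference is small. In a full C expression grammar every primary expression is reached through a long chain of single-child rule invocations, which is presumably the cost the comment above refers to.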
I've come across a design issue with the C grammar in rules such as:
| '(' argumentExpressionList? ')'
This is within `postfixExpression`. A C function call will always have a number of arguments, with that number possibly being zero, whereas the grammar describes that there may be a set of one or more arguments or no set at all. It's a subtle but crucial distinction, and a consequence is that in a scenario such as:
foo()("bar")
within a visitor for postfix expressions there will be only one `argumentExpressionList` available, which would be assumed to be applied to the primary expression Identifier `foo`, rather than to the result of calling `foo` with no arguments.
This can be resolved by changing the grammar to:
| '(' argumentExpressionList ')'
which expresses that there is always an argument list, and to change `argumentExpressionList` to a rule which describes that an argument expression list may be nothing or one or more expressions. With this revision a postfix expression visitor can correctly determine the number of function calls in a chain from the length of an `argumentExpressionList()` result, and the number of arguments in each function call is the length of the result of calling `assignmentExpressionList()`.
I did try labelling and parenthesising the arglist, e.g. `'(' arglist+=(argumentExpressionList?) ')'`, thinking that Antlr4 might then always generate a value even though it may be empty, but this did not work. I've only been using Antlr for a couple of days, so there may be an approach that I've not discovered yet and that doesn't require a grammar revision. However, the technique described above is one I've always used when designing languages and writing Yacc/Bison parsers for them, and so far it appears to work fine for Antlr too.
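For reference, the revision described above could look roughly like the following cut-down grammar. This is a sketch under assumptions rather than the actual rules from the C grammar: the grammar name is made up, `assignmentExpression` and `primaryExpression` are reduced to stand-ins, and only the function-call suffix of `postfixExpression` is modelled.

```antlr
// Hypothetical cut-down grammar; only the call-related parts are modelled.
grammar CallChainSketch;

// The argument list is no longer optional here: every '(' ... ')' pair
// contains exactly one argumentExpressionList.
postfixExpression
    : primaryExpression ('(' argumentExpressionList ')')*
    ;

// The list itself may match nothing, so a call without arguments still
// yields an (empty) argumentExpressionList node.
argumentExpressionList
    : (assignmentExpression (',' assignmentExpression)*)?
    ;

// Stand-in for the real assignmentExpression rule (assumption).
assignmentExpression
    : postfixExpression
    ;

// Stand-in for the real primaryExpression rule (assumption).
primaryExpression
    : Identifier
    | StringLiteral
    ;

Identifier    : [a-zA-Z_] [a-zA-Z_0-9]* ;
StringLiteral : '"' ~["\r\n]* '"' ;
Whitespace    : [ \t\r\n]+ -> skip ;
```

Under these assumptions, parsing `foo()("bar")` from `postfixExpression` should give a context with two `argumentExpressionList` children, the first with no `assignmentExpression` children and the second with one. The trade-off is that `argumentExpressionList` no longer matches the spec's rule verbatim, which matters if the grammar is regenerated from the spec.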