antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.

Java20 ambiguities and improvements #4238

Open kaby76 opened 2 months ago

kaby76 commented 2 months ago

This is the first in a series of ambiguities and improvements that I'm finding with the newest version of trparse, using the --ambig option to display ambiguous parses.

Input: in.txt

Ambig trees:

$ trparse in.txt --ambig | trtree -a | grep in.txt.33
CSharp 0 in.txt success 0.8978623
in.txt.33: (start_ (compilationUnit (ordinaryCompilationUnit (topLevelClassOrInterfaceDeclaration (classDeclaration (normalClassDeclaration (CLASS "class") (typeIdentifier (Identifier "S")) (classBody (LBRACE "{") (classBodyDeclaration (classMemberDeclaration (fieldDeclaration (unannType (unannPrimitiveType (numericType (integralType (INT "int"))))) (variableDeclaratorList (variableDeclarator (variableDeclaratorId (Identifier "x")) (ASSIGN "=") (variableInitializer (expression (assignmentExpression (conditionalExpression (conditionalOrExpression (conditionalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (unaryExpression (unaryExpressionNotPlusMinus (postfixExpression (primary (primaryNoNewArray (literal (IntegerLiteral "0"))))))))))))))))))))))) (SEMI ";")))) (RBRACE "}"))))) (topLevelClassOrInterfaceDeclaration (classDeclaration (normalClassDeclaration (CLASS "class") (typeIdentifier (Identifier "Test1")) (classBody (LBRACE "{") (classBodyDeclaration (classMemberDeclaration (methodDeclaration (methodModifier (PUBLIC "public")) (methodModifier (STATIC "static")) (methodHeader (result (VOID "void")) (methodDeclarator (Identifier "main") (LPAREN "(") (formalParameterList (formalParameter (unannType (unannReferenceType (unannArrayType (unannClassOrInterfaceType (typeIdentifier (Identifier "String"))) (dims (LBRACK "[") (RBRACK "]"))))) (variableDeclaratorId (Identifier "args")))) (RPAREN ")"))) (methodBody (block (LBRACE "{") (blockStatements (blockStatement (localVariableDeclarationStatement (localVariableDeclaration (localVariableType (unannType (unannReferenceType (unannClassOrInterfaceType (typeIdentifier (Identifier "S")))))) (variableDeclaratorList (variableDeclarator (variableDeclaratorId (Identifier "s")) (ASSIGN "=") (variableInitializer (expression (assignmentExpression (conditionalExpression (conditionalOrExpression 
(conditionalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (unaryExpression (unaryExpressionNotPlusMinus (postfixExpression (primary (primaryNoNewArray (unqualifiedClassInstanceCreationExpression (NEW "new") (classOrInterfaceTypeToInstantiate (Identifier "S")) (LPAREN "(") (RPAREN ")")))))))))))))))))))))))) (SEMI ";"))) (blockStatement (statement (statementWithoutTrailingSubstatement (expressionStatement (statementExpression (methodInvocation (typeName (packageName (Identifier "System") (DOT ".") (packageName (Identifier "out")))) (DOT ".") (Identifier "println") (LPAREN "(") (argumentList (expression (assignmentExpression (conditionalExpression (conditionalOrExpression (conditionalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (additiveExpression (multiplicativeExpression (unaryExpression (unaryExpressionNotPlusMinus (postfixExpression (primary (primaryNoNewArray (literal (StringLiteral "\"s.x=\""))))))))) (ADD "+") (multiplicativeExpression (unaryExpression (unaryExpressionNotPlusMinus (postfixExpression (expressionName (ambiguousName (Identifier "s")) (DOT ".") (Identifier "x"))))))))))))))))))) (RPAREN ")"))) (SEMI ";")))))) (RBRACE "}")))))) (RBRACE "}"))))))) (EOF ""))
in.txt.33: (start_ (compilationUnit (ordinaryCompilationUnit (topLevelClassOrInterfaceDeclaration (classDeclaration (normalClassDeclaration (CLASS "class") (typeIdentifier (Identifier "S")) (classBody (LBRACE "{") (classBodyDeclaration (classMemberDeclaration (fieldDeclaration (unannType (unannPrimitiveType (numericType (integralType (INT "int"))))) (variableDeclaratorList (variableDeclarator (variableDeclaratorId (Identifier "x")) (ASSIGN "=") (variableInitializer (expression (assignmentExpression (conditionalExpression (conditionalOrExpression (conditionalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (unaryExpression (unaryExpressionNotPlusMinus (postfixExpression (primary (primaryNoNewArray (literal (IntegerLiteral "0"))))))))))))))))))))))) (SEMI ";")))) (RBRACE "}"))))) (topLevelClassOrInterfaceDeclaration (classDeclaration (normalClassDeclaration (CLASS "class") (typeIdentifier (Identifier "Test1")) (classBody (LBRACE "{") (classBodyDeclaration (classMemberDeclaration (methodDeclaration (methodModifier (PUBLIC "public")) (methodModifier (STATIC "static")) (methodHeader (result (VOID "void")) (methodDeclarator (Identifier "main") (LPAREN "(") (formalParameterList (formalParameter (unannType (unannReferenceType (unannArrayType (unannClassOrInterfaceType (typeIdentifier (Identifier "String"))) (dims (LBRACK "[") (RBRACK "]"))))) (variableDeclaratorId (Identifier "args")))) (RPAREN ")"))) (methodBody (block (LBRACE "{") (blockStatements (blockStatement (localVariableDeclarationStatement (localVariableDeclaration (localVariableType (unannType (unannReferenceType (unannClassOrInterfaceType (typeIdentifier (Identifier "S")))))) (variableDeclaratorList (variableDeclarator (variableDeclaratorId (Identifier "s")) (ASSIGN "=") (variableInitializer (expression (assignmentExpression (conditionalExpression (conditionalOrExpression 
(conditionalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (multiplicativeExpression (unaryExpression (unaryExpressionNotPlusMinus (postfixExpression (primary (primaryNoNewArray (unqualifiedClassInstanceCreationExpression (NEW "new") (classOrInterfaceTypeToInstantiate (Identifier "S")) (LPAREN "(") (RPAREN ")")))))))))))))))))))))))) (SEMI ";"))) (blockStatement (statement (statementWithoutTrailingSubstatement (expressionStatement (statementExpression (methodInvocation (typeName (packageName (Identifier "System")) (DOT ".") (typeIdentifier (Identifier "out"))) (DOT ".") (Identifier "println") (LPAREN "(") (argumentList (expression (assignmentExpression (conditionalExpression (conditionalOrExpression (conditionalAndExpression (inclusiveOrExpression (exclusiveOrExpression (andExpression (equalityExpression (relationalExpression (shiftExpression (additiveExpression (additiveExpression (multiplicativeExpression (unaryExpression (unaryExpressionNotPlusMinus (postfixExpression (primary (primaryNoNewArray (literal (StringLiteral "\"s.x=\""))))))))) (ADD "+") (multiplicativeExpression (unaryExpression (unaryExpressionNotPlusMinus (postfixExpression (expressionName (ambiguousName (Identifier "s")) (DOT ".") (Identifier "x"))))))))))))))))))) (RPAREN ")"))) (SEMI ";")))))) (RBRACE "}")))))) (RBRACE "}"))))))) (EOF ""))
09/15-10:47:55 ~/issues/g4-current/java/java20/Generated-CSharp
$

typeName/packageName/packageOrTypeName

packageName
    : Identifier ('.' packageName)?
    // left recursion --> right recursion
    ;

typeName
    : packageName ('.' typeIdentifier)?
    ;

packageOrTypeName
    : identifier ('.' packageOrTypeName)?
    // left recursion --> right recursion
    ;

From the JLS20,

 
PackageName:
    Identifier
    PackageName . Identifier

TypeName:
    TypeIdentifier
    PackageOrTypeName . TypeIdentifier

PackageOrTypeName:
    Identifier
    PackageOrTypeName . Identifier

Notes

1) The rule typeName is incorrect. It should reference packageOrTypeName, not packageName.
2) Although ANTLR does a great job of rewriting left recursion into Kleene operators before running Thompson's construction, it does not rewrite right recursion. Right recursion should be avoided because it is inefficient: it causes a call to a sub-automaton in AdaptivePredict(). E.g., the NFA for packageName is: [figure: Graphviz rendering of the NFA for packageName]
3) For input with a method call System.out.println("s.x=" + s.x);, we have ambiguity on where to include .out. Should it be parsed as a packageName or as a typeIdentifier?
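The ambiguity on .out can be reproduced outside ANTLR with a small exhaustive recognizer. The following is a minimal Python sketch, not the trparse implementation: it hand-codes the two grammar rules quoted above, assumes typeIdentifier is just a single Identifier, and uses a simplified identifier regex. The helper names (tokenize, parse_package_name, parse_type_name, full_parses) are made up for this illustration.

```python
import re

def tokenize(text):
    # Simplified lexer (an assumption): identifiers and dots only.
    return re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*|\.", text)

def parse_package_name(tokens, i):
    """Yield (tree, next_index) for: packageName : Identifier ('.' packageName)? ;"""
    if i < len(tokens) and tokens[i] != ".":
        ident = ("Identifier", tokens[i])
        # Alternative 1: stop after a single Identifier.
        yield ("packageName", ident), i + 1
        # Alternative 2: continue with '.' packageName (right recursion).
        if i + 1 < len(tokens) and tokens[i + 1] == ".":
            for sub, j in parse_package_name(tokens, i + 2):
                yield ("packageName", ident, ".", sub), j

def parse_type_name(tokens, i):
    """Yield (tree, next_index) for: typeName : packageName ('.' typeIdentifier)? ;"""
    for pkg, j in parse_package_name(tokens, i):
        # Alternative 1: the whole chain is a packageName.
        yield ("typeName", pkg), j
        # Alternative 2: the last name is a typeIdentifier.
        if j + 1 < len(tokens) and tokens[j] == "." and tokens[j + 1] != ".":
            type_id = ("typeIdentifier", ("Identifier", tokens[j + 1]))
            yield ("typeName", pkg, ".", type_id), j + 2

def full_parses(text):
    """Return every typeName tree that consumes the whole input."""
    tokens = tokenize(text)
    return [tree for tree, j in parse_type_name(tokens, 0) if j == len(tokens)]

if __name__ == "__main__":
    for tree in full_parses("System.out"):
        print(tree)
```

Running it on System.out yields two complete trees, one with out as a nested packageName and one with out as a typeIdentifier, which mirrors the two trees reported by trparse above.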

We don't have a symbol table for the grammar to distinguish packages from types. The simplest fix is to get rid of packageName and typeName and define a single rule, dotIdChain: identifier ('.' identifier)*;.
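To illustrate why the flat rule removes the ambiguity, here is a minimal Python sketch of a loop-based recognizer for dotIdChain. It assumes a simplified identifier regex, and the function names are hypothetical. The second function shows one possible way to defer the package-versus-type decision to a later pass; the known_packages set stands in for a symbol table that the grammar itself does not have.

```python
import re

TOKEN = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\.")

def parse_dot_id_chain(text):
    """Recognize dotIdChain : identifier ('.' identifier)* ; and return the names.

    The loop has exactly one way to consume the input, so there is one parse.
    """
    tokens = TOKEN.findall(text)
    if "".join(tokens) != text or not tokens or tokens[0] == ".":
        raise ValueError(f"not a dotted identifier chain: {text!r}")
    names = [tokens[0]]
    i = 1
    while i < len(tokens):
        # Each iteration must see '.' followed by an identifier.
        if tokens[i] != "." or i + 1 >= len(tokens) or tokens[i + 1] == ".":
            raise ValueError(f"not a dotted identifier chain: {text!r}")
        names.append(tokens[i + 1])
        i += 2
    return names

def split_package_prefix(names, known_packages):
    """Hypothetical post-parse step: take the longest prefix naming a known package."""
    for k in range(len(names), 0, -1):
        if ".".join(names[:k]) in known_packages:
            return names[:k], names[k:]
    return [], names

if __name__ == "__main__":
    print(parse_dot_id_chain("System.out"))
    print(split_package_prefix(["java", "util", "List"], {"java", "java.util"}))
```

The parser only produces the flat list of names; whether a prefix is a package or a type is decided afterwards, where name-resolution information is available.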