JetBrains / Grammar-Kit

Grammar files support & parser/PSI generation for IntelliJ IDEA

Unintuitive parser code generation #358

Closed marty-friedman closed 4 months ago

marty-friedman commented 4 months ago

Hi, I'm trying to understand why the parser generated by Grammar-Kit replaces calls to some specific rules with calls to their parent rule's parsing method. Here is the grammar:

{
  ...
  extends(".*_expression")=expression
  extends(".*_type")=type
  rightAssociative("(assignment|ternary)_expression")=true
}

script ::= ((assignment_expression | function_call_expression | super_call_expression | method_call_expression) ';')*

// Expressions
expression ::= assignment_expression | ternary_expression | or_expression | and_expression | binary_or_expression | binary_xor_expression | binary_and_expression | equality_expression
    | comparison_expression | addition_expression | multiplication_expression | unary_group | accessor_group | literal_expression | paren_expression

fake binary_expression ::= expression operator expression {
    methods=[
        left="/expression[0]"
        right="/expression[1]"
    ]
}

assignment_expression ::= expression <<operator ('=' | '+=' | '-=' | '*=' | '/=' | '|=' | '&=')>> expression{ elementType=binary_expression }
ternary_expression ::= expression '?' expression ':' expression {
    methods=[
        condition="/expression[0]"
        trueClause="/expression[1]"
        falseClause="/expression[2]"
    ]
}
or_expression ::= expression <<operator '||'>> expression{ elementType=binary_expression }
and_expression ::= expression <<operator '&&'>> expression{ elementType=binary_expression }
binary_or_expression ::= expression <<operator '|'>> expression{ elementType=binary_expression }
binary_xor_expression ::= expression <<operator '^'>> expression{ elementType=binary_expression }
binary_and_expression ::= expression <<operator '&'>> expression{ elementType=binary_expression }
equality_expression ::= expression <<operator ('==' | '!=')>> expression{ elementType=binary_expression }
comparison_expression ::= expression <<operator ('<' | '<=' | '>' | '>=')>> expression{ elementType=binary_expression }
addition_expression ::= expression <<operator ('+' | '-')>> expression{ elementType=binary_expression }
multiplication_expression ::= expression <<operator ('*' | '/' | '%')>> expression{ elementType=binary_expression }

private unary_group ::= unary_expression | new_expression | type_cast_expression
unary_expression ::= <<operator ('-' | '~' | '!')>> expression
new_expression ::= new identifier in expression
type_cast_expression ::= ('(' type ')' expression)

private accessor_group ::= function_call_expression | super_call_expression | method_call_expression | element_accessor_expression | field_accessor_expression
function_call_expression ::= identifier arguments
super_call_expression ::= super '.' identifier arguments
method_call_expression ::= expression '.' identifier arguments
element_accessor_expression ::= expression index {
    methods=[
        array="/expression[0]"
        index="/expression[1]"
    ]
}
field_accessor_expression ::= expression '.' identifier

literal_expression ::= this | INT_LITERAL | FLOAT_LITERAL | BOOL_LITERAL | STRING_LITERAL | NAME_LITERAL | NULL_LITERAL | identifier {
    methods=[
        this="this"
    ]
}
paren_expression ::= '(' expression ')'

// Types
type ::= primitive_type | array_type | custom_type
primitive_type ::= int | float | bool | string | name
array_type ::= array '<' type '>'
custom_type ::= identifier

// Misc
private arguments ::= '(' ')' | '(' argument (',' argument)* ')'
argument ::= expression?
private index ::= '[' expression ']'
meta operator ::= <<p>>
identifier ::= id

As the grammar shows, a file is expected to contain zero or more assignment, function-call, super-call, or method-call expressions, each followed by a semicolon. However, this is not what Grammar-Kit generates. Instead, the parser code contains:

  // assignment_expression | function_call_expression | super_call_expression | method_call_expression
  private static boolean script_0_0(PsiBuilder builder_, int level_) {
    if (!recursion_guard_(builder_, level_, "script_0_0")) return false;
    boolean result_;
    result_ = expression(builder_, level_ + 1, -1);
    if (!result_) result_ = function_call_expression(builder_, level_ + 1);
    if (!result_) result_ = super_call_expression(builder_, level_ + 1);
    if (!result_) result_ = expression(builder_, level_ + 1, 11);
    return result_;
  }

For some reason, instead of calling separate methods for the assignment and method-call expressions, the generated code calls the generic 'expression' method (yet it does generate dedicated methods for the function-call and super-call expressions). As a result, the parser accepts input that the grammar explicitly excludes: for example, it treats '2;' and '2+2;' as valid statements.

marty-friedman commented 4 months ago

Is this related to some left-recursion magic going on with these expressions, so that I can't reference any particular kind of expression directly and it always gets replaced with the parent 'expression'? If so, is the only way to get the expected behavior to implement some kind of inspection that runs after parsing and explicitly checks the type of each root-level expression?
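
For illustration, here is a minimal sketch of what such a post-parse check might look like. It does not use the IntelliJ PSI API: NodeType, Node and StatementCheck are hypothetical stand-ins for the generated element types and PSI classes. One detail it assumes from the grammar above: assignment_expression is declared with elementType=binary_expression, so the element type alone would not separate an assignment from, say, an addition, and the sketch looks at the operator as well.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the element types the generated parser would produce.
enum NodeType {
    BINARY_EXPRESSION,
    FUNCTION_CALL_EXPRESSION,
    SUPER_CALL_EXPRESSION,
    METHOD_CALL_EXPRESSION,
    LITERAL_EXPRESSION
}

// Hypothetical simplified tree node: an element type plus the operator text.
final class Node {
    final NodeType type;
    final String operator; // null unless type is BINARY_EXPRESSION

    Node(NodeType type, String operator) {
        this.type = type;
        this.operator = operator;
    }
}

final class StatementCheck {
    // Operators listed in the assignment_expression rule of the grammar.
    private static final Set<String> ASSIGNMENT_OPERATORS =
            new HashSet<>(Arrays.asList("=", "+=", "-=", "*=", "/=", "|=", "&="));

    // Returns true only for the expression kinds the script rule is meant to allow.
    static boolean isAllowedStatement(Node root) {
        switch (root.type) {
            case FUNCTION_CALL_EXPRESSION:
            case SUPER_CALL_EXPRESSION:
            case METHOD_CALL_EXPRESSION:
                return true;
            case BINARY_EXPRESSION:
                // Assignments share the binary_expression element type,
                // so the operator decides.
                return ASSIGNMENT_OPERATORS.contains(root.operator);
            default:
                return false;
        }
    }
}
```

In a real plugin a check like this would presumably run over each root-level expression of the file and report an error for the ones it rejects.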

gregsh commented 4 months ago

Is it related to some left recursion magic that is going on with these expressions, so that it won't let me use any particular kind of expression directly

It's related to "Compact expression parsing with priorities", a special parser generator mode optimized for expressions.
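
To make the effect concrete, here is a minimal priority-climbing sketch. It is not Grammar-Kit's generated code: the PrioritySketch class, its token handling and the reduced priority table are assumptions made for illustration, loosely following the order of alternatives in the expression rule above. It shows why an entry point that only takes a minimum priority, like the expression(builder_, level_ + 1, -1) call in the generated script_0_0, accepts any expression rather than only an assignment.

```java
import java.util.Arrays;
import java.util.List;

// A simplified priority-climbing parser over string tokens (illustrative only).
final class PrioritySketch {
    private final List<String> tokens;
    private int pos = 0;

    PrioritySketch(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    // Hypothetical priority table: a lower number binds more loosely.
    private static int priorityOf(String token) {
        switch (token) {
            case "=": return 0;   // assignment
            case "+": return 9;   // addition
            case "*": return 10;  // multiplication
            default:  return -1;  // not a binary operator
        }
    }

    // Parses one operand, then absorbs every operator whose priority is
    // greater than minPriority. Returns the kind of the outermost node,
    // or null if no expression could be parsed.
    String expression(int minPriority) {
        if (pos >= tokens.size()) return null;
        String first = tokens.get(pos);
        if (priorityOf(first) >= 0 || first.equals(";")) return null;
        pos++; // consume the operand
        String kind = "literal";
        while (pos < tokens.size()) {
            String op = tokens.get(pos);
            int priority = priorityOf(op);
            if (priority < 0 || priority <= minPriority) break;
            pos++; // consume the operator
            boolean isAssignment = op.equals("=");
            // Right-associative operators let the right side reuse the same priority.
            String right = expression(isAssignment ? priority - 1 : priority);
            if (right == null) return null;
            kind = isAssignment ? "assignment" : "binary";
        }
        return kind;
    }
}
```

With a minimum priority of -1, the tokens "2" alone already form a successful parse, and so do "2", "+", "2". Nothing in the call restricts the outermost node to an assignment, which matches the '2;' and '2+2;' behavior reported above.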