antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License
10.15k stars 3.7k forks source link

VBA grammar order of operations #3334

Closed Xenios91 closed 9 months ago

Xenios91 commented 1 year ago

Order of operations does not appear to function correctly.

For example, If i present it with x = 10 - 5 + 1 it will parse addition first, where in this case it should read from left to right.

msagca commented 1 year ago

@Xenios91

In the grammar, addition has higher precedence than subtraction. Operators with the same precedence should be grouped together. And the whole whitespace thing looks weird to me.

valueStmt
: literal
| implicitCallStmt_InStmt
| LPAREN WS? valueStmt (WS? ',' WS? valueStmt)* RPAREN
| NEW WS? valueStmt
| typeOfStmt
| midStmt
| ADDRESSOF WS? valueStmt
| implicitCallStmt_InStmt WS? ASSIGN WS? valueStmt
| valueStmt WS? IS WS? valueStmt
| valueStmt WS? LIKE WS? valueStmt
| valueStmt WS? GEQ WS? valueStmt
| valueStmt WS? LEQ WS? valueStmt
| valueStmt WS? GT WS? valueStmt
| valueStmt WS? LT WS? valueStmt
| valueStmt WS? NEQ WS? valueStmt
| valueStmt WS? EQ WS? valueStmt
| valueStmt WS? POW WS? valueStmt
| MINUS WS? valueStmt
| PLUS WS? valueStmt
| valueStmt WS? DIV WS? valueStmt
| valueStmt WS? MULT WS? valueStmt
| valueStmt WS? MOD WS? valueStmt
| valueStmt WS? PLUS WS? valueStmt
| valueStmt WS? MINUS WS? valueStmt
| valueStmt WS? AMPERSAND WS? valueStmt
| valueStmt WS? IMP WS? valueStmt
| valueStmt WS? EQV WS? valueStmt
| valueStmt WS? XOR WS? valueStmt
| valueStmt WS? OR WS? valueStmt
| valueStmt WS? AND WS? valueStmt
| NOT WS? valueStmt
;
Xenios91 commented 1 year ago

@Xenios91

In the grammar, addition has higher precedence than subtraction. Operators with the same precedence should be grouped together. And the whole whitespace thing looks weird to me.

valueStmt
: literal
| implicitCallStmt_InStmt
| LPAREN WS? valueStmt (WS? ',' WS? valueStmt)* RPAREN
| NEW WS? valueStmt
| typeOfStmt
| midStmt
| ADDRESSOF WS? valueStmt
| implicitCallStmt_InStmt WS? ASSIGN WS? valueStmt
| valueStmt WS? IS WS? valueStmt
| valueStmt WS? LIKE WS? valueStmt
| valueStmt WS? GEQ WS? valueStmt
| valueStmt WS? LEQ WS? valueStmt
| valueStmt WS? GT WS? valueStmt
| valueStmt WS? LT WS? valueStmt
| valueStmt WS? NEQ WS? valueStmt
| valueStmt WS? EQ WS? valueStmt
| valueStmt WS? POW WS? valueStmt
| MINUS WS? valueStmt
| PLUS WS? valueStmt
| valueStmt WS? DIV WS? valueStmt
| valueStmt WS? MULT WS? valueStmt
| valueStmt WS? MOD WS? valueStmt
| valueStmt WS? PLUS WS? valueStmt
| valueStmt WS? MINUS WS? valueStmt
| valueStmt WS? AMPERSAND WS? valueStmt
| valueStmt WS? IMP WS? valueStmt
| valueStmt WS? EQV WS? valueStmt
| valueStmt WS? XOR WS? valueStmt
| valueStmt WS? OR WS? valueStmt
| valueStmt WS? AND WS? valueStmt
| NOT WS? valueStmt
;

Correct, I am new to ANTLR4 grammars so I am looking into how to resolve it, but I noticed this last night when writing a transpiler with the VBA grammar.

msagca commented 1 year ago

@Xenios91

You can group the operators like this:

valueStmt
: valueStmt (DIV | MULT) valueStmt
| valueStmt (PLUS | MINUS) valueStmt
;

And I think you can get rid of every instance of WS?, it's the lexer's job to remove whitespace characters from the input stream. Edit: The author did not put WS tokens on a separate channel, hence we cannot remove them from the parser rules.

WS : ([ \t] | LINE_CONTINUATION)+;

This rule is usually followed by -> channel(HIDDEN) to separate white space character tokens from the others. I'm not familiar with the language so I can't say why it was omitted.

Operator precedence from the spec: Category Operators
Primary All expressions not explicitly listed in this table
Exponentiation ^
Unary negation -
Multiplicative *, /
Integer division \
Modulus Mod
Additive +, -
Concatenation &
Relational =, <>, <, >, <=, >=, Like, Is
Logical NOT Not
Logical AND And
Logical OR Or
Logical XOR Xor
Logical EQV Eqv
Logical IMP Imp
Xenios91 commented 1 year ago

@Xenios91

You can group the operators like this:

valueStmt
: valueStmt (DIV | MULT) valueStmt
| valueStmt (PLUS | MINUS) valueStmt
;

And I think you can get rid of every instance of WS?, it's the lexer's job to remove whitespace characters from the input stream. Edit: The author did not put WS tokens on a separate channel, hence we cannot remove them from the parser rules.

WS : ([ \t] | LINE_CONTINUATION)+;

This rule is usually followed by -> channel(HIDDEN) to separate white space character tokens from the others. I'm not familiar with the language so I can't say why it was omitted.

Operator precedence from the spec: Category Operators Primary All expressions not explicitly listed in this table Exponentiation ^ Unary negation - Multiplicative *, / Integer division | Modulus Mod Additive +, - Concatenation & Relational =, <>, <, >, <=, >=, Like, Is Logical NOT Not Logical AND And Logical OR Or Logical XOR Xor Logical EQV Eqv Logical IMP Imp

I will make some changes and send them this way when I have time, let me know what you think.

Xenios91 commented 1 year ago

This should hopefully fix it, I left the WS stuff there, I dont want to accidentally break something. But I am using this as a transpiler and so far the stack based machine code its generating looks better now, following order of operations correctly.

https://github.com/antlr/grammars-v4/pull/3336

Beakerboy commented 9 months ago

This is still labeled as open. The related pull request has been merged.