VBA grammar order of operations

Xenios91 commented 1 year ago

Order of operations does not appear to function correctly.

For example, If i present it with x = 10 - 5 + 1 it will parse addition first, where in this case it should read from left to right.

msagca commented 1 year ago

@Xenios91

In the grammar, addition has higher precedence than subtraction. Operators with the same precedence should be grouped together. And the whole whitespace thing looks weird to me.

valueStmt
: literal
| implicitCallStmt_InStmt
| LPAREN WS? valueStmt (WS? ',' WS? valueStmt)* RPAREN
| NEW WS? valueStmt
| typeOfStmt
| midStmt
| ADDRESSOF WS? valueStmt
| implicitCallStmt_InStmt WS? ASSIGN WS? valueStmt
| valueStmt WS? IS WS? valueStmt
| valueStmt WS? LIKE WS? valueStmt
| valueStmt WS? GEQ WS? valueStmt
| valueStmt WS? LEQ WS? valueStmt
| valueStmt WS? GT WS? valueStmt
| valueStmt WS? LT WS? valueStmt
| valueStmt WS? NEQ WS? valueStmt
| valueStmt WS? EQ WS? valueStmt
| valueStmt WS? POW WS? valueStmt
| MINUS WS? valueStmt
| PLUS WS? valueStmt
| valueStmt WS? DIV WS? valueStmt
| valueStmt WS? MULT WS? valueStmt
| valueStmt WS? MOD WS? valueStmt
| valueStmt WS? PLUS WS? valueStmt
| valueStmt WS? MINUS WS? valueStmt
| valueStmt WS? AMPERSAND WS? valueStmt
| valueStmt WS? IMP WS? valueStmt
| valueStmt WS? EQV WS? valueStmt
| valueStmt WS? XOR WS? valueStmt
| valueStmt WS? OR WS? valueStmt
| valueStmt WS? AND WS? valueStmt
| NOT WS? valueStmt
;

Xenios91 commented 1 year ago

@Xenios91

In the grammar, addition has higher precedence than subtraction. Operators with the same precedence should be grouped together. And the whole whitespace thing looks weird to me.

valueStmt
: literal
| implicitCallStmt_InStmt
| LPAREN WS? valueStmt (WS? ',' WS? valueStmt)* RPAREN
| NEW WS? valueStmt
| typeOfStmt
| midStmt
| ADDRESSOF WS? valueStmt
| implicitCallStmt_InStmt WS? ASSIGN WS? valueStmt
| valueStmt WS? IS WS? valueStmt
| valueStmt WS? LIKE WS? valueStmt
| valueStmt WS? GEQ WS? valueStmt
| valueStmt WS? LEQ WS? valueStmt
| valueStmt WS? GT WS? valueStmt
| valueStmt WS? LT WS? valueStmt
| valueStmt WS? NEQ WS? valueStmt
| valueStmt WS? EQ WS? valueStmt
| valueStmt WS? POW WS? valueStmt
| MINUS WS? valueStmt
| PLUS WS? valueStmt
| valueStmt WS? DIV WS? valueStmt
| valueStmt WS? MULT WS? valueStmt
| valueStmt WS? MOD WS? valueStmt
| valueStmt WS? PLUS WS? valueStmt
| valueStmt WS? MINUS WS? valueStmt
| valueStmt WS? AMPERSAND WS? valueStmt
| valueStmt WS? IMP WS? valueStmt
| valueStmt WS? EQV WS? valueStmt
| valueStmt WS? XOR WS? valueStmt
| valueStmt WS? OR WS? valueStmt
| valueStmt WS? AND WS? valueStmt
| NOT WS? valueStmt
;

Correct, I am new to ANTLR4 grammars so I am looking into how to resolve it, but I noticed this last night when writing a transpiler with the VBA grammar.

msagca commented 1 year ago

@Xenios91

You can group the operators like this:

valueStmt
: valueStmt (DIV | MULT) valueStmt
| valueStmt (PLUS | MINUS) valueStmt
;

And I think you can get rid of every instance of WS?, it's the lexer's job to remove whitespace characters from the input stream. Edit: The author did not put WS tokens on a separate channel, hence we cannot remove them from the parser rules.

WS : ([ \t] | LINE_CONTINUATION)+;

This rule is usually followed by -> channel(HIDDEN) to separate white space character tokens from the others. I'm not familiar with the language so I can't say why it was omitted.

Operator precedence from the spec:	Category	Operators
Primary	All expressions not explicitly listed in this table
Exponentiation	^
Unary negation	-
Multiplicative	*, /
Integer division	\
Modulus	Mod
Additive	+, -
Concatenation	&
Relational	=, <>, <, >, <=, >=, Like, Is
Logical NOT	Not
Logical AND	And
Logical OR	Or
Logical XOR	Xor
Logical EQV	Eqv
Logical IMP	Imp

Xenios91 commented 1 year ago

@Xenios91

You can group the operators like this:
valueStmt
: valueStmt (DIV | MULT) valueStmt
| valueStmt (PLUS | MINUS) valueStmt
;
And I think you can get rid of every instance of WS?, it's the lexer's job to remove whitespace characters from the input stream. Edit: The author did not put WS tokens on a separate channel, hence we cannot remove them from the parser rules.
WS : ([ \t] | LINE_CONTINUATION)+;
This rule is usually followed by -> channel(HIDDEN) to separate white space character tokens from the others. I'm not familiar with the language so I can't say why it was omitted.

Operator precedence from the spec: Category Operators Primary All expressions not explicitly listed in this table Exponentiation ^ Unary negation - Multiplicative *, / Integer division | Modulus Mod Additive +, - Concatenation & Relational =, <>, <, >, <=, >=, Like, Is Logical NOT Not Logical AND And Logical OR Or Logical XOR Xor Logical EQV Eqv Logical IMP Imp

I will make some changes and send them this way when I have time, let me know what you think.

Xenios91 commented 1 year ago

This should hopefully fix it, I left the WS stuff there, I dont want to accidentally break something. But I am using this as a transpiler and so far the stack based machine code its generating looks better now, following order of operations correctly.

https://github.com/antlr/grammars-v4/pull/3336

Beakerboy commented 9 months ago

This is still labeled as open. The related pull request has been merged.

antlr / grammars-v4

VBA grammar order of operations #3334