dalance / sv-parser

SystemVerilog parser library fully compliant with IEEE 1800-2017
Other
409 stars 53 forks source link

Operation precedence not being followed when creating parse tree #62

Open amitaiirron opened 2 years ago

amitaiirron commented 2 years ago

Maybe I'm wrong to expect this, but shouldn't the parse-tree being generated respect System Verilog operation precedence?

I'm using sc-parser visa py-sv-parser 0.3.0. I'm trying to parse the simple module:

module _non();

assign a = 2 * 8 - 1;

endmodule

(never mind, I think, that a is not defined here, as I'm just trying to get the parse tree for the RHS expression - 2 * 8 - 1).

The parse tree (of the RHS expression only, not the entire module) looks like this:

+-------------------------------------------------+-------------+
| Node                                            | Text        |
+-------------------------------------------------+-------------+
| +- Expression                                   | '2 * 8 - 1' |
|    +- ExpressionBinary                          | '2 * 8 - 1' |
|       +- Expression                             | '2'         |
|       |  +- Primary                             | '2'         |
|       |     +- PrimaryLiteral                   | '2'         |
|       |        +- Number                        | '2'         |
|       |           +- IntegralNumber             | '2'         |
|       |              +- DecimalNumber           | '2'         |
|       |                 +- UnsignedNumber       | '2'         |
|       |                    +- Locate            | '2'         |
|       |                    +- WhiteSpace        | ''          |
|       |                       +- Locate         | ''          |
|       +- BinaryOperator                         | '*'         |
|       |  +- Symbol                              | '*'         |
|       |     +- Locate                           | '*'         |
|       |     +- WhiteSpace                       | ''          |
|       |        +- Locate                        | ''          |
|       +- Expression                             | '8 - 1'     |
|          +- ExpressionBinary                    | '8 - 1'     |
|             +- Expression                       | '8'         |
|             |  +- Primary                       | '8'         |
|             |     +- PrimaryLiteral             | '8'         |
|             |        +- Number                  | '8'         |
|             |           +- IntegralNumber       | '8'         |
|             |              +- DecimalNumber     | '8'         |
|             |                 +- UnsignedNumber | '8'         |
|             |                    +- Locate      | '8'         |
|             |                    +- WhiteSpace  | ''          |
|             |                       +- Locate   | ''          |
|             +- BinaryOperator                   | '-'         |
|             |  +- Symbol                        | '-'         |
|             |     +- Locate                     | '-'         |
|             |     +- WhiteSpace                 | ''          |
|             |        +- Locate                  | ''          |
|             +- Expression                       | '1'         |
|                +- Primary                       | '1'         |
|                   +- PrimaryLiteral             | '1'         |
|                      +- Number                  | '1'         |
|                         +- IntegralNumber       | '1'         |
|                            +- DecimalNumber     | '1'         |
|                               +- UnsignedNumber | '1'         |
|                                  +- Locate      | '1'         |
+-------------------------------------------------+-------------+

You can see that the top binary operator (lowest precedence) is the *, and the - operator appears in a mode internal tree node (i.e., higher precedence). This is contrary to the SystemVerilog standard (IEEE P1800™/D6), which specifies that * should be higher precedence than - (section 11.3.2).

I tested this with several pairs of operators, and it seems that the parser consistently evaluates from left to right, ignoring the precedence. If I use parenthesis, I can enforce precedence on the generated parse tree.

DaveMcEwan commented 2 years ago

Ah unfortunately not. The language is defined in BNF, but operator precedence is not part of this. See the bottom of IEEE1800-2017, page 1172. The relevant part is this: constant_expression ::= ... | constant_expression binary_operator { attribute_instance } constant_expression | ... You can see on page 1175 that binary_operator includes +, -, *, / and the rest.

The effect is that the parser doesn't know or care about precedence, and leaves those semantics to higher level (something consuming the AST).

philipaxer commented 1 year ago

Hi Dave, I agree that BNF definition doesnt give the precence away. But the spec still gives the precedence relations (11.3.2). I would find it logical that a parser obeys this (which is certainly possible see Pratt parsing and friends). It is really a big stretch to reshuffle the AST to fix this later. regards Philip

amitaiirron commented 1 year ago

I ended up fixing this in my evaluator.

In addition to the issue of precedence among binary operators, I later discovered that binary operators do not even have precedence over the trinary operator (":). Effectively, when I write the expression:

assign y = c == 1 ? t : e;

System Verilog semantics dictate that this is to be interpreted as:

assign y = (c == 1) ? t : e;

But the parser, instead, does this:

assign y = c == (1 ? t : e);

I resolved this similarly to how I prioritized binary operators, but that was even less elegant than the original solution.

I now understand why some System Verilog static checkers insist that parentheses are used EVERYWHERE.

DaveMcEwan commented 1 year ago

Hi Dave, I agree that BNF definition doesnt give the precence away. But the spec still gives the precedence relations (11.3.2). I would find it logical that a parser obeys this

I get what you mean, but this is the distinction between syntax and semantics. Currently, sv-parser only implements syntax analysis (no semantic analysis), i.e. a fairly straightforward translation from ABNF to Rust. IMO, it would be great to have functionality to implement the semantics, but that would probably be a different Rust crate. Last year, I supervised an intern project on a PoC implementation of the semantics of module ports svdata, so that could be a starting point...

(which is certainly possible see Pratt parsing and friends).

I hadn't heard of this. Thanks for the reading material :)

philipaxer commented 1 year ago

Thanks for the super quick reply. I dont have much parser/compiler background but I dont think an AST with incorrect precendes is too useful. All crates that i have seen that use sv-parser essentially filter for nodes (NodeEnter/Leave) and then deduct semantics based on this traversal. This will lead you to a false path when it comes to expressions. IMO, there is also no trivial way to fix this. The only way to fix this is to re-tokenize the tree and parse using "correct" rules. The svdata crate doesnt (currently) suffer from the issue since it never looks at expressions.

regards Philip

dalance commented 1 year ago

I agree that parser generally should treat operator precedence. But the missing of operator precedence is intentional decision.

At the early phase of development, I tried to add operator precedence. But SystemVerilog BNF is very complicated ( especially around expression ). So I thought that I can't implement all BNF with correct precedence, and there was two choice. One is all BNF without precedence, another is a parts of BNF with precedence. I chose the former because there was already some incomplete SystemVerilog parser implementations and I thought my work should aim at complete.

Re-ordering syntax tree is surely difficult. But I think adding precedence to sv-parser is more difficult. Of course, if there is easy implementation, I'll accept it.

amitaiirron commented 1 year ago

Another option is to get the BNF in the standard to change such that priorities are reflected. Prioritizing all binary expressions would really add significant complexity, but just prioritizing conditionals under binaries should be a small enough change.

philipaxer commented 1 year ago

As part of my day job, I am part of some standardization bodies. In my opinion "quickly" changing a standard that has been out there for some time is next to impossible. Especially since the spec is unambiguous. The only reason to ammend it would be if there are inconsistencies. Easier to work around it :)

amitaiirron commented 1 year ago

As part of my day job, I am part of some standardization bodies. In my opinion "quickly" changing a standard that has been out there for some time is next to impossible. Especially since the spec is unambiguous. The only reason to ammend it would be if there are inconsistencies. Easier to work around it :)

I certainly respect this point of view. I do not participate in any standardization body (and don't envy those who do, at least not too much). From my perspective as a software developer who has to work with multiple standards, seeing the mountains of code and effort that accumulate when compensating for shortcuts in standards, fixing the standard would be my preferred approach. Of course, I am not claiming to be right, just presenting a point of view.

Anecdotally, I can testify that I have put days into compensating for this particular shortcut in the standard. I maintain a small body of System Verilog code, that is in use by many people. Originally, expressions in the code was written without parentheses where operator precedence resolved priorities, and complexity was not too high. This was working well, until our SV lint tool was upgraded, and started demanding parentheses everywhere, including in expressions like a + b * c. From what I learned in this exchange, I can guess the reason this check: It was probably included was as a precaution against hypothetical tools that may fail to properly re-prioritize operations, and used a parse tree assuming it handled priorities. I have not seen such a tool in our production flows, but the linter insists, so I had to go add parentheses in hundreds of lines of code.