boolangery / py-lua-parser

A Lua parser and AST builder written in Python.
MIT License
112 stars 36 forks source link

Bitwise precedence is wrong #33

Open fabi321 opened 1 year ago

fabi321 commented 1 year ago

Looking at parser/Lua.g4, it seems that the assumed bitwise priority is above multiplication, while it is in fact below concat reference on operator precedence. This is also the case in a small example. In a lua 5.4 console:

> 1&2+3
1
> (1&2)+3
3

The parsed tree for "1&2+3":

AddOp: {} 3 keys
  left: {} 3 keys
    BAndOp: {} 3 keys
      left: {} 2 keys
        Number: {} 2 keys
          n: 1
      right: {} 2 keys
        Number: {} 2 keys
          n: 2
  right: {} 2 keys
    Number: {} 2 keys
      n: 3
boolangery commented 1 year ago

Another issue for @alkino !?

alkino commented 1 year ago

@boolangery why do you create your own parser (named builder) instead of using the one from ANTLR.

Because, right now even if I modified the Lua.g4, I steal need to rewrite a boring part of builder.py.

boolangery commented 1 year ago

I don't know, It was done like this and I don't want to rewrite all :). I could do the builder.py part

Real-Gecko commented 2 months ago

I think problem is a little bit deeper than bitwise precedence, as parser lacks Brackets statement: Code:

from luaparser import ast

source = """
if (a + 1) / 2 == 3 then
    print('ok')
end
"""

tree = ast.parse(source)
result = ast.to_lua_source(tree)
print(result)

Output:

if a + 1 / 2 == 3 then
    print('ok')
end

As you can see brackets are lost leading to wrong output.

fabi321 commented 2 months ago

@Real-Gecko That is a separate issue. The precedence in an AST is encoded in the way how the statements are nested, so it is possible to reconstruct correct code without bracket information.

However, this issue deals with the fact that the source -> AST mapping is wrong, the AST -> source implementation is independent.

An example from me where I implemented a formatter that produces correct source code just from AST: https://github.com/stormworks-utils/tumfl/blob/main/tumfl/formatter.py#L262

Real-Gecko commented 2 months ago

Cool and what about brackets themselves? I modified code locally so I have a Brackets statement which I can insert into AST and get the proper code (as that's the only thing I need). But I failed to implement it into parser.

fabi321 commented 2 months ago

I don't understand, why you'd need brackets to be encoded in AST. Take your example:

_ = (a + 1) / 2 == 3

Removing the block and assign, this yields the following AST:

REqOp: {} 3 keys
  left: {} 3 keys
    FloatDivOp: {} 3 keys
      left: {} 3 keys
        AddOp: {} 3 keys
          left: {} 2 keys
            Name: {} 2 keys
              id: 'a'
          right: {} 2 keys
            Number: {} 2 keys
              n: 1
      right: {} 2 keys
        Number: {} 2 keys
          n: 2
  right: {} 2 keys
    Number: {} 2 keys
      n: 3

From this AST, you can clearly see, that the FloatDivOp is the outer operation, and that the AddOp is the inner operation. This is clearly defined and requires no Brackets type to encode the precedence. This also means, that the code formatter would need to account for the possibility, that an inner operation might have a lower precedence compared to an outer operation. I can only think of one use case that would require bracket information, and that is pretty printers, as they ideally don't modify the semantics of your code, but only the style.

Real-Gecko commented 2 months ago

Take a look at the example once more, despite AST being proper the output is wrong:

if a + 1 / 2 == 3 then
    print('ok')
end

So maybe you don't need to parse brackets in AST, but you do need them in output source code.

What's initial AST of this result?

a + b / c + d

a. image

b. image c. image d. image

fabi321 commented 2 months ago

I am not denying that the output of ast.to_lua_source is wrong. However, this issue is concerned with the source -> AST conversion being wrong for bitwise operations, so a bug in ast.parser, your issue is concerned with the AST -> source conversion, so a bug in ast.to_lua_source. That's why you should open a new issue for this, as they are not related with each other. As for your second example: the AST looks something like this: (a + (b / c)) + d

Real-Gecko commented 2 months ago

As for your second example: the AST looks something like this: (a + (b / c)) + d

Currently you cannot say for sure.

fabi321 commented 2 months ago

As for your second example: the AST looks something like this: (a + (b / c)) + d

Currently you cannot say for sure.

I can say that for sure, as I've looked at the ast. This was only a simplified representation of the AST, the full one looks like this:

AddOp: {} 3 keys
  left: {} 3 keys
    AddOp: {} 3 keys
      left: {} 2 keys
        Name: {} 2 keys
          id: 'a'
      right: {} 3 keys
        FloatDivOp: {} 3 keys
          left: {} 2 keys
            Name: {} 2 keys
              id: 'b'
          right: {} 2 keys
            Name: {} 2 keys
              id: 'c'
  right: {} 2 keys
    Name: {} 2 keys
      id: 'd'
Real-Gecko commented 2 months ago

I can say that for sure, as I've looked at the ast. This was only a simplified representation of the AST, the full one looks like this:

It's not the input I'm referring to but output. AST->source is a + b / c + d what was the source before source -> AST?

fabi321 commented 2 months ago

It's not the input I'm referring to but output. AST->source is a + b / c + d what was the source before source -> AST?

I don't understand what you want. The chain goes as follows:

_ = a + b / c + d

The source input

AddOp: {} 3 keys
  left: {} 3 keys
    AddOp: {} 3 keys
      left: {} 2 keys
        Name: {} 2 keys
          id: 'a'
      right: {} 3 keys
        FloatDivOp: {} 3 keys
          left: {} 2 keys
            Name: {} 2 keys
              id: 'b'
          right: {} 2 keys
            Name: {} 2 keys
              id: 'c'
  right: {} 2 keys
    Name: {} 2 keys
      id: 'd'

The parsed AST, generated via ast.parse (and printed via ast.to_pretty_str). This result is correct for your use case, only wrong for bitwise (AFAIK). This conversion is topic of this issue.

_ = a + b / c + d

The output, generated via ast.to_lua_source. This is the part you are having troubles with.

There are no other steps to it (that are somewhat visible to the user). I have no clue, what else you want to see.

Real-Gecko commented 2 months ago

Code:

from luaparser import ast

source = """
result = (a + b) / (c + d)
"""

tree = ast.parse(source)
print(ast.to_lua_source(tree))

Output:

result = a + b / c + d

So, by looking at output you still say that initial AST was:

result = (a + (b / c) + d)

?

fabi321 commented 2 months ago

The AST for your example is correct, and looks like the following:

FloatDivOp: {} 3 keys
  left: {} 3 keys
    AddOp: {} 3 keys
      left: {} 2 keys
        Name: {} 2 keys
          id: 'a'
      right: {} 2 keys
        Name: {} 2 keys
          id: 'b'
  right: {} 3 keys
    AddOp: {} 3 keys
      left: {} 2 keys
        Name: {} 2 keys
          id: 'c'
      right: {} 2 keys
        Name: {} 2 keys
          id: 'd'

However, ast.to_lua_source seems to completely ignore precedences. That is the issue you're facing. It's not that any initial AST is wrong, it's that ast.to_lua_source doesn't account for the fact, that there are operations with different precedences, which may need brackets in order to maintain the same meaning as the AST. Again: I'm not denying that ast.to_lua_source is wrong, all I'm saying is that it's unrelated to this particular issue, as it isn't a bug in ast.parse.

Real-Gecko commented 2 months ago

Yeah, that's why I created #57