antlr / grammars-v4

Grammars written for ANTLR v4; expectation that the grammars are free of actions.
MIT License
10.15k stars 3.7k forks source link

Lua parser: incorrect parsing of function calls? #1824

Closed stevenjohnstone closed 1 year ago

stevenjohnstone commented 4 years ago

Parsing the following with the test rig

a = funcA()

gives

(chunk (block (stat (varlist (var a)) = (explist (exp (prefixexp (varOrExp (var funcA)) (nameAndArgs (args ( )))))))) <EOF>)

I'd expect funcA() to be parsed as a function call.

Function calls without return values in a "stat" are parsed correctly e.g.

funcA()

has the following tree:

(chunk (block (stat (functioncall (varOrExp (var funcA)) (nameAndArgs (args ( )))))) <EOF>)

Function calls in a "retstat" are also not parsed as function calls:

return {
    funcA()
}

has the following tree:

(chunk (block (retstat return (explist (exp (tableconstructor { (fieldlist (field (exp (prefixexp (varOrExp (var funcA)) (nameAndArgs (args ( ))))))) }))))) <EOF>)

Adding "functioncall" as one of the alternatives for "exp" seems to fix the problem e.g.

diff --git a/lua/Lua.g4 b/lua/Lua.g4
index 4230fe05..657505a6 100644
--- a/lua/Lua.g4
+++ b/lua/Lua.g4
@@ -112,6 +112,7 @@ exp
     | string
     | '...'
     | functiondef
+    | functioncall
     | prefixexp
     | tableconstructor
     | <assoc=right> exp operatorPower exp
britzl commented 3 years ago

I agree with @stevenjohnstone . I got unexpected results when using this grammar. I didn't get function calls where I expected them to be, making the current grammar pretty hard to work with.

The proposed change makes it easy to parse and identify function calls.

britzl commented 3 years ago

I've come across another problem related to function calls:

local a = foo().bar

The function call foo() is not detected. I'm not comfortable enough with Antlr grammar to solve it myself. @stevenjohnstone perhaps you have any idea?

britzl commented 3 years ago

This one as well. It seems like it is including too much in the scope of the function call:

foo("bar")

"hello"

antlr4_parse_tree

stevenjohnstone commented 3 years ago

This one as well. It seems like it is including too much in the scope of the function call:

foo("bar")

"hello"

antlr4_parse_tree

I think in Lua you can call a function without using parentheses. So the above is in sane notation:

foo("bar")("hello")

This program

local foo = function(arg) print(arg) return print end
foo("bar")
-- meaningless white space
"hello"

has output

bar
hello

which is similar to the example above.

I've come across another problem related to function calls:

local a = foo().bar

The function call foo() is not detected. I'm not comfortable enough with Antlr grammar to solve it myself. @stevenjohnstone perhaps you have any idea?

I suspect that "foo()" will be interpreted as a variable. Looking at the language specification, I'm not quite sure that variables are treated correctly here. I have a grammar I maintain over at https://github.com/stevenjohnstone/lua-grammar. If this is a problem with that version, I'd be happy to take a look at an issue there @britzl.

britzl commented 3 years ago

@stevenjohnstone You are absolutely right about the case of calling a function without parenthesis if it takes a single argument. I did not think of that when I observed the output from my example!

But the other case local a = foo().bar remains a problem. I'll try your grammar and return back here. Thanks Steven!

stevenjohnstone commented 3 years ago

@stevenjohnstone You are absolutely right about the case of calling a function without parenthesis if it takes a single argument. I did not think of that when I observed the output from my example!

I was almost right 🤦 The diagram isn't clear in darkmode but it appears on closer inspection that your example is being parsed as

foo("bar", "hello")

instead of

foo("bar")("hello")

So it looks to me like you've found another bug @britzl

britzl commented 3 years ago

Oh, yes, you're right! So it is a bug then!

And I wasn't able to get your grammar to detect the function call here: local a = foo().bar

kaby76 commented 1 year ago

There are a few problems reported here.

First, we note that input foo "hi" "there" where there are a number of newlines between "hi" and "there" is perfectly acceptable. The parse tree should contain two functioncall because function calls without parentheses is for one argument only, the result of the function call to foo being used in the second function call.

It is important to note that in an "interactive" shell of lua.exe, the interpreter rewrites every line inputted into "return " + line + " ;", so it will not work as "non-interactive" mode. It does the rewrite in lua.c of the Lua source code.

The second problem is that the grammar in the manual (at the end of this webpage) contains indirect left recursion involving var, functioncall, and prefixexp. Unfortunately, the person who refactored the rules did so by hand, and did not do so rigorously. This is why I always advocate scraping and refactoring by machine (using Trash).

That said: the grammar in the manual is not the same as the grammar implemented in the source code of the parser!. I am planning at some future date to redo the EBNF and report to lua.org that the grammar in the doc is not the same as the source.

I rewrote the rules using the following 8-step rewrite. The resulting grammar should work as desired. I have a PR for this.

Original rules from manual

var ::=  Name | prefixexp '[' exp ']' | prefixexp '.' Name 
prefixexp ::= var | functioncall | '(' exp ')'
functioncall ::=  prefixexp args | prefixexp ':' Name args;

=> group within var

var ::= Name | prefixexp ('[' exp ']' | '.' Name)
prefixexp ::= var | functioncall | '(' exp ')'
functioncall ::=  prefixexp args | prefixexp ':' Name args;

=> unfold var within prefixexp

var ::= Name | prefixexp ('[' exp ']' | '.' Name)
prefixexp ::=  (Name | prefixexp ('[' exp ']' | '.' Name)) | functioncall | '(' exp ')'
functioncall ::=  prefixexp args | prefixexp ':' Name args;

=> ungroup within prefixexp

var ::= Name | prefixexp ('[' exp ']' | '.' Name)
prefixexp ::= Name | prefixexp ('[' exp ']' | '.' Name) | functioncall | '(' exp ')'
functioncall ::=  prefixexp args | prefixexp ':' Name args;

=> reorder alts within prefixexp

var ::= Name | prefixexp ('[' exp ']' | '.' Name)
prefixexp ::=  prefixexp ('[' exp ']' | '.' Name) | Name | functioncall | '(' exp ')'
functioncall ::=  prefixexp args | prefixexp ':' Name args;

=> kleene operator rewrite within prefixexp

var ::= Name | prefixexp ('[' exp ']' | '.' Name)
prefixexp ::=  (Name | functioncall | '(' exp ')') ('[' exp ']' | '.' Name)*
functioncall ::=  prefixexp args | prefixexp ':' Name args;

=> ungroup within prefixexp

var ::= Name | prefixexp ('[' exp ']' | '.' Name)
prefixexp ::= 
 Name ('[' exp ']' | '.' Name)*
 | functioncall ('[' exp ']' | '.' Name)*
 | '(' exp ')' ('[' exp ']' | '.' Name)*
functioncall ::=  prefixexp args | prefixexp ':' Name args;

=> unfold prefixexp into functioncall

var ::= Name | prefixexp ('[' exp ']' | '.' Name)
prefixexp ::= 
 Name ('[' exp ']' | '.' Name)*
 | functioncall ('[' exp ']' | '.' Name)*
 | '(' exp ')' ('[' exp ']' | '.' Name)*
functioncall ::=
 (Name ('[' exp ']' | '.' Name)*
 | functioncall ('[' exp ']' | '.' Name)*
 | '(' exp ')' ('[' exp ']' | '.' Name)*) args
 |
 (Name ('[' exp ']' | '.' Name)*
 | functioncall ('[' exp ']' | '.' Name)*
 | '(' exp ')' ('[' exp ']' | '.' Name)*) ':' Name args

=> ungroup with functioncall

var ::= Name | prefixexp ('[' exp ']' | '.' Name)
prefixexp ::= 
 Name ('[' exp ']' | '.' Name)*
 | functioncall ('[' exp ']' | '.' Name)*
 | '(' exp ')' ('[' exp ']' | '.' Name)*
functioncall ::=
 Name ('[' exp ']' | '.' Name)* args
 | functioncall ('[' exp ']' | '.' Name)* args
 | '(' exp ')' ('[' exp ']' | '.' Name)* args
 | Name ('[' exp ']' | '.' Name)* ':' Name args
 | functioncall ('[' exp ']' | '.' Name)* ':' Name args
 | '(' exp ')' ('[' exp ']' | '.' Name)* ':' Name args