github / codeql

CodeQL: the libraries and queries that power security researchers around the world, as well as code scanning in GitHub Advanced Security
https://codeql.github.com
MIT License

Ruby parser errors on certain lambdas #9313

Open grajagandev opened 2 years ago

grajagandev commented 2 years ago

Description of the issue

I am seeing parser errors on Ruby lambdas. Here is a test case:

$ cat ruby-parser-errors.rb 
def foo
    puts "hi from foo"
end

foo() { boo( & lambda {}) }
foo() { boo( & ->{}) }
$ 
$ ruby -v
ruby 3.0.3p157 (2021-11-24 revision 3fb7d2cadc) [x86_64-linux-gnu]
$
$ ruby ruby-parser-errors.rb
hi from foo
hi from foo
$ 
$ ruby -c ruby-parser-errors.rb
Syntax OK
$ 
$ codeql database create test-db --language=ruby --overwrite
...
[2022-05-24 20:40:17] [build-stdout] ERROR ruby-parser-errors.rb:5: parse error
[2022-05-24 20:40:17] [build-stdout] ERROR ruby-parser-errors.rb:6: parse error: expecting 'identifier'
[2022-05-24 20:40:17] [build-stdout] ERROR ruby-parser-errors.rb:6: missing value for field: binary::left
Finalizing database at .../test-db.
Successfully created database at .../test-db.

Please let me know if you need further information - Thank you

edoardopirovano commented 2 years ago

Greetings, many thanks for reporting this! I've confirmed that the above steps also reproduce the issue for me with our latest development version. I'll pass this along to our Ruby analysis team.

aibaars commented 2 years ago

@grajagandev You're absolutely right that this is an unexpected parse error; thanks for reporting!

I think the problem is not really the lambda though. The same parse error happens with other kinds of expressions. I can make the examples work by removing the space after the &. The following parse fine:

foo() { boo( &lambda {}) }
foo() { boo( &->{}) }

The parser we're using is https://github.com/tree-sitter/tree-sitter-ruby. Looking at its scanner, it is clear why the parser does not work if there is a space after the &.

Removing the scanner's check for a space after the &, however, causes other problems, as that check is intended to disambiguate between "bitwise-and" and "block argument" expressions. For example:

foo & bar      # a bitwise and of `foo` and `bar`
foo &bar       # a call to `foo` with block argument `bar`

Without the check for spaces, both examples would be parsed as calls to foo with block argument bar.
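
To make the ambiguity concrete, here is a minimal sketch of how the Ruby interpreter itself (MRI 3.x, as far as I can tell) treats the different spellings. The names `foo`, `mask`, and `callback` are hypothetical and chosen only for illustration; `foo` is written so that both readings produce a usable value.

```ruby
# Hypothetical helper: returns the block's result when a block is passed,
# and the integer 0b110 (6) otherwise.
def foo
  block_given? ? yield : 0b110
end

mask = 0b011                      # an integer operand for the bitwise case
callback = -> { :from_block }     # a lambda to pass as a block argument

# Spaces on both sides of `&`: binary bitwise-and of foo's result and mask.
bitwise = foo & mask              # 6 & 3 => 2

# Space before but not after `&`: call to foo with callback as its block.
# (Ruby may warn about the ambiguity when run with -w.)
with_block = foo &callback        # => :from_block

# Inside parentheses, `&` at the start of an argument is a block argument
# even when a space follows it. This is the shape from the original report.
spaced_block = foo(& callback)    # => :from_block

puts bitwise.inspect
puts with_block.inspect
puts spaced_block.inspect
```

The last form is the one that trips up the space check: within an argument list there is no left operand for a bitwise-and, so the interpreter accepts `& callback` as a block argument, while a rule based only on the whitespace around `&` cannot tell the two situations apart.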

I created an issue in the upstream repository: https://github.com/tree-sitter/tree-sitter-ruby/issues/218