gleam-lang / tree-sitter-gleam

🌳 A tree-sitter grammar for the Gleam programming language
Apache License 2.0
71 stars 13 forks source link

parse expressions outside of functions #20

Closed the-mikedavis closed 2 years ago

the-mikedavis commented 2 years ago

This PR is actually more of a question than a solution 😅

Parsing expressions in the top-level (under source_file, outside of functions where they would normally be parsed) is advantageous when injecting Gleam, as you might do with markdown. This is pertinent to #14 because as far as I know, GitHub uses the tree-sitter parser for markdown's fenced code blocks. For example in #19, I have a code block

```gleam
<<code:int-size(8)-unit(2), reason:utf8>>
```

Which is not a valid gleam program but is a valid gleam "snippet" so to speak.

What would you think about allowing expressions in the top-level like this?

(I didn't really look at those changed import tests yet, looks like some error nodes made their way into the tests and this PR ends up changing the error behavior.)

J3RN commented 2 years ago

Sorry for my delayed response!

Generally speaking, I'd like to follow the example of the first-party tree-sitter grammars, as they're probably the best "documentation" for tree-sitter grammars. As far as those go, Ruby, JavaScript, and Elixir all actually allow top-level expressions. I think I'll have to look at Rust and/or Go to see if either A) top level expressions are allowed in the language (I don't think so?) and B) if the tree-sitter grammar allows top-level expressions. This is on my to-do list :sweat_smile:

looks like some error nodes made their way into the tests and this PR ends up changing the error behavior.

At some point I thought it was a good idea to test the error cases to ensure that the errors made sense or were helpful. Unfortunately tree-sitter gives us very little control over this, and it's probably a bad idea after all. All this to say, that test (or tests) should be fine to remove :+1:

the-mikedavis commented 2 years ago

I did some looking around and it looks like

For rust it looks like tree-sitter-rust is permissive about top-level expressions where the rust language is not. E.g.

let x = 2 + 3;
fn main() {
    println!("Hello, world!");
}
gives an error ``` error: expected item, found keyword `let` --> src/main.rs:1:1 | 1 | let x = 2 + 3; | ^^^ expected item error: could not compile `foo` due to previous error ```
but is parsed successfully by tree-sitter-rust... ``` (source_file [0, 0] - [4, 0] (let_declaration [0, 0] - [0, 14] pattern: (identifier [0, 4] - [0, 5]) value: (binary_expression [0, 8] - [0, 13] left: (integer_literal [0, 8] - [0, 9]) right: (integer_literal [0, 12] - [0, 13]))) (function_item [1, 0] - [3, 1] name: (identifier [1, 3] - [1, 7]) parameters: (parameters [1, 7] - [1, 9]) body: (block [1, 10] - [3, 1] (expression_statement [2, 4] - [2, 30] (macro_invocation [2, 4] - [2, 29] macro: (identifier [2, 4] - [2, 11]) (token_tree [2, 12] - [2, 29] (string_literal [2, 13] - [2, 28]))))))) ```

I don't have a compiler tool-chain setup for go though so I'm not sure about that one 😅

the-mikedavis commented 2 years ago

I wish there were a way to test that bad syntax errors with more stability. I know tree-sitter-nix recently removed test cases like that because of the churn when updating tree-sitter-cli versions. It's nice not only to show that bad syntax does make an ERROR node but also it can help test how the parser behaves with an incomplete document (iirc that's what tree-sitter-nix was using it for)