dotnet / fsharp

The F# compiler, F# core library, F# language service, and F# tooling integration for Visual Studio
https://dotnet.microsoft.com/languages/fsharp
MIT License

Generate treesitter grammar #14527

Open vzarytovskii opened 1 year ago

vzarytovskii commented 1 year ago

Is your feature request related to a problem? Please describe.

More and more tools and editors rely on tree-sitter for navigation, parsing, and semantic highlighting (e.g. in-browser VS Code, nvim, GitHub), so we should provide a tree-sitter (TS) grammar for F#.

Describe the solution you'd like

The TS grammar should, if possible, be generated from our fsl/fsy files and hosted in the repo.

Links

Treesitter docs: https://tree-sitter.github.io/tree-sitter/

Existing grammars, incl. some whitespace-sensitive ones:

OCaml: https://github.com/tree-sitter/tree-sitter-ocaml
Python: https://github.com/tree-sitter/tree-sitter-python
Yaml: https://github.com/ikatyang/tree-sitter-yaml
Haskell: https://github.com/tree-sitter/tree-sitter-haskell

ShalokShalom commented 1 year ago

Helix also uses it, exclusively.

auduchinok commented 1 year ago

@vzarytovskii Do you have any thoughts about how that would work with the lex filter? Perhaps, we could look at the Python implementation for the inspiration, as it also has whitespace-sensitive syntax.

vzarytovskii commented 1 year ago

@vzarytovskii Do you have any thoughts about how that would work with the lex filter? Perhaps, we could look at the Python implementation for the inspiration, as it also has whitespace-sensitive syntax.

Yeah, no specific ideas just yet; we should probably figure it out when we start working on it.

Eliemer commented 1 year ago

How can someone help to get this started? Interested in contributing

adelarsq commented 1 year ago

@Eliemer These documents have some context on how to proceed: https://tree-sitter.github.io/tree-sitter/creating-parsers

NatElkins commented 1 year ago

https://github.com/baronfel/tree-sitter-fsharp

vzarytovskii commented 1 year ago

https://github.com/baronfel/tree-sitter-fsharp

I am aware of this grammar, but if you look at the README, you'll see that it does not cover all language features or the whitespace-sensitive aspects.

Generating it from the fslexyacc files and the lexfilter (if possible, of course) has the benefit of keeping it up to date whenever we add new language features.

ShalokShalom commented 1 year ago

In my endeavour to find an ANTLR grammar for F#, I discovered a few things that might be interesting. First, there are a gazillion similar formats, obviously. 😊

So I dug deep into this ecosystem, and there are all sorts of compilers going in every direction; some are better maintained than others.

As an example, I discovered an EBNF <--> Treesitter compiler.

And there is a similar project that only goes from Treesitter to EBNF, and it already provides a generated EBNF file for OCaml:

https://github.com/mingodad/plgh/blob/main/tree-sitter-ocaml.ebnf

What seems obvious to me is that EBNF is a considerably easier format.

So, at this point it seems that editing the existing EBNF of OCaml and then translating it to Treesitter might be an option. 🤷🏻‍♂️

I don't know how it compares to generating from Yacc and Lex 🙈

I also found a couple of other very interesting projects, and they would help to generate an ANTLR file, which I am striving to create for OneDev.

So if going the route from EBNF to Treesitter sounds acceptable, this would provide a path for both ANTLR and Treesitter.
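To make the EBNF to Treesitter idea a bit more concrete, here is a minimal sketch of what such a translation step could look like. It is not the converter mentioned above: the tuple-based rule representation and the `to_tree_sitter` helper are made up for illustration, and only the `seq`, `choice`, `repeat`, `optional` and `$.name` forms of the tree-sitter grammar DSL are covered.

```python
import json

def to_tree_sitter(node) -> str:
    """Render a tiny, hypothetical EBNF tree as tree-sitter grammar DSL text."""
    kind = node[0]
    if kind == "ref":                      # reference to another rule
        return f"$.{node[1]}"
    if kind == "lit":                      # literal token, emitted as a JS string
        return json.dumps(node[1])
    if kind in ("seq", "choice"):          # EBNF "a , b" and "a | b"
        parts = ", ".join(to_tree_sitter(child) for child in node[1:])
        return f"{kind}({parts})"
    if kind in ("repeat", "optional"):     # EBNF "{ a }" and "[ a ]"
        return f"{kind}({to_tree_sitter(node[1])})"
    raise ValueError(f"unknown node kind: {kind}")

# EBNF (illustrative): list = "[" , [ expr , { ";" , expr } ] , "]" ;
list_rule = (
    "seq",
    ("lit", "["),
    ("optional", ("seq", ("ref", "expr"),
                  ("repeat", ("seq", ("lit", ";"), ("ref", "expr"))))),
    ("lit", "]"),
)

print(to_tree_sitter(list_rule))
# seq("[", optional(seq($.expr, repeat(seq(";", $.expr)))), "]")
```

A translation like this only carries over the context-free part of a grammar; anything handled outside the grammar file would have to be written separately.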

P.S:

And if all that doesn't help, I also stumbled across a couple of articles that might help to implement a Treesitter grammar directly and to understand its format.

https://derek.stride.host/posts/comprehensive-introduction-to-tree-sitter

https://gist.github.com/Aerijo/df27228d70c633e088b0591b8857eeef

vzarytovskii commented 1 year ago

OCaml syntax does not account for whitespace sensitivity (i.e. the lexfilter), so it won't be much help here, unfortunately. I think that if we don't want to generate it outright but write a grammar manually first, we should look at the one for Python.

ShalokShalom commented 1 year ago

Yeah, I actually considered another way now.

Going from .fsy to EBNF and then to Treesitter.

This doesn't involve OCaml at all. I will try to get this running soonish.

vzarytovskii commented 1 year ago

Yeah, I actually considered another way now.

Going from .fsy to EBNF and then to Treesitter.

This doesn't involve OCaml at all. I will try to get this running soonish.

Fsy to EBNF likely won't work either; it won't cover whitespace sensitivity.

Nsidorenco commented 1 year ago

If anyone is interested, I've been slowly working on an F# treesitter grammar that supports indentation-based scoping

vzarytovskii commented 1 year ago

If anyone is interested, I've been slowly working on an F# treesitter grammar that supports indentation-based scoping

Nice

vzarytovskii commented 1 year ago

If anyone is interested, I've been slowly working on an F# treesitter grammar that supports indentation-based scoping

I would like to help with testing and improving it. @Nsidorenco do you have any to-do items in mind (or are the ones in the README up to date)? I can start using it in my day-to-day work with the compiler and maybe also start fixing things.

ShalokShalom commented 1 year ago

Yeah, I actually considered another way now. Going from .fsy to EBNF and then to Treesitter. This doesn't involve OCaml at all. I will try to get this running soonish.

Fsy to EBNF likely won't work either; it won't cover whitespace sensitivity.

How does whitespace significance break either of the formats?

Or do you think it's lost in the translation?

vzarytovskii commented 1 year ago

Yeah, I actually considered another way now. Going from .fsy to EBNF and then to Treesitter. This doesn't involve OCaml at all. I will try to get this running soonish.

Fsy to EBNF likely won't work either; it won't cover whitespace sensitivity.

How does whitespace significance break either of the formats?

Or do you think it's lost in the translation?

Yeah, I think there's a possibility of losing a bunch of info during the conversions. Besides, fslexyacc alone doesn't carry the indent/whitespace info.
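To illustrate the kind of information that lives outside the grammar file, here is a minimal sketch of a layout pass that turns indentation into explicit tokens before parsing. It is only loosely analogous to what a lex filter or a tree-sitter external scanner does; the `layout_tokens` name and the token kinds are assumptions made for this example, and the real offside rules in F# are considerably richer.

```python
def layout_tokens(source: str) -> list[tuple[str, str]]:
    """Emit INDENT/DEDENT tokens from leading spaces (illustrative only)."""
    stack = [0]                      # indentation widths of the open blocks
    tokens: list[tuple[str, str]] = []
    for line in source.splitlines():
        if not line.strip():
            continue                 # blank lines do not affect layout
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:        # deeper than the current block: open one
            stack.append(width)
            tokens.append(("INDENT", ""))
        while width < stack[-1]:     # shallower: close blocks until it fits
            stack.pop()
            tokens.append(("DEDENT", ""))
        tokens.append(("LINE", line.strip()))
    while len(stack) > 1:            # close whatever is still open at EOF
        stack.pop()
        tokens.append(("DEDENT", ""))
    return tokens

example = "let f x =\n    let y = x + 1\n    y\nlet z = 2\n"
for token in layout_tokens(example):
    print(token)
```

For the example, the pass produces LINE, INDENT, LINE, LINE, DEDENT, LINE. None of the INDENT/DEDENT tokens can be derived from a yacc-style rule set alone, which is why a conversion that starts only from the .fsy file would miss them. The sketch does not handle tabs or a dedent to a width that was never opened.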

ShalokShalom commented 1 year ago

Yeah, I will see.

Considering Python is popular, I guess this info is not being lost. The Yacc > EBNF converter has not been updated in 2 years, while the EBNF to Treesitter converter is very well maintained.

Besides, fslexyacc alone doesn't carry the indent/whitespace info.

What else does?

Chet told me the files are in the compiler repo:

https://github.com/dotnet/fsharp/blob/main/src/Compiler/pars.fsy
https://github.com/dotnet/fsharp/blob/main/src/Compiler/pppars.fsy

Nsidorenco commented 1 year ago

@vzarytovskii any help is very welcome. The README is relatively up to date. Off the top of my head, the biggest remaining parts are:

1) testing
2) improving the precedence of rules (to reduce parser size)
3) adding missing language features, like annotations
4) improving the external scanner to open a new indent scope on brackets and braces
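As a rough illustration of the last item, here is a sketch of how a scanner could record a new scope when it sees an opening bracket or brace. This is not the scanner from the grammar under discussion: the `bracket_scopes` helper, the event names, and the rule that a scope starts at the column of the first non-space character after the bracket are all assumptions for the example, and it ignores strings, comments, and brackets that span several lines.

```python
OPENERS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(OPENERS.values())

def bracket_scopes(line: str) -> list[tuple[str, int]]:
    """Return ("open", col) / ("close", col) events for a single line.

    col is the column of the first non-space character after the opening
    bracket, i.e. the assumed indentation of the scope that bracket opens.
    """
    events: list[tuple[str, int]] = []
    stack: list[int] = []            # columns of the currently open scopes
    for i, ch in enumerate(line):
        if ch in OPENERS:
            col = i + 1
            while col < len(line) and line[col] == " ":
                col += 1             # skip padding right after the bracket
            stack.append(col)
            events.append(("open", col))
        elif ch in CLOSERS and stack:
            events.append(("close", stack.pop()))
    return events

print(bracket_scopes("let xs = [ 1; 2 ]"))
# [('open', 11), ('close', 11)]
```

A real external scanner would keep this stack in its serialized state alongside the indentation stack, so that lines inside the brackets are compared against the bracket's scope and not against the enclosing block.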

vzarytovskii commented 1 year ago

What else does?

lexfilter in the repo

ShalokShalom commented 1 year ago

Yeah, I already found your previous comment on Discord about that, many thanks. I think Nsidorenco is already very far along, so generating seems to serve no purpose at this point.

@Nsidorenco I am testing it with Helix, but I am unsure why it currently fails. So I can't provide you with any meaningful feedback as of now, and I hope I can do so in the future.

Thanks a lot for developing this, you're great 🥳

vzarytovskii commented 1 year ago

The easiest way to test it, in my opinion, is with nvim-treesitter and nvim-treesitter/playground, which has a great way of visualizing the tree (prim-types.fs is probably overkill as a test, since FSharp.Core is a bit special):

image