eatkins / tree-sitter-ebnf-generator

Convert an EBNF grammar to the tree-sitter dsl
GNU General Public License v3.0
24 stars 2 forks

🤯 This is fantastic #1

Open ckipp01 opened 3 years ago

ckipp01 commented 3 years ago

Hello! My apologies for submitting an issue for a non-issue, but I wanted to reach out about this project! I just randomly came across this and it sort of blew my mind a bit. I've recently been trying to actually complete the tree-sitter-scala grammar (slowly), and have been amassing a list of things that aren't covered yet as I'm trying to work on the highlights.scm, primarily for syntax highlighting in Neovim, but also just to have a complete grammar in general. Poking around, I see that yours is already more complete, but also see that it's unable to actually produce a parser, if I'm understanding the README correctly. I'm curious to pick your brain a bit on what your ultimate plan for this is, if you foresee being able to produce a valid tree-sitter grammar for this, etc. I'm mainly curious to see if there are ways to reduce duplicated work, with me slowly trailing behind this project 😆 and trying to complete a tree-sitter-scala grammar when yours is already further ahead.

If you have the time, I'd love to just hear your thoughts. If not, I just wanted to say that this is a pretty brilliant approach. I'm also very curious if you've thought about how to potentially handle what will essentially be two different grammars for Scala 3: the indentation-based syntax and the non-indentation-based one.

Cheers

eatkins commented 3 years ago

Hey Chris! Thanks so much for the unexpected and kind feedback!

The README was a bit out of date: the transliterated spec grammar does compile to a tree-sitter parser now, and the generator now supports all of the tree-sitter DSL features. Unfortunately, transliterating the Scala spec does not work as well as I'd hoped it might. While it is able to parse a lot of code that the existing parser fails on, it takes ages for tree-sitter to compile it, and the generated parser is noticeably slower to run than the existing parser in tree-sitter-scala. Not only that, but the syntax nodes are a bit insane in their verbosity. A conversation with Max Brunsfeld convinced me that a handwritten approach is the way to go.

I didn't want to reinvent the wheel since tree-sitter-scala already exists, but I'd gotten used to writing grammars with the new EBNF format, so I wrote a Node.js script to reverse the process and convert the tree-sitter grammar.js into the EBNF format. I introduced a new project, handwritten scala, that is effectively a fork of tree-sitter-scala. I translated the existing grammar.js to scala.ebnf with the script and have been experimenting with adding missing features to it. The EBNF format makes it a lot easier, for me at least, to see the discrepancies between the official spec and the current approximate grammar.
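To give a rough idea of what such a reverse conversion can look like, here is a minimal sketch. It is not the actual script from this project: the function names `toEbnf` and `rulesToEbnf`, the sample `val_definition` rule, and the output notation (`[ ]` for optional, `{ }` for repetition, `::=` for definitions) are all assumptions made for illustration. It walks rule nodes in the shape tree-sitter writes to `src/grammar.json` (`SEQ`, `CHOICE`, `REPEAT`, and so on) rather than evaluating grammar.js itself.

```javascript
// Hypothetical sketch: render tree-sitter grammar.json rule nodes as EBNF-like text.
// The output notation is an assumption, not this project's actual EBNF dialect.
function toEbnf(node, nested = false) {
  switch (node.type) {
    case "BLANK":
      return "";
    case "STRING":
      return JSON.stringify(node.value); // quoted literal
    case "PATTERN":
      return `/${node.value}/`; // regex kept as-is
    case "SYMBOL":
      return node.name; // reference to another rule
    case "SEQ":
      return node.members.map((m) => toEbnf(m, true)).join(" ");
    case "CHOICE": {
      // tree-sitter encodes optional(x) as a CHOICE between x and BLANK.
      const rest = node.members.filter((m) => m.type !== "BLANK");
      const body = rest.map((m) => toEbnf(m, false)).join(" | ");
      if (rest.length < node.members.length) return `[ ${body} ]`;
      // Parenthesize alternatives only when embedded in a sequence.
      return nested && rest.length > 1 ? `( ${body} )` : body;
    }
    case "REPEAT":
      return `{ ${toEbnf(node.content, false)} }`;
    case "REPEAT1":
      return `${toEbnf(node.content, true)} { ${toEbnf(node.content, false)} }`;
    default:
      // Wrappers such as PREC, FIELD, ALIAS, and TOKEN carry a `content` node;
      // this sketch simply drops the wrapper and renders the content.
      if (node.content) return toEbnf(node.content, nested);
      throw new Error(`unsupported node type: ${node.type}`);
  }
}

// Render every rule of a grammar.json-shaped object, one definition per line.
function rulesToEbnf(grammar) {
  return Object.entries(grammar.rules)
    .map(([name, rule]) => `${name} ::= ${toEbnf(rule)}`)
    .join("\n");
}

// Made-up example rule, roughly: seq("val", $.identifier, optional(seq(":", $.type)), "=", $.expression)
const sampleGrammar = {
  rules: {
    val_definition: {
      type: "SEQ",
      members: [
        { type: "STRING", value: "val" },
        { type: "SYMBOL", name: "identifier" },
        {
          type: "CHOICE",
          members: [
            {
              type: "SEQ",
              members: [
                { type: "STRING", value: ":" },
                { type: "SYMBOL", name: "type" },
              ],
            },
            { type: "BLANK" },
          ],
        },
        { type: "STRING", value: "=" },
        { type: "SYMBOL", name: "expression" },
      ],
    },
  },
};

console.log(rulesToEbnf(sampleGrammar));
```

Note that dropping the precedence and field wrappers loses information, so a converter meant for round-tripping would need some notation for them.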

I am working with a startup, r2c, on integrating Scala into an open source static analysis tool called Semgrep. Other languages have been integrated using tree-sitter but, as you know, the Scala grammar needs some love. I'd be happy to touch base about what we're doing and how we can potentially coordinate our work. Perhaps we can take the conversation off GitHub? You can reach me via email at ethan at returntocorp.com.