Open StachuDotNet opened 8 months ago
Copying this from some thoughts I posted on Discord recently:
tl;dr: is tree-sitter really the best tool for our parser, or should we reconsider writing a parser combinator thing in Darklang?
The way we're currently set up for the new (tree-sitter) parser is:
A. write Darklang source code
B. use tree-sitter and tree-sitter-darklang to parse it into tree-sitter's internal representation of the syntax tree
C. map that to a Dark type, "ParsedNode", via a built-in function (the type: https://github.com/darklang/dark/blob/a68b808eb35d671e3921ce30ca357a67e166a995/packages/darklang/languageTools/parser.dark#L11-L27; the builtin fn: https://github.com/darklang/dark/blob/a68b808eb35d671e3921ce30ca357a67e166a995/backend/src/BuiltinExecution/Libs/Parser.fs#L37)
D. map ParsedNode to WrittenTypes
Those WrittenTypes are used:
- to map to ProgramTypes, where relevant
- to map to semantic tokens, for VS Code syntax highlighting
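For readers less familiar with step C->D, here is a minimal sketch of what that mapping layer does. It's written in Python purely for illustration (the real code is Darklang, in parser.dark). ParsedNode's fields (typ, text, children), the node-type names, and the EInt/EInfix/MappingError types are hypothetical simplifications, not the actual definitions linked above. What it shows: a generic, untyped tree comes in, and every grammar rule needs a hand-written branch that knows that rule's child layout.

```python
from dataclasses import dataclass, field
from typing import List, Union


# Hypothetical stand-in for Dark's ParsedNode: a generic, untyped node
# like the ones a tree-sitter parse produces.
@dataclass
class ParsedNode:
    typ: str  # grammar rule name, e.g. "int_literal"
    text: str  # source text covered by this node
    children: List["ParsedNode"] = field(default_factory=list)


# Hypothetical, heavily simplified "WrittenTypes" for expressions.
@dataclass
class EInt:
    value: int


@dataclass
class EInfix:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[EInt, EInfix]


class MappingError(Exception):
    """Raised when a node doesn't have the shape this mapper expects."""


def to_expr(node: ParsedNode) -> Expr:
    """Map a generic node to a typed expression, checking its shape by hand."""
    if node.typ == "int_literal":
        return EInt(int(node.text))
    if node.typ == "infix_operation":
        # Each grammar rule needs a branch like this that knows the rule's
        # child layout; if the grammar changes that layout, this only fails
        # when it runs, not when the grammar is built.
        if len(node.children) != 3:
            raise MappingError(
                f"infix_operation: expected 3 children, got {len(node.children)}"
            )
        left, op, right = node.children
        return EInfix(op.text, to_expr(left), to_expr(right))
    raise MappingError(f"unexpected node type: {node.typ}")
```

Mapping a node for 1 + 2 (an infix_operation with children int_literal "1", operator "+", int_literal "2") yields EInfix(op='+', left=EInt(value=1), right=EInt(value=2)).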
I've been questioning whether depending on tree-sitter for all of our parsing is a good idea.
An alternative would be to write the parser in Darklang instead, potentially as a wrapper around (or equivalent of) Farkle or FParsec, via minimal Builtins. Relevant links:
- https://github.com/stephan-tolksdorf/fparsec
- https://teo-tsirpanis.github.io/Farkle (seems to be a better fit for us than FParsec, per https://teo-tsirpanis.github.io/Farkle/choosing-a-parser.html)
- https://www.youtube.com/watch?v=RDalzi7mhdY (not expecting you to watch this, but it's a good talk on the subject)
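To make the combinator alternative concrete, here is a minimal sketch of the parser-combinator style that FParsec uses, written in Python rather than Darklang or F#. None of these names (token, seq, alt, fmap, parse_all) are FParsec's or Farkle's API, and the let/int toy grammar is not Darklang's real syntax. The idea it illustrates: the grammar is ordinary code built from small functions, and it produces typed values directly instead of a generic tree that needs a separate mapping step.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser maps (source, position) to (value, next position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def token(pattern: str) -> Parser:
    """Match a regex after skipping leading whitespace; yields the matched text."""
    rx = re.compile(r"\s*(" + pattern + r")")

    def parse(src: str, pos: int):
        m = rx.match(src, pos)
        return (m.group(1), m.end()) if m else None

    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; yields the list of their results."""
    def parse(src: str, pos: int):
        results = []
        for p in parsers:
            r = p(src, pos)
            if r is None:
                return None
            value, pos = r
            results.append(value)
        return (results, pos)
    return parse


def alt(*parsers: Parser) -> Parser:
    """Try each parser in order; the first success wins."""
    def parse(src: str, pos: int):
        for p in parsers:
            r = p(src, pos)
            if r is not None:
                return r
        return None
    return parse


def fmap(f: Callable[[Any], Any], p: Parser) -> Parser:
    """Transform a parser's result on success."""
    def parse(src: str, pos: int):
        r = p(src, pos)
        return None if r is None else (f(r[0]), r[1])
    return parse


# A toy grammar (hypothetical syntax) built only from the combinators above.
int_lit = fmap(lambda s: ("Int", int(s)), token(r"\d+"))
ident = token(r"[a-z_][a-zA-Z0-9_]*")
let_binding = fmap(
    lambda r: ("Let", r[1], r[3]),
    seq(token(r"let\b"), ident, token("="), int_lit),
)
expr = alt(let_binding, int_lit)


def parse_all(p: Parser, src: str):
    """Succeed only if the parser consumes the whole input (modulo trailing whitespace)."""
    r = p(src, 0)
    if r is None or src[r[1]:].strip():
        return None
    return r[0]
```

With this, parse_all(expr, "let x = 42") returns ("Let", "x", ("Int", 42)). Note the let\b word boundary: without it, "letter = 1" would be misread as a let-binding of "ter", a small reminder that combinator grammars still need care.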
Here are some potential trade-offs to consider:
- the current A->B step:
  - requires us to build tree-sitter as a .so, as well as our grammar's .so. This is all set up now, but it takes a few seconds, especially of CI time.
  - requires our CLI app to be ~1MB larger, to package those .so files along with our exe
  - requires a fancy extract-and-load setup to use both of those at run-time
- the current C->D step:
  - is pretty complicated, and involves some fragile code. There might be abstractions available here we haven't yet discovered, but it's a bit rough.
  - see https://github.com/darklang/dark/blob/main/packages/darklang/languageTools/parser.dark
- we're broadly missing out on immediate feedback throughout the process. We wait for the parser to be built, and have to follow each of those changes with ParsedNode -> WrittenTypes functions. And every grammar upgrade depends on a full build/release cycle, waiting for CI etc., to get things to users.
- I have no clear path forward on versioning the parser with our language in a reasonably seamless way, as opposed to an in-Dark solution, which would allow us to properly version the parser fns like anything else in the package manager.
- our current setup provides only one big parser for a 'file', but what if we want to allow/disallow different parseable things depending on whether we're parsing a Canvas, a Script, etc.? I've been hoping we'd figure out a proper solution for that eventually, but everything I've come up with so far feels like a hack (e.g. passing a 'header' to the tree-sitter grammar where we). I think the composability of a parser combinator would prepare us for these scenarios much better.
- broadly, it feels like we're doing (more than) double work: we're writing the grammar.js, which builds into a parser, and also writing a bunch of parser.dark code to map that back to WrittenTypes.
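On the composability point: with combinators, "what a file may contain" can simply be a parameter. The sketch below (Python, toy syntax, hypothetical names throughout) builds a Script parser and a Canvas parser from the same item parsers. Which items each context allows is an assumption for illustration, not Darklang's actual rules.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser maps (source, position) to (value, next position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def token(pattern: str) -> Parser:
    """Match a regex after skipping leading whitespace; yields the matched text."""
    rx = re.compile(r"\s*(" + pattern + r")")

    def parse(src: str, pos: int):
        m = rx.match(src, pos)
        return (m.group(1), m.end()) if m else None

    return parse


def seq(*parsers: Parser) -> Parser:
    def parse(src: str, pos: int):
        results = []
        for p in parsers:
            r = p(src, pos)
            if r is None:
                return None
            value, pos = r
            results.append(value)
        return (results, pos)
    return parse


def alt(*parsers: Parser) -> Parser:
    def parse(src: str, pos: int):
        for p in parsers:
            r = p(src, pos)
            if r is not None:
                return r
        return None
    return parse


def fmap(f: Callable[[Any], Any], p: Parser) -> Parser:
    def parse(src: str, pos: int):
        r = p(src, pos)
        return None if r is None else (f(r[0]), r[1])
    return parse


def many(p: Parser) -> Parser:
    """Apply p zero or more times; yields the list of results."""
    def parse(src: str, pos: int):
        results = []
        while True:
            r = p(src, pos)
            if r is None or r[1] == pos:  # stop on failure or no progress
                return (results, pos)
            value, pos = r
            results.append(value)
    return parse


# Toy top-level items (hypothetical syntax, not Darklang's).
type_decl = fmap(lambda r: ("TypeDecl", r[1]),
                 seq(token(r"type\b"), token(r"[A-Z]\w*"), token("="), token(r"[A-Z]\w*")))
fn_decl = fmap(lambda r: ("FnDecl", r[1]),
               seq(token(r"fn\b"), token(r"[a-z]\w*"), token("="), token(r"\d+")))
bare_expr = fmap(lambda s: ("Expr", int(s)), token(r"\d+"))


def file_parser(*allowed_items: Parser) -> Parser:
    """A 'file' is any sequence of the allowed top-level items, and nothing else."""
    items = many(alt(*allowed_items))

    def parse(src: str, pos: int):
        results, end = items(src, pos)
        # Reject leftover input, e.g. an item that isn't allowed in this context.
        if src[end:].strip():
            return None
        return (results, end)
    return parse


# Hypothetical rules: scripts allow top-level expressions, canvases don't.
script_parser = file_parser(type_decl, fn_decl, bare_expr)
canvas_parser = file_parser(type_decl, fn_decl)
```

Given "type Id = Int\nfn answer = 42\n42", script_parser accepts all three items, while canvas_parser rejects the input because nothing it allows matches the trailing bare 42. Real error reporting would need more than None, which this sketch leaves out.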
I suspect we'd still need a tree-sitter parser around, for highlighting and such in contexts outside of our VS Code plugin.
Am I forgetting a big reason why we chose tree-sitter rather than exploring writing a parser in Dark/F#? Or maybe we've just learned more since, and it makes sense to reconsider? Maybe we're making ParsedNode -> WrittenTypes more complicated than it needs to be?
Paul's response:
As I recall, the reasons to use tree sitter:
- performance
- ability to adapt to use in existing syntax highlighting frameworks and therefore reuse the definition
I would add that parser combinator frameworks are, afaik, possibly not powerful enough for real programming languages. But I could be wrong on that note.
I don't think there's anything to do here, and we're close to a successful use of tree-sitter such that we'll be able to abandon our old F#-based parser, but I think it's worth reflecting here more on whether we're doing the right thing fundamentally.
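On Paul's point about whether parser combinators are powerful enough: one commonly cited limitation is left recursion. A rule like expr := expr "-" term | term makes a naive recursive-descent parser call itself forever. The usual combinator workaround is a chainl1-style helper (FParsec ships chainl1 and an OperatorPrecedenceParser for this kind of thing). Below is a minimal Python sketch of the idea with hypothetical names; it addresses only this one concern, not the broader question of whether combinators scale to the whole language.

```python
import re
from typing import Any, Callable, Optional, Tuple

# A parser maps (source, position) to (value, next position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def token(pattern: str) -> Parser:
    """Match a regex after skipping leading whitespace; yields the matched text."""
    rx = re.compile(r"\s*(" + pattern + r")")

    def parse(src: str, pos: int):
        m = rx.match(src, pos)
        return (m.group(1), m.end()) if m else None

    return parse


_digits = token(r"\d+")


def number(src: str, pos: int):
    """An integer literal, as an int."""
    r = _digits(src, pos)
    return None if r is None else (int(r[0]), r[1])


def chainl1(operand: Parser, operator: Parser) -> Parser:
    """Parse `operand (operator operand)*`, folding results to the left.

    Stands in for the left-recursive rule `e := e op operand | operand`,
    which a naive recursive-descent parser would recurse on forever.
    """
    def parse(src: str, pos: int):
        first = operand(src, pos)
        if first is None:
            return None
        acc, pos = first
        while True:
            op = operator(src, pos)
            if op is None:
                break
            rhs = operand(src, op[1])
            if rhs is None:
                break  # leave a dangling operator unconsumed
            acc = (op[0], acc, rhs[0])  # build a left-nested AST node
            pos = rhs[1]
        return (acc, pos)
    return parse


# Two precedence levels: "*" binds tighter than "+" and "-".
term = chainl1(number, token(r"\*"))
expr = chainl1(term, token(r"[+-]"))


def evaluate(node) -> int:
    """Evaluate the nested (op, left, right) tuples."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = evaluate(left), evaluate(right)
    return {"+": a + b, "-": a - b, "*": a * b}[op]
```

Here expr("10 - 3 - 2", 0) yields (("-", ("-", 10, 3), 2), 10), which evaluates to 5: the left-associative reading, with no left-recursive rule in sight.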
This Issue exists to collect many items that relate to Dark's parser(s), pretty-printer(s), name resolution, etc.
Here's our current state:
dark-classic, we didn't have a parser used for user code
These are tasks currently available to be worked on:
tree-sitter-darklang/test/corpus, and fail in CI upon seeing unformatted tree-sitter test files (note: this task is probably the lowest-hanging fruit here, with no blockers)
tree-sitter nodes (at time of writing, parser.dark)
Once the tree-sitter grammar and parser have 'caught up' with our full language:
Once that is done, we can tackle the fun stuff:
!
?
to language, to assist with ergonomic error-handling
@paul.module1.module2-like syntax, rather than PACKAGE.Paul.Module1.Module2
List type
All of these tasks are worth some discussion, either here or in Discord, before starting.