crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

Grammar specification #11853

Open spindlebink opened 2 years ago

spindlebink commented 2 years ago

I've been working on a tree-sitter grammar for Crystal, and it's a challenge, largely because Crystal doesn't, to my knowledge, have a formal or even semi-formal grammar specification. Implementing language tooling (LSPs, parsers, tree-sitter grammars as in this case) is therefore significantly more difficult and less precise than it needs to be. Although the syntax is usually easy to intuit while writing code, there are edge cases that may be well-defined in the compiler but require trial and error for an outsider to figure out.

For example, with unparenthesized argument lists (Ruby "commands," I think), it's unclear where newlines need escaping and where they don't: comma+newline is fine, newline+comma is not. That makes intuitive sense, but I have no way of verifying whether there's something I'm missing. On literals, the docs say an integer literal is an optional + or - sign, followed by a sequence of digits and underscores, optionally followed by a suffix, but that description is vague enough to allow for the (invalid) 02 or +1___5. A similar issue arises for the float description. There's also the issue of local variables versus self method calls. I think the rule is "if there are arguments, it's a method call, otherwise it needs self to be a method call," but such a rule isn't (unless I've missed it) formally stated.
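To make the integer-literal gap concrete, here is a minimal Python sketch contrasting a literal reading of the docs' wording with a stricter rule. The stricter pattern is an assumption built only from the two cases called invalid above (no leading zero, no repeated underscores); it is not taken from the compiler's lexer, it ignores hex, octal, and binary prefixes, and the names `LOOSE`, `STRICT`, and `accepts` are hypothetical.

```python
import re

# Optional width suffix, with or without a separating underscore (e.g. 1_i64, 1u8).
SUFFIX = r"(?:_?[iu](?:8|16|32|64|128))?"

# Literal reading of the docs: sign, then any run of digits and underscores, then suffix.
LOOSE = re.compile(r"[+-]?[0-9_]+" + SUFFIX)

# Assumed stricter rule: a lone 0, or a nonzero leading digit followed by
# digit groups separated by single underscores.
STRICT = re.compile(r"[+-]?(?:0|[1-9][0-9]*(?:_[0-9]+)*)" + SUFFIX)


def accepts(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole text matches the given literal pattern."""
    return pattern.fullmatch(text) is not None
```

Under these assumptions, `02` and `+1___5` pass `LOOSE` but fail `STRICT`, while `1_000` and `1_i64` pass both. A formal grammar would settle which of the two readings (or some third one) is actually intended.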

It's not these specific cases that are my hang-ups, but the lack of a formal specification or source of truth for answering these questions and others that arise. Handmade parsers are awesome, but because Crystal's compiler uses one, we don't have a grammar to reference when writing tooling.

An official grammar in EBNF or PEG format (or even a semi-formal grammar, or a step toward one) would be a boon to anyone trying to tie into the language, and would improve Crystal's approachability and editor tooling support. Not to mention, further development of the compiler's parser would likely be easier with a formal specification.

Fryguy commented 2 years ago

@spindlebink it's not the formal spec you are looking for, but have you seen https://github.com/crystal-lang-tools/language-crystal ? It has CSON-format grammars that are used by Linguist, and so used by GitHub for language detection. For syntax highlighting, GitHub uses tree-sitter, and if the language is not found there it falls back to a tool called PrettyLights, which uses those Linguist grammars [ref].

spindlebink commented 2 years ago

Yep, I know there's decent support in TextMate-style regex grammars. My use case is writing an actual parser without a specification. The editor I'm trying to use is still very much in development and exclusively uses tree-sitter syntax trees. To my knowledge, they're also not planning on adding typical regex grammar support.

straight-shoota commented 2 years ago

Related forum discussion: https://forum.crystal-lang.org/t/tree-sitter-crystal/3565

nobodywasishere commented 6 months ago

I have the start of a grammar here. It's still very much a WIP, but I think it's a good start, based on how the parser itself operates. Things like macros and heredocs probably can't be expressed in it, though.