shaunlebron closed this 5 years ago
As far as I know, GitHub doesn't support tree-sitter grammars. This is not something that depends on Linguist anyway, so you should probably mention it to GitHub support if you'd like them to support tree-sitter grammars.
Background: Atom uses tree-sitter since it is a fast way to use proper grammars in an editor, removing the need for hacky regexes.
Just an FYI: those "hacky regexes" are precisely the reason for the flexibility and power of TextMate-based grammars. One can use them to write structured grammars à la tree-sitter, or to highlight some ad-hoc format which lacks conventional or defined structure.
Having said that, supporting tree-sitter grammars won't be as simple as flicking on a light switch, so to speak.
@Alhadis Oh, Atom supports tree-sitter? If it does, it might be in GitHub's plans to support it as well...
The Atom developers started the tree-sitter project, so yes, it's only natural that Atom supports it.
Ahah! @vmg might know if there's planned support for tree-sitter in GitHub's syntax highlighter then.
@Alhadis thanks for the note on "hacky regexes", I reworded it to remove the snarkiness since regexes have their place
I also realized that whatever GitHub uses to do its syntax-highlighting is probably private? Linguist only identifies which external grammars to use, and the grammar repos have nothing to perform the actual highlighting as far as I know:
Linguist detects the language of a file but the actual syntax-highlighting is powered by a set of language grammars which are included in this project as a set of submodules as listed here.
GitHub already diffs syntax trees created by tree-sitter to display the table of contents on pull requests, but doesn't seem to be using them for syntax highlighting.
since regexes have their place
It's actually more than just regular expressions. TextMate's strongest feature is its unassuming simplicity, and the ease with which structured grammars can be built from composing groups of smaller expressions.
It's also cheap and fast to syntax highlight a flat file in a top-down pass, whereas Tree Sitter obviously has to parse the file and pull an entire AST into memory before it can highlight regions of source code. For an interactive text editor, it makes sense… but for the millions of static files being viewed across GitHub, the added overhead is wasted.
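To make the "composing smaller expressions" and "top-down pass" points concrete, here is a rough sketch in Python. It is not how TextMate grammars or GitHub's highlighter are actually implemented (real TextMate grammars also have begin/end rules that span lines, among other things); the rule names, the toy language, and the functions `highlight_line` and `highlight` are all made up for illustration.

```python
import re

# Hypothetical rules for a toy language; the scope names are illustrative only.
# Order matters: earlier rules win when two could match at the same position.
RULES = [
    ("comment", r"#.*"),
    ("string", r'"(?:\\.|[^"\\])*"'),
    ("number", r"\b\d+(?:\.\d+)?\b"),
    ("keyword", r"\b(?:def|return|if|else)\b"),
]

# Compose the small expressions into one alternation of named groups.
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES)
)


def highlight_line(line):
    """Return (scope, text) pairs for one line in a single left-to-right scan."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(line)]


def highlight(source):
    """Yield the highlighted tokens of each line; no syntax tree is built."""
    for line in source.splitlines():
        yield highlight_line(line)


if __name__ == "__main__":
    for tokens in highlight('if x:\n    return 42  # "answer"'):
        print(tokens)
```

Each line is scanned once and then forgotten, which is roughly why this style is cheap for static files. The trade-off is that the scanner only knows about token shapes, not about how they nest.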
@Alhadis thanks for extra context, I suppose server-side rendered files would make it a better fit
Thanks for the contribution @shaunlebron! We've been exploring using Tree Sitter for syntax highlighting on the website, but there are many technical challenges to overcome. We'll keep y'all posted.
Thanks @vmg for the info!
I think we should close this in the meantime. As long as the backend doesn't support Tree Sitter, there is nothing we can do on the Linguist side.
I'm not sure if GitHub is using tree-sitter for syntax-highlighting, but I saw in #4013 that the grammars are not supported in some way.
I created a syntax-highlighter using tree-sitter for my own purposes, and thought it might be helpful to share here: https://github.com/shaunlebron/highlight-tree-sitter