Open Keats opened 2 years ago
I took a look at shiki and it looks pretty nice. My understanding is that it uses grammars similar to syntect, but in the case of shiki it is able to make it look exactly like vscode. I like shiki because it seems like it would be more flexible.
A Rust port of Shiki would be great!
FWIW – Pandoc uses the syntax highlighting library skylighting (written in Haskell). But I don’t know if it would be easier to port or not.
Two more popular libraries:
With shiki you get all the syntaxes/themes from VSCode for free, which is the main draw. Otherwise a port of something like pygments, prism, highlight.js would work but it's less interesting.
Shiki definitely looks like the best option if an effort to port it will happen!
There's some very promising work on improving tree-sitter start time: https://github.com/tree-sitter/tree-sitter/pull/2374
I took a look at Shiki to determine how much work a Rust port would be. It doesn't seem too hard, but there's one hitch: TM grammars use Oniguruma regex, and there's no Rust port of that either, just FFI bindings. Porting that would be much more difficult than porting Shiki, since Oniguruma is 85,000 lines of C vs Shiki's 7,000 lines of TS. The FFI bindings could work, but only if @Keats is okay with having Oniguruma be a build-time dependency statically linked into zola.
The above is all moot of course if someone can show the Oniguruma regex syntax to be close enough to `regex` (or some other Rust regex crate) that practically every TM syntax file out there would be supported.
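For anyone who wants to test that question against real grammar files, here is a minimal sketch of a compatibility check. It relies on one fact: the `regex` crate does not support look-around or backreferences, both of which Oniguruma accepts. The enum `Unsupported` and the function `unsupported_by_regex_crate` are hypothetical names made up for this illustration, and the scan is a rough heuristic rather than a real pattern parser (for example, it does not understand character classes, and it does not cover every Oniguruma-only feature).

```rust
/// Constructs that Oniguruma accepts but the `regex` crate rejects.
/// (Hypothetical type for this sketch; the list is not exhaustive.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Unsupported {
    LookAround,
    Backreference,
}

/// Heuristically scan a pattern for look-around groups and backreferences.
/// Escaped characters are skipped, so `\(?=` is not reported as a lookahead.
pub fn unsupported_by_regex_crate(pattern: &str) -> Vec<Unsupported> {
    // `(?<name>...)` is a named group, so only `(?<=` and `(?<!` count here.
    const LOOK_AROUND: [&[u8]; 4] = [b"(?=", b"(?!", b"(?<=", b"(?<!"];
    let bytes = pattern.as_bytes();
    let mut found = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' {
            // `\1`..`\9` and `\k<name>` refer back to an earlier capture group.
            let is_backref = match bytes.get(i + 1).copied() {
                Some(b'1'..=b'9') => true,
                Some(b'k') => bytes.get(i + 2) == Some(&b'<'),
                _ => false,
            };
            if is_backref && !found.contains(&Unsupported::Backreference) {
                found.push(Unsupported::Backreference);
            }
            i += 2; // skip the backslash and the escaped byte
            continue;
        }
        if LOOK_AROUND.iter().any(|p| bytes[i..].starts_with(p))
            && !found.contains(&Unsupported::LookAround)
        {
            found.push(Unsupported::LookAround);
        }
        i += 1;
    }
    found
}
```

Running something like this over every `match`, `begin` and `end` pattern in a set of TM grammars would give a rough count of how many rules depend on features the `regex` crate lacks.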
That's what we do with syntect already, through https://crates.io/crates/onig
My bad, I didn't see it mentioned in the guide for installing from source.
An easy starting point for porting shiki seems to be handling TM grammars. I couldn't find a Rust implementation of a TM grammar deserializer, so I started one here: https://github.com/mwcz/textmate-grammar-rs I could use some help finishing it up. Or, if there is a crate out there that I missed, please correct me. :sweat_smile:
There's no textmate parser in Rust afaik, I had a look before :/ I did something slightly similar with a WIP pygments parser but didn't get very far.
In a world where loading tree-sitter is fast (< 50ms) and could be improved further (e.g. a Zola user could list the languages they use in config.toml so we only load those), which one would we prefer between tree-sitter and shiki?
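A minimal sketch of the "only load the listed languages" idea, assuming the list of languages has already been read from the config. Everything here is hypothetical: `Grammar` is a stand-in for a real compiled grammar, and `GrammarRegistry` and its `load_count` field exist only to show that loading happens lazily and at most once per language. It does not use the tree-sitter API.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for a compiled grammar (hypothetical; a real one would wrap
/// whatever the highlighting library hands back).
pub struct Grammar {
    pub name: String,
}

/// Loads grammars on first use, and only for languages the user listed.
pub struct GrammarRegistry {
    allowed: HashSet<String>,
    loaded: HashMap<String, Grammar>,
    /// Number of (simulated) expensive loads performed so far.
    pub load_count: usize,
}

impl GrammarRegistry {
    /// `languages` is the list the user would put in their config.
    pub fn new<I, S>(languages: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        GrammarRegistry {
            allowed: languages.into_iter().map(Into::into).collect(),
            loaded: HashMap::new(),
            load_count: 0,
        }
    }

    /// Returns the grammar for `lang`, loading it the first time it is asked
    /// for. Languages that were not listed are never loaded.
    pub fn get(&mut self, lang: &str) -> Option<&Grammar> {
        if !self.allowed.contains(lang) {
            return None;
        }
        if !self.loaded.contains_key(lang) {
            // The expensive step would go here.
            self.load_count += 1;
            self.loaded
                .insert(lang.to_string(), Grammar { name: lang.to_string() });
        }
        self.loaded.get(lang)
    }
}
```

With a registry built from `["rust", "toml"]`, a code block tagged `python` would fall back to plain text without paying any startup cost, and `rust` would be loaded once on first use.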
Advantages of tree-sitter:
Cons of tree-sitter:
Pros of shiki-like:
Cons of shiki-like:
I am curious how long it would take to port shiki to Rust... days? weeks? months? A Rust port of shiki could be a really fun project if it's not too large to get something up and running in a reasonable amount of time. (A couple of hours to implement tree-sitter in Zola is certainly fast!)
The TextMate grammar parser is about all I have time for, but I can continue to improve it if someone else (@Jieiku??? :grin:) is interested in doing the rest of Shiki. I really don't want to clutter this tree-sitter issue with updates about TM grammars, so here's a last update, unless things do start moving strongly towards a shiki port.
potentially faster highlighting? (anyone has benchmarks between tree-sitter and syntect?)
At Sourcegraph, we've switched from syntect to tree-sitter for major languages because of performance. I did some benchmarking in Dec 2021, here's the performance report.
We haven't been able to drop syntect because of the long tail of languages not supported by tree-sitter.
Some differences between the Sourcegraph and Zola use cases:
For small snippets, the highlighting performance probably doesn't matter as much, and syntect's typical speed of about 50k SLOC/s per core should be good enough. That said, some grammars like Scala and C# were, depending on the code, about an order of magnitude slower, and we'd not infrequently hit 10s timeouts.
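To put those numbers in perspective, here is the arithmetic as a small sketch. The 50k SLOC/s figure and the 10s timeout come from the comment above; the 5k SLOC/s figure is my own reading of "about an order of magnitude slower" and is an assumption, as are the function names.

```rust
/// Estimated seconds to highlight `lines` lines at `sloc_per_sec` on one core.
pub fn highlight_seconds(lines: u64, sloc_per_sec: u64) -> f64 {
    lines as f64 / sloc_per_sec as f64
}

/// Largest input, in lines, that still finishes within `timeout_secs`.
pub fn max_lines_within(timeout_secs: u64, sloc_per_sec: u64) -> u64 {
    timeout_secs * sloc_per_sec
}
```

At the typical 50k SLOC/s, `max_lines_within(10, 50_000)` is 500,000 lines, so a 10s timeout is only reachable with very large files. At an assumed 5k SLOC/s for the slow grammars it drops to 50,000 lines. A 200-line blog snippet takes `highlight_seconds(200, 50_000)`, roughly 4ms, which is why raw throughput matters less for a site generator than startup time.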
Thanks for the perf report! It's really all about the startup time for us in practice so the PR linked above for tree-sitter (or similar) is a requirement to be usable. Not much activity on it sadly
https://github.com/tree-sitter/tree-sitter/pull/2594#issuecomment-1716623829 so it should come eventually!
I was today years old when I realised that even `async fn` is not highlighted :(
Maybe at some point the build time hit becomes tolerable (even if tree-sitter doesn't land those caching PRs) just to have up-to-date parsers?
This is because syntect doesn't support newer syntax files. I'm working on improving that so it might not be necessary in the future. No promises though, I'm strapped for time...
For me personally, I'd rather have an (even significant) performance hit, but better syntax highlighting. There is also the option to implement both, make syntect the default (for performance), and tree-sitter an optional alternative through a config option.
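A rough sketch of what "both, with syntect as the default" could look like on the config side. None of this is Zola's actual code: the enum `HighlightBackend`, the helper `backend_from_config`, and the option values `"syntect"` and `"tree-sitter"` are all made up for illustration.

```rust
use std::str::FromStr;

/// Which highlighter to use (hypothetical config value).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HighlightBackend {
    Syntect,
    TreeSitter,
}

impl Default for HighlightBackend {
    /// syntect stays the default so existing sites keep their build times.
    fn default() -> Self {
        HighlightBackend::Syntect
    }
}

impl FromStr for HighlightBackend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "syntect" => Ok(HighlightBackend::Syntect),
            "tree-sitter" => Ok(HighlightBackend::TreeSitter),
            other => Err(format!("unknown highlight backend: {}", other)),
        }
    }
}

/// `value` is the raw option from the config file, if the user set one.
pub fn backend_from_config(value: Option<&str>) -> Result<HighlightBackend, String> {
    match value {
        None => Ok(HighlightBackend::default()),
        Some(s) => s.parse(),
    }
}
```

An unset option gives `Syntect`, so nobody pays the tree-sitter cost unless they opt in, and a typo in the option is reported as an error instead of silently falling back.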
Current syntax highlighting is just a bit disappointing in most cases I have used so far.
If anyone wants an (admittedly jank) solution for the time being, `.sublime-syntax` files are essentially just YAML files with regex instructions inside and aren't all that hard to modify in place. Zola doesn't really care whether the file matches what Sublime Text actually wants or expects, so you can define new regex matches and/or apply whatever custom scopes you want. If you use the `highlight_theme = "css"` config option, Zola will automatically apply your scopes as CSS classes from the modified sublime-syntax file, and then you can style those classes yourself. I'm not an expert at regex or CSS, nor have I ever used Sublime Text, but I was able to get this working with a few hours' effort.
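To make the scope-to-class step concrete, here is a small sketch of how a scope name ends up as CSS classes. It assumes each dot-separated part of the scope becomes its own class with a fixed prefix, and that the prefix is `z-`. That matches what I believe Zola emits in CSS mode, but treat both as assumptions and check the generated HTML. The function `scope_to_classes` and the scope `keyword.other.async.rust` are made up for this example.

```rust
/// Turn a scope such as `keyword.other.async.rust` into a space-separated
/// class list, one prefixed class per dot-separated part.
pub fn scope_to_classes(scope: &str, prefix: &str) -> String {
    scope
        .split('.')
        .filter(|atom| !atom.is_empty()) // tolerate stray or doubled dots
        .map(|atom| format!("{}{}", prefix, atom))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Under those assumptions, a custom scope `keyword.other.async.rust` added to the syntax file would show up as `class="z-keyword z-other z-async z-rust"`, and a selector like `.z-keyword.z-async` would style just that token.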
Here's a .zip of the files I'm using for Rust currently. The sublime-syntax file is based off of Rust Enhanced, styled to look like One Dark in VSCode. The modifications aren't pretty, but it does the job. Below is an example screenshot from my website:
Has anyone used it? The last time I looked at tree-sitter it didn't have many grammars but a quick look shows it's getting better. Our syntect syntaxes are stuck on old versions of the grammars because of new features in the Sublime grammar format not supported by Syntect. See https://github.com/nvim-treesitter/nvim-treesitter#supported-languages for a list of supported languages.
An alternative would be a basic textmate highlighter using VSCode syntaxes/themes since that's what everyone seems to be using these days.