The future of syntax highlighting

muirdm commented 2 years ago

As I hack on go-mode's fontification to support generics, I also am looking at other options.

To summarize how go-mode currently fontifies, it uses a combination of regexes and structural understanding of paren/bracket pairs. We make use of more advanced font-lock facilities such as function matchers and anchored matchers. These are relatively slow, but jit font lock mode does a really good job of only fontifying parts of the file that change, so performance is fine.

Tree-sitter is hot, but unfortunately it also has only syntactic information, so it does not help in ambiguous cases such as foo[bar](), which could be either a generic instantiation or an index expression followed by a call. It also may be more sensitive to syntax errors than go-mode's approach, but on the other hand it is almost certainly faster. I think we should consider adopting tree-sitter if/when it is part of core Emacs, but before that the benefits don't seem that great. (It may have other benefits beyond syntax highlighting, though.)
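To make the foo[bar]() ambiguity concrete, here is a minimal sketch using Go's own go/parser as a stand-in for any purely syntactic parser (it is not tree-sitter, and not how go-mode works). The two source snippets and the helper name callShape are made up for illustration. In one snippet foo is a generic function instantiated with the type bar; in the other, foo is a map of functions indexed by the constant bar. The parser produces the same node shape for both.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// Hypothetical example: foo is a generic function, bar is a type.
const genericSrc = `package p

func foo[T any]() {}

type bar int

func use() { foo[bar]() }
`

// Hypothetical example: foo is a map of functions, bar is a constant key.
const indexSrc = `package p

var foo = map[int]func(){}

const bar = 0

func use() { foo[bar]() }
`

// callShape parses src and describes the node types of the first
// call expression it finds, without doing any type checking.
func callShape(src string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return "", err
	}
	shape := ""
	ast.Inspect(f, func(n ast.Node) bool {
		if shape != "" {
			return false // already found a call, stop descending
		}
		if call, ok := n.(*ast.CallExpr); ok {
			shape = fmt.Sprintf("%T with Fun %T", call, call.Fun)
			return false
		}
		return true
	})
	return shape, nil
}

func main() {
	for _, src := range []string{genericSrc, indexSrc} {
		shape, err := callShape(src)
		if err != nil {
			panic(err)
		}
		// Both snippets report the same shape.
		fmt.Println(shape)
	}
}
```

Since both programs yield a call whose function is an index expression, a highlighter that sees only syntax cannot tell whether bar should get a type face or a constant face.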

The other option is LSP's semantic tokens. gopls does support semantic tokens and I was able to get it working with lsp-mode after fixing some things in gopls. In general it works, and of course it has full type information so all our ambiguity problems are solved. The fontification via lsp-mode is asynchronous, so it doesn't cause any lag, although that means the highlighting doesn't pop in immediately as you type. You can configure the idle delay before it fontifies. Setting it to 100ms gives a pretty good experience (although if you are working in a package that takes longer than 100ms to type check, the semantic tokens will be slower as well). One of the main problems is syntax errors. Without proper AST and type info, semantic tokens fall apart. To address this, I think we would need better support for partial fontification where gopls and/or lsp-mode know to keep fontification around for parts of the file that can't currently be type checked. This may be easier said than done.
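As a rough illustration of why type information settles the ambiguity, here is a minimal sketch using the standard go/types package directly. This is not gopls's semantic token code; the snippets and the helper name classify are hypothetical. It type-checks two small programs that both contain foo[bar]() and reports what kind of object each identifier resolves to.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// Hypothetical example: foo is a generic function, bar is a type.
const genericSrc = `package p

func foo[T any]() {}

type bar int

func use() { foo[bar]() }
`

// Hypothetical example: foo is a map of functions, bar is a constant key.
const indexSrc = `package p

var foo = map[int]func(){}

const bar = 0

func use() { foo[bar]() }
`

// classify type-checks src and reports the kinds of objects that the
// two identifiers in the index expression (foo and bar) refer to.
func classify(src string) (string, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return "", err
	}
	info := &types.Info{Uses: make(map[*ast.Ident]types.Object)}
	conf := types.Config{} // no imports in the snippets, so no Importer is needed
	if _, err := conf.Check("p", fset, []*ast.File{f}, info); err != nil {
		return "", err
	}
	result := ""
	ast.Inspect(f, func(n ast.Node) bool {
		idx, ok := n.(*ast.IndexExpr)
		if !ok {
			return true
		}
		x, okX := idx.X.(*ast.Ident)
		i, okI := idx.Index.(*ast.Ident)
		if okX && okI {
			result = fmt.Sprintf("%s=%T %s=%T", x.Name, info.Uses[x], i.Name, info.Uses[i])
		}
		return false
	})
	return result, nil
}

func main() {
	for _, src := range []string{genericSrc, indexSrc} {
		kinds, err := classify(src)
		if err != nil {
			panic(err)
		}
		fmt.Println(kinds)
	}
}
```

With type information, the first snippet resolves foo to a function and bar to a type name, while the second resolves foo to a variable and bar to a constant. Note that classify returns an error as soon as parsing or type checking fails, which mirrors the syntax-error problem described above: without a valid AST and type info there is nothing to classify.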

My general idea is to continue to maintain basic fontification in go-mode, but support optionally augmenting it with gopls semantic tokens to fill in the holes. Once that is working well, we can consider completely offloading syntax highlighting to gopls. Thoughts?

dominikh commented 2 years ago

My opinions on the matter:

At its core, it's syntax highlighting. We've spent too much time and code on trying to guess type information. This never worked 100%, and generics make it worse. I'd be more than happy to limit go-mode's native support to actual syntax.

I'd also like to switch to tree-sitter once that's been in core for a couple releases. The core of our syntax highlighting is based on regular expressions, which is of course the wrong tool for Go. We have some custom parsing routines, which add complexity to our code and still aren't perfect. I was also planning on switching to tree-sitter for most custom movement commands. We'll of course have to see how tree-sitter handles invalid code, but from quick testing I've done, it seems to perform fine.

I'm fine with relying on LSP for adding type information to our highlighting. However, I wouldn't be fine with relying on LSP for all syntax highlighting. Having a noticeable delay between typing and any kind of syntax highlighting is IMO a no-go. Having a delay for type information, OTOH, is fine.

To summarize, my end-goal would be:

tree-sitter for syntax highlighting
LSP for semantic highlighting

In the interim, I would be fine with:

simplifying our syntax highlighting to remove the bits that pertain to type information
using LSP for semantic highlighting

muirdm commented 2 years ago

Thanks for that. It sounds like our views mostly align. I suggest we do at least 4baab54c from my branch since that fixes existing fontification.

In the meantime I will mail some gopls fixes and work on getting its semantic token support out of experimental.

dominikh commented 2 years ago

Note, I know next to nothing about semantic tokens in LSP, but the comment in https://github.com/golang/go/issues/45313#issuecomment-1087664169 suggests that it doesn't need type information, which is either wrong, or means semantic tokens are less powerful than we need them to be?

pjweinb commented 2 years ago

The existing implementation of semantic tokens in gopls uses type information when that is available. One might be able to get by solely using the AST, but the code would have to be rewritten, or it would have to provide less information, which seems to defeat the point.

dominikh commented 2 years ago

Thank you for the clarification.

the42 commented 1 year ago

As tree-sitter has been merged into Emacs 29, that's the way forward.

dominikh / go-mode.el

The future of syntax highlighting #401