danderson / templ-ts-mode

Emacs major mode for editing Templ files
GNU General Public License v3.0

Having no javascript in a file breaks Go font locking #5

Open danderson opened 10 months ago

danderson commented 10 months ago

The way multiple language tree-sitter support works is that templ-ts-mode provides a set of rules to define what parts of the file are Javascript. As the file changes, Emacs' tree-sitter machinery applies those rules and calls treesit-update-ranges to tell tree-sitter what to parse. This then affects syntax queries by the font-locking engine, and so file segments delegated to Javascript get Javascript font-locking.
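For readers unfamiliar with the declarative setup, here is a rough sketch of what such range rules look like. The node name `script_body` and the `my-templ--` prefix are made up for illustration and are not necessarily what templ-ts-mode or the templ grammar actually use.

```elisp
;; Sketch only. `script_body' is a HYPOTHETICAL node name standing in for
;; whatever node wraps embedded Javascript in the templ grammar.
(require 'treesit)

(defun my-templ--setup-ranges ()
  "Declare which parts of a templ buffer belong to the Javascript parser."
  (setq-local treesit-range-settings
              (treesit-range-rules
               :embed 'javascript   ; language of the sub-parser
               :host 'templ         ; language whose tree the query runs on
               '((script_body) @javascript))))
```

With this in place, `treesit-update-ranges` runs the query against the host parser and hands the captured ranges to the Javascript parser.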

However, when the subparser range rules say that there is no Javascript in the buffer, the machinery ends up calling (treesit-parser-set-included-ranges javascript-parser nil), which sounds like "there are no ranges" but actually means "parse the entire file". So now we have two parsers running unconstrained over the buffer, fighting over font-lock rules.
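A quick way to check whether a buffer is in this state, as a sketch: `treesit-parser-included-ranges` returns nil when a parser has no ranges set, which is the same "whole buffer" state described above. The helper name is made up, and the code assumes the javascript grammar is installed.

```elisp
;; Diagnostic sketch; the function name is hypothetical.
(require 'treesit)

(defun my-templ--js-parser-unconstrained-p ()
  "Return non-nil if the Javascript parser covers the whole buffer.
A nil range list means \"no restriction\", not \"no ranges\"."
  (let ((js (treesit-parser-create 'javascript)))
    (null (treesit-parser-included-ranges js))))
```

Evaluating `(my-templ--js-parser-unconstrained-p)` with `M-:` in a templ buffer that contains no script blocks should show whether the Javascript parser has been let loose on the whole file.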

The weird way in which this can manifest is:

[screenshot: foo2]

Notice that interface is mis-highlighted: only the "in" gets highlighted. As far as I can tell, this is because the Javascript parser happens to be able to parse this well enough to go "ah, in, that's a keyword!" and then highlights it as a Javascript keyword. This mis-highlighting is very stable across edits to the file... until I add a script foo() { ... } function to the templ file, at which point the range rules limit Javascript to that subset of the file, and Go can once again assert control over the keyword interface in the source.

As far as I can tell, this is just an Emacs bug in treesit's multiple-language support. When no ranges match for a sub-parser, it should disable the parser, or set some dummy range like (0 . 0) to keep it out of the file. Instead, a lack of a match unshackles the parser and lets it run wild over unrelated source code.

I'm not quite sure how best to fix this. There is a hacky fix, which is to abandon the rules-based subparser stuff. I can instead provide a function that computes subranges and sets them manually. I tried that before in prototyping, and it was mostly fine. The caveat is that every invocation needs to recompute the subparser ranges for the entire templ file, rather than being able to update incrementally. This is supported by the API contract, but is less performant than updating just the range that Emacs tells us to. On the other hand, if I do the work myself I can pass a dummy (0 . 0) range to Javascript when none is present, and fix this issue. Hopefully a future release of Emacs addresses the underlying problem...

danderson commented 10 months ago

I tried implementing the subranges-by-function thing I mentioned above... And hilariously, that breaks things even worse. Now when no Javascript is present in the file, font-locking gets entirely nuked from orbit for the entire file, including for all the templ bits. I'm guessing treesit really dislikes being told that the parser applies to a range of zero characters, and this somehow breaks font-locking. I can't say I understand why, it's bemusing.

danderson commented 10 months ago

Attempted something more brute-force, deleting the javascript parser in the range update function. As expected, this naive approach doesn't work because the font-lock and indent rules still reference the javascript parser, and so it gets recreated immediately by one or the other and the font-locking issues return.
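For reference, the naive deletion looks roughly like the sketch below (the function name is made up). It does not stick because `treesit-parser-create` creates a parser whenever none exists for the requested language, so anything that asks for a javascript parser afterwards brings it straight back.

```elisp
;; Sketch of the naive approach that does NOT solve the problem.
(require 'treesit)

(defun my-templ--delete-js-parsers ()
  "Delete every Javascript parser in the current buffer."
  (dolist (parser (treesit-parser-list))
    (when (eq (treesit-parser-language parser) 'javascript)
      (treesit-parser-delete parser))))
```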

So, I can't set the parser's range to nil (that means whole-file and messes everything up), I can't set it to ((0 . 0)) (this somehow breaks font-locking entirely), and I can't naively delete the javascript parser when not needed (it just gets rebuilt on-demand).

Remaining options seem to be: get to the bottom of why ((0 . 0)) as a range doesn't work and fix that, set a dummy 1-byte range for Javascript and wait for a future Emacs that fixes all this, or manually and carefully delete the Javascript parser and all its uses in the mode every time there is no more Javascript in the buffer. This... should be doable, I think, it's just a lot more work than the simple declarative approach suggested.

danderson commented 10 months ago

Hot damn, hold the phone! I forgot that treesit parser ranges use Emacs buffer positions, which are 1-indexed, not 0-indexed! Restricting the Javascript parser to the range ((1 . 1)) fixes font-locking and doesn't generate any other apparent spurious stuff.
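Putting the pieces together, a minimal sketch of the function-based workaround might look like this. The `my-templ--` names and the `script_body` node are placeholders, not necessarily what templ-ts-mode ships, and the code assumes both the templ and javascript grammars are installed.

```elisp
;; Sketch only. `script_body' is a HYPOTHETICAL templ grammar node name.
(require 'treesit)

(defun my-templ--ranges-or-placeholder (ranges)
  "Return RANGES, or the empty placeholder ((1 . 1)) when RANGES is nil.
Passing nil to the parser would mean \"whole buffer\", so never do that."
  (or ranges '((1 . 1))))

(defun my-templ--update-js-ranges (&optional _beg _end)
  "Recompute Javascript ranges over the whole buffer and apply them.
The BEG/END hints are ignored; ranges are not updated incrementally."
  (let ((js (treesit-parser-create 'javascript))
        (ranges (treesit-query-range 'templ '((script_body) @javascript))))
    (treesit-parser-set-included-ranges
     js (my-templ--ranges-or-placeholder ranges))))
```

The function can then be registered in place of the declarative rules, for example with `(setq-local treesit-range-settings (treesit-range-rules #'my-templ--update-js-ranges))`, since `treesit-range-rules` also accepts a function taking a start and an end position.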