Multicolor Semantic Highlighting with user-defined color palettes

codeinred commented 3 years ago

Hi everyone! Thank you so much for your work on clangd.

Would it be possible to implement multicolor semantic highlighting? This is one of the features I enjoyed most about ccls, and it would be amazing to have it as part of clangd! It's been a huge benefit to me, making it easier to read and understand code.

I've provided an outline of the feature, an example (this feature already exists in ccls), and I discuss concerns such as backwards compatibility (the feature is opt-in, so there should be no impact on anyone who doesn't explicitly enable it) and accessibility (the feature is opt-in, but many people find it helpful).

Please let me know if there is anything else I should provide.

Description

Semantic Highlighting wherein classes of tokens are assigned a color palette, rather than an individual color. So, for example, every instance of the same variable, type, template, or namespace has the same color, but instances of different variables, types, templates, or namespaces may have different colors from that palette.
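
As a minimal sketch of the idea described above (not how ccls or clangd implement it), a color can be chosen per symbol by hashing a stable symbol key into the palette. The function name `pick_color` and the key format `"namespace:noam"` are assumptions made for illustration only.

```python
import zlib


def pick_color(palette, symbol_key):
    """Pick a palette entry deterministically from a symbol key (hypothetical helper)."""
    if not palette:
        # No palette configured: the caller falls back to the normal theme color.
        return None
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    index = zlib.crc32(symbol_key.encode("utf-8")) % len(palette)
    return palette[index]


# Colors taken from the namespace palette shown in this issue.
namespace_palette = ["#429921", "#58c1a4", "#5ec648", "#36815b"]

# Every occurrence of the same symbol gets the same color; two different
# symbols may or may not land on different palette entries.
print(pick_color(namespace_palette, "namespace:noam"))
print(pick_color(namespace_palette, "namespace:json"))
```

Hashing keeps the mapping stable between sessions, at the cost of occasional collisions where two symbols share a shade.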

Example

Below is an image illustrating multicolor semantic highlighting with ccls. Constants are shades of blue, functions are shades of gold/brown, and types are shades of red/purple. Namespaces are green, but noam is a different shade than json.

[Image: screenshot of code highlighted by ccls with multicolor semantic highlighting]

Backwards compatibility

This feature should be opt-in via either a clangd configuration setting or an editor configuration setting, so there are no changes for users who don't opt into the feature.

Feature Reference

This feature was implemented in ccls, which uses LLVM as a backend, just as clangd does. See the documentation here: https://github.com/MaskRay/ccls/wiki/Visual-Studio-Code#semantic-highlight

Choice of color palette

Clangd doesn't need to provide any built-in color palettes; rather, it'd just be nice if users had the capacity to specify their own palette within either their editor's settings, or the settings for clangd itself.

Below is an example of what this specification looks like for ccls in vscode. Colors are specified as hex values.

// Predefined by package.json
"ccls.highlight.function.colors": ["#e5b124", "#927754", "#eb992c", "#e2bf8f", "#d67c17", "#88651e", "#e4b953", "#a36526", "#b28927", "#d69855"],
"ccls.highlight.type.colors": ["#e1afc3", "#d533bb", "#9b677f", "#e350b6", "#a04360", "#dd82bc", "#de3864", "#ad3f87", "#dd7a90", "#e0438a"],
"ccls.highlight.variable.colors": ["#587d87", "#26cdca", "#397797", "#57c2cc", "#306b72", "#6cbcdf", "#368896", "#3ea0d2", "#48a5af", "#7ca6b7"],

"ccls.highlight.macro.colors": ["#e79528", "#c5373d", "#e8a272", "#d84f2b", "#a67245", "#e27a33", "#9b4a31", "#b66a1e", "#e27a71", "#cf6d49"],
"ccls.highlight.namespace.colors": ["#429921", "#58c1a4", "#5ec648", "#36815b", "#83c65d", "#417b2f", "#43cc71", "#7eb769", "#58bf89", "#3e9f4a"],

An analogous specification for clangd might look like this: ("ccls" has been replaced with "clangd")

// Predefined by package.json
"clangd.highlight.function.colors": ["#e5b124", "#927754", "#eb992c", "#e2bf8f", "#d67c17", "#88651e", "#e4b953", "#a36526", "#b28927", "#d69855"],
"clangd.highlight.type.colors": ["#e1afc3", "#d533bb", "#9b677f", "#e350b6", "#a04360", "#dd82bc", "#de3864", "#ad3f87", "#dd7a90", "#e0438a"],
"clangd.highlight.variable.colors": ["#587d87", "#26cdca", "#397797", "#57c2cc", "#306b72", "#6cbcdf", "#368896", "#3ea0d2", "#48a5af", "#7ca6b7"],

"clangd.highlight.macro.colors": ["#e79528", "#c5373d", "#e8a272", "#d84f2b", "#a67245", "#e27a33", "#9b4a31", "#b66a1e", "#e27a71", "#cf6d49"],
"clangd.highlight.namespace.colors": ["#429921", "#58c1a4", "#5ec648", "#36815b", "#83c65d", "#417b2f", "#43cc71", "#7eb769", "#58bf89", "#3e9f4a"],

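A client could read such settings roughly as follows. This is only a sketch: the `clangd.highlight.<kind>.colors` keys are the hypothetical names proposed above, not an existing clangd or vscode-clangd option, and `load_palettes` is an illustrative helper.

```python
import re

# Six-digit hex colors such as "#e5b124".
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}")
# Hypothetical setting names of the form "clangd.highlight.<kind>.colors".
KEY_PATTERN = re.compile(r"clangd\.highlight\.(\w+)\.colors")


def load_palettes(settings):
    """Collect per-kind palettes from a settings dict, skipping invalid entries."""
    palettes = {}
    for key, value in settings.items():
        match = KEY_PATTERN.fullmatch(key)
        if match is None:
            continue  # unrelated setting
        colors = [c for c in value if isinstance(c, str) and HEX_COLOR.fullmatch(c)]
        if colors:
            # Kinds with no valid colors are left out, so the default color applies.
            palettes[match.group(1)] = colors
    return palettes


settings = {
    "clangd.highlight.function.colors": ["#e5b124", "#927754", "not-a-color"],
    "clangd.highlight.macro.colors": [],
    "editor.fontSize": 14,
}
print(load_palettes(settings))
```

Leaving a kind out of the result, rather than raising an error, keeps the feature opt-in per token kind.
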
Accessibility

Because the feature is opt-in, there is no impact on accessibility for users who prefer higher-contrast themes or who avoid color.

I believe that having this feature will also improve accessibility for people with ADHD or dyslexia; many people with these conditions report that it's been beneficial to them.

Unfortunately, research in this area is sparse, but I've collected references in relation to accessibility and semantic highlighting.

https://githubmemory.com/repo/CompEng0001/CodingandDyslexia (lists IDEs which support semantic highlighting)
Comment on this article: https://www.linusakesson.net/programming/syntaxhighlighting/

Anonymous Sun 21-Oct-2012 00:37 being a dyslexic syntax highlighting is one of the best tings that have happened to me, if anything i need way more
Comment by ZeroCool2u from this forum thread:

ZeroCool2u on March 24, 2019 [–] Something that also helps cut down on these types of bugs for me is Semantic Highlighting. I've gotten my whole team to use it. It's great, because you don't have to read the var name, you just have to see if there's any var in a function that doesn't have the same color as anything else and they tend to stick out like a sore thumb. Not sure if that's the best explanation, but if you're using any JetBrains IDE you can try it by going to Settings and searching for Semantic Proximity and hitting the check mark. Easier to show than explain.

Historical notes

I originally heard about this feature in this article by Evan Brooks.

HighCommander4 commented 3 years ago

One thing to note here is that ccls uses a custom protocol between its client and server for semantic highlighting, whereas clangd uses semanticTokens, which was standardized in Language Server Protocol version 3.16.

So, to support this feature in clangd, we would need to take one of the following approaches:

1. Build this functionality as an extension on top of semanticTokens. There are two sub-possibilities here:
   - Do it purely on the client side, without any protocol changes. The client doesn't have access to semantic information, so it would need to treat all identifiers (with a given token kind) which are textually the same as being the same entity (even if they are, e.g., in two different scopes). This may not be a big deal.
   - Extend the semanticTokens protocol with the necessary additional information. This seems tricky because semanticTokens uses a packed representation over the wire to save space, so it's not just a matter of, e.g., adding a new field which clients that don't support this feature can ignore.
2. Propose this feature to be part of the Language Server Protocol itself, and then clangd can implement the updated protocol.
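
To make the client-side option concrete, here is a rough sketch (not vscode-clangd code). It decodes the semanticTokens `data` array, which per LSP 3.16 holds five integers per token (deltaLine, deltaStart, length, tokenType, tokenModifiers), and then colors tokens by kind and text. The helper names and the first-come palette assignment are assumptions for illustration.

```python
def decode_semantic_tokens(data, token_types):
    """Turn the relative 5-integer encoding into absolute (line, start, length, kind)."""
    tokens = []
    line = 0
    start = 0
    for i in range(0, len(data), 5):
        delta_line, delta_start, length, type_index, _modifiers = data[i:i + 5]
        line += delta_line
        # deltaStart is relative to the previous token only when on the same line.
        start = start + delta_start if delta_line == 0 else delta_start
        tokens.append((line, start, length, token_types[type_index]))
    return tokens


def color_by_text(lines, tokens, palettes):
    """Give textually identical identifiers of the same kind the same color."""
    assigned = {}  # (kind, text) -> color
    result = []
    for line, start, length, kind in tokens:
        # LSP offsets are UTF-16 code units by default; plain slicing assumes ASCII.
        text = lines[line][start:start + length]
        palette = palettes.get(kind)
        if not palette:
            result.append((line, start, text, None))  # keep the default color
            continue
        key = (kind, text)
        if key not in assigned:
            # Hand out palette entries in order of first appearance, wrapping around.
            used = sum(1 for k in assigned if k[0] == kind)
            assigned[key] = palette[used % len(palette)]
        result.append((line, start, text, assigned[key]))
    return result


lines = ["int foo = bar;", "foo = foo + bar;"]
data = [0, 4, 3, 0, 0,  0, 6, 3, 0, 0,  1, 0, 3, 0, 0,  0, 6, 3, 0, 0,  0, 6, 3, 0, 0]
tokens = decode_semantic_tokens(data, ["variable"])
for entry in color_by_text(lines, tokens, {"variable": ["#587d87", "#26cdca"]}):
    print(entry)
```

Because the key is only (kind, text), two unrelated variables named `foo` in different scopes would share a color, which is the limitation described in the first sub-possibility.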

codeinred commented 3 years ago

Hi @HighCommander4,

I really appreciate your response! I've been considering it. I would be happy with it being done purely client-side, and the first approach you suggested (treating identifiers with the same name as textually the same) would be fine.

Since clangd can find the definition of a variable or function, could this be used to distinguish between variables of the same name, enabling tokens to be identified uniquely for rainbow semantic highlighting? (It's definitely not a dealbreaker if this isn't possible due to performance or other considerations! Any degree of rainbow semantic highlighting is better than none!)

Alternatively, if you believe it'd be better to propose this feature as part of the Language Server Protocol, how would this be done? One idea would be to attach a unique tokenID as a 32 bit int that gets sent over the wire with each token as part of the LSP, and could then be used for colorization, but I don't know which approach would be best for this.

HighCommander4 commented 3 years ago

> I would be happy with it being done purely client-side, and the first approach you suggested (treating identifiers with the same name as textually the same) would be fine.

I filed a vscode-clangd issue and wrote some thoughts about how this could work: https://github.com/clangd/vscode-clangd/issues/234

> Since clangd can find the definition of a variable or function, could this be used to distinguish between variables of the same name, enabling tokens to be identified uniquely for rainbow semantic highlighting?

This is information available to the server, but not the client. So, if we are talking about a client-side only approach, then it doesn't help. (Unless the client is going to send a go-to-definition request for every token for this purpose, which I think you can see would have poor performance.)

> Alternatively, if you believe it'd be better to propose this feature as part of the Language Server Protocol, how would this be done?

It looks like there is already an LSP issue for this: https://github.com/microsoft/language-server-protocol/issues/1051

codeinred commented 3 years ago

Thank you so much, I really appreciate it!

alexzielenski commented 2 years ago

To add some context: I sent a patch upstream for this sort of feature, which worked within the LSP framework. While I was using clangd I maintained this patch over the years, and it worked well. It may need some massaging to bring it back up to date, but rainbow highlighting was working as in ccls.

This patch used modifiers (1, 2, 3, 4, 5) for up to 5 different colors of each token type, so it is opt-in without protocol changes and works without any client-side modifications. Another neat thing is that if a color is left unspecified, the default color is used.
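
The modifier idea can be sketched as below. This is an illustration under stated assumptions, not the code from the patch: how the patch chose a modifier per symbol is not described here, so the hash-based `bucket_modifier` and the `"type.modifier"` theme keys are hypothetical.

```python
import zlib

BUCKET_COUNT = 5  # the comment above mentions up to 5 colors per token type


def bucket_modifier(symbol_key):
    """Hypothetical server side: map a symbol to one of the modifiers 1..5."""
    return 1 + zlib.crc32(symbol_key.encode("utf-8")) % BUCKET_COUNT


def resolve_color(theme, token_type, modifier):
    """Hypothetical theme lookup: prefer "type.modifier", else the plain type color."""
    return theme.get(f"{token_type}.{modifier}", theme.get(token_type))


theme = {
    "variable": "#9cdcfe",    # default color for variables
    "variable.1": "#587d87",  # only some buckets are customized
    "variable.2": "#26cdca",
}
modifier = bucket_modifier("variable:count")
print(modifier, resolve_color(theme, "variable", modifier))
```

A user who defines no per-modifier colors sees ordinary highlighting, which matches the opt-in behavior described above.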

https://reviews.llvm.org/D87669

The review stalled in 2020 and the patch never made it in. Patch was dropped :( @sam-mccall

Xeverous commented 1 year ago

As someone who uses clangd for non-IDE purposes (generating highlighted code on a website) I think this feature is out of scope of clangd. IMO it should either be moved to LSP design/specification or just written outside clangd. You can compute color variations by issuing additional LSP calls that query object usages (so you can differentiate entities with the same name but in different scopes) - that's what I'm currently doing.
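
A rough sketch of that outside-clangd approach, under assumptions: `find_references` stands in for a real LSP `textDocument/references` round trip and is passed in as a plain callable here, and `group_by_entity` is an illustrative name. Tokens already covered by an earlier query are skipped, so the number of queries scales with the number of entities rather than the number of tokens.

```python
def group_by_entity(token_locations, find_references):
    """Assign an entity id to each token location using reference queries."""
    entity_of = {}
    next_id = 0
    for location in token_locations:
        if location in entity_of:
            continue  # already covered by an earlier reference query
        for ref in find_references(location):
            entity_of[ref] = next_id
        entity_of[location] = next_id
        next_id += 1
    return entity_of


# Fake index: two distinct variables named "x" in different scopes,
# identified by (line, column) positions.
first_x = {(0, 4), (1, 0)}
second_x = {(3, 4), (4, 0)}
index = {loc: first_x for loc in first_x}
index.update({loc: second_x for loc in second_x})

entities = group_by_entity([(0, 4), (1, 0), (3, 4), (4, 0)], lambda loc: index[loc])
print(entities)
```

Each entity id can then be mapped to a palette entry, so same-named variables in different scopes receive different colors.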
