Questions and feedback about extending the project

Hi. I'm a fan of semantic highlighting and since LSP started supporting it I have been amazed by how well some editors can colorize code.

Your project went really far with it, maybe even too far (in terms of combinations - the 336 colors). I would like to extend it and provide some feedback. Please note that I work almost exclusively with C++, so my thoughts are oriented toward it. I'm also heavily used to the Vibrant Ink theme, so if possible, I would contribute another color palette as I don't feel confident in extending the existing color scheme. Your color scheme is original and I suspect most people would be used to themes based on already-popular palettes like Monokai. Thus, I propose to add an ability for multiple color schemes (and language-specific themes if that's possible).

Notice any contexts in which the existing syntax highlighting looks bad or incomplete?

In C++ ; is barely visible. Kinda annoying because this language is full of symbols. I assume the same happens for C because both are very often backed by the same tooling and there is only one LSP implementation.

Know any language with Semantic Highlighting that this theme could support better?

C++ is complex enough that its LSP server (clangd) uses many non-standard names for token types and modifiers.

Here is a list from clangd-15.0.2 LSP initialize call:

types:
'variable', 'variable', 'parameter', 'function', 'method', 'function', 'property', 'variable', 'class', 'interface', 'enum', 'enumMember', 'type', 'type', 'unknown', 'namespace', 'typeParameter', 'concept', 'type', 'macro', 'comment'
modifiers:
'declaration', 'deprecated', 'deduced', 'readonly', 'static', 'abstract', 'virtual', 'dependentName', 'defaultLibrary', 'usedAsMutableReference', 'functionScope', 'classScope', 'fileScope', 'globalScope'

Some extra feedback (C++-centric):

Some of the token types/modifiers are unreachable (e.g. interface) because C++ doesn't differentiate between classes and interfaces - it allows writing both and everything in between because it puts practically no limits on multiple inheritance and where/when a function can be virtual and/or abstract. I can call such situations a "color waste" because the interface token type simply won't be reported and any color assigned to it is just wasted. Move the focus to function calls instead.
Some colors for some modifiers are unnecessary, e.g. defaultLibrary practically delivers no information because 1) C++ standard library is pretty small compared to other languages 2) The standard library is hardly ever imported to global scope which means all names from it will be prefixed with std::
IMO there is a significant opportunity to exploit custom token types/modifiers reported by clangd: usedAsMutableReference can be used to highlight out parameters and dependentName is a color-game-changer for templates.
IMO there are too many colors. Instead of using so many shades for different token modifiers, I would reserve a significant portion of them to style each variable in a different shade. Sadly VS Code does not support this feature, but I would reserve it for my own use - currently I'm experimenting with using clangd to highlight code I would put on my website, which has the potential to look far richer than any highlighter for websites. *Ideally I would contribute my theme based on experiments with clangd-based rendering of C++ code on my website - both are using LSP for highlighting.*

Hello and thanks for the interest, I'd love to make the theme more compatible and include more styles in general.

C++ is one of those languages that I don't have personal experience with and wasn't even sure how to set up a working project or a language server to test the theme with, so your input here is already proving helpful.

I propose to add an ability for multiple color schemes

Multiple color themes are already supported; that's why themes in the generator/config.json file is an array. Put JSON files for additional themes in the colors and themes folders: the file in colors defines the static editor colors, while the one in themes contains the VSCode theme data as well as the tokenColors and semanticTokenColors, which will be merged with the generated output. Then simply add another theme definition pointing to those files to the config file, together with your chosen color mixing presets, and the generator will combine everything.
So it should be relatively easy to get started.

(and language-specific themes if that's possible).

Unfortunately this is not possible in VSCode yet. There is an extension that makes it work though.

In C++ ; is barely visible. Kinda annoying because this language is full of symbols.

The culprit here is the coloration for `punctuation.terminator.statement`. The faint semicolon is a consequence of the theme (and the theme it was originally based on) being specialized for JavaScript/TypeScript, where semicolons are mostly unimportant and 90% of the time even optional. But given how many languages do take semicolons more seriously, I don't have a problem with making it more noticeable.

C++ is complex enough that its LSP server (clangd) uses many non-standard names for token types and modifiers.

I think I got clangd (mostly) running now, and just from looking at a bunch of C++ projects I pulled from GitHub I already see a whole bunch of tokens that are not highlighted or incorrectly highlighted (at least if I'm interpreting their roles correctly) and that could be improved with that extra information. So upgrading the main theme to be more C++ friendly is definitely on my list now.

Some of the token types/modifiers are unreachable [...] I can call such situations a "color waste" because the interface token type simply won't be reported and any color assigned to it is just wasted.

I've been thinking about that problem for a while, but I'm still not sure how to handle it without sacrificing potential compatibility. If some language does not use some Token A at all and instead uses some other non-standard token B, then technically there's nothing speaking against remapping the color from A to B, but if there's some language out there that uses both token names, we end up with two different tokens with the same color (and semantic tokens, unlike TextMate, cannot distinguish between different languages).

Some colors for some modifiers are unnecessary, e.g. defaultLibrary practically delivers no information because 1) C++ standard library is pretty small compared to other languages 2) The standard library is hardly ever imported to global scope which means all names from it will be prefixed with std::

Well, as long as it is used at all and correctly I usually don't see it as "waste". It's also worth noting that the defaultLibrary token does not only highlight the contents of the literal default library. Basically any functionality, classes, functions, types etc that are built-in (with the exception of actual primitives) generally get highlighted with this modifier even in the absence of an explicit standard library import... if the language server makes that distinction of course.
So my approach to solving this problem would be to seek to expand the application of the token, to other entities that would fit the built-in description that might not be highlighted as such right now.

IMO there is a significant opportunity to exploit custom token types/modifiers reported by clangd: usedAsMutableReference can be used to highlight out parameters and dependentName is a color-game-changer for templates.

I am not familiar with any of those terms, especially not in the C++ context, so I guess once we have a working example this would be easier to understand.

IMO there are too many colors. Instead of using so many shades for different token modifiers, I would reserve a significant portion of them to style each variable in a different shade.

For now the excess is kind of the point. Name based highlighting is absolutely a promising approach and it actually is possible in VSCode, however it requires completely hijacking the SemanticTokenProvider and for that you need something more powerful than a color theme; so while it's outside the scope of this particular project, there are already other extensions that go into that direction like Color Identifiers or ColorMate.

I think I got clangd (mostly) running now, and just from looking at a bunch of C++ projects I pulled from GitHub I already see a whole bunch of tokens that are not highlighted or incorrectly highlighted (at least if I'm interpreting their roles correctly) and that could be improved with that extra information. So upgrading the main theme to be more C++ friendly is definitely on my list now.

Clangd uses many non-standard token type/modifier names because of the language's complexity. Those that match standard names are there to give at least a base level of support for a generic theme which isn't focused on a specific language. There is one particularly unusual case though: the comment token type is not used to report comments but disabled (#ifdefed out) code instead.
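
To make the disabled-code case concrete, here is a small sketch. The macro and function names are made up for illustration. With USE_FAST_PATH defined as 1, the #else branch is never compiled, and that inactive region is the kind of code that gets reported with the comment token type instead of real comments.

```cpp
#include <cassert>

// Hypothetical feature switch, only for this illustration.
#define USE_FAST_PATH 1

int doubled(int x) {
#if USE_FAST_PATH
    return x * 2;  // active branch: compiled and highlighted normally
#else
    return x + x;  // inactive branch: dropped by the preprocessor
#endif
}

// Small self-check of the active branch.
void check_doubled() {
    assert(doubled(21) == 42);
}
```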

I've been thinking about that problem for a while, but I'm still not sure how to handle it without sacrificing potential compatibility.

I don't think you have to. The base level of compatibility is already defined by the standard set of token types and modifiers. What I would suggest is to (maybe optionally) remove color combinations that just don't make sense, e.g. async namespace. Your demonstration website presents a large table of colors, but I expect many of them never occur in practice. While the image itself looks cool, it conveys little information; I would rather see a smaller table with color combinations that are actually used.

Plus I doubt anyone would need such detailed coloration. IMO declaration is just unnecessary as it's typically a single place in code and having a clearly visible body of the class/function already indicates it's a declaration/definition, not usage.

remapping the color from A to B, but if there's some language out there that uses both token names, we end up with two different tokens with the same color

I think the set of standard names was also designed with avoiding this situation in mind. A hypothetical LSP implementation for any language should prefer standard token names if they match well enough. For example: clangd uses typeParameter for both C++ template type parameters (TTP) and non-type template parameters (NTTP). C++ has significantly more complex generic programming features, but the main goal of the token remains the same: code that is a (compile time) parameter of other code. clangd uses it for both because both land in the template <> fragment of the source code and there is practically no benefit in differentiating them by color.
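
A minimal sketch of what that looks like in code (the function name is just an example): T is a type parameter and N is a non-type parameter, yet both sit in the same template <> list, which is why a single typeParameter color for both reads fine.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// T is a template type parameter (TTP), N is a non-type template
// parameter (NTTP). Both are declared in the same template <...> list.
template <typename T, std::size_t N>
T sum(const std::array<T, N>& values) {
    T total{};
    for (std::size_t i = 0; i < N; ++i) {
        total += values[i];
    }
    return total;
}

// Small self-check: T is deduced as int, N as 3.
void check_sum() {
    assert((sum(std::array<int, 3>{1, 2, 3}) == 6));
}
```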

But overall I think this A-B problem is hard to fix. The main problem is that we can't stylize per language so if a given language uses both A and B at least one of them has to be non-standard: non-standard names are sorta language-specific as they are defined by the given language LSP. Therefore, I think the best approach would be to duplicate color uses only for tokens that are outside the standard set of names.

I usually don't see it as "waste". It's also worth noting that the defaultLibrary token does not only highlight the contents of the literal default library. Basically any functionality, classes, functions, types etc that are built-in (with the exception of actual primitives) generally get highlighted with this modifier even in the absence of an explicit standard library import... if the language server makes that distinction of course.

I have no significant experience with web-tech languages so I can't really speak to how the defaultLibrary coloring benefits them.

My C-C++-subjective opinion is that it makes very little sense to exist for these languages but because there is no support for language-specific coloring I obviously won't push for any change here. This has to remain as it is now: generic.

So my approach to solving this problem would be to seek to expand the application of the token, to other entities that would fit the built-in description that might not be highlighted as such right now.

If you would like to push the "maximalist approach" further, then FYI: a significant portion of "baseline" C also comes from POSIX. ISO defines some, POSIX defines some, some (e.g. malloc) are defined by both standards, and some are defined incompatibly by both (e.g. thrd_start_t). So in theory it could be beneficial to know if one is using POSIX or ISO definitions when writing code. Unfortunately, clangd has no posixLibrary or systemLibrary token mod so right now it remains only a very hypothetical feature.

IMO not worth the effort, even if LSP server did report such things but again - just my opinion. I just don't see a value for this specific token modifier. Same for builtin types: these are keywords and while some IDEs do color them differently, I prefer "keyword denoting a built-in type" to be colored just as "keyword".

I am not familiar with any of those terms, especially not in the C++ context, so I guess once we have a working example this would be easier to understand.

usedAsMutableReference is the modifier clangd reports for so-called "output parameters" - a situation where a function can modify the supplied object and the caller can observe the change after the call. In the case of C++ it applies to parameters that are passed by any form of non-const reference (T&, T&& but not const T&). There are strong conventions both in C and C++ for handling out parameters, but it's still beneficial to signal where the result of a function is delivered through a parameter instead of, or in addition to, the typical return value.
dependentName - this is very C++ specific. Inside template code one can do anything with a yet-unknown type, including stuff like taking a parameter of type T named t and doing t.foo() or accessing T::bar. foo and bar in such a context are dependent names because they depend on the template parameter, and the template code can mean something different for each different T. You can't colorize dependent-name code as function/variable/constant/etc. because at the point of template definition it's unknown what it actually is. Only when the template is instantiated for a specific T will the compiler know.
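
A small sketch of both cases, with made-up names (parse_digit, Widget, call_foo): the argument passed to the int& parameter is the out-parameter situation, and foo and T::bar inside the template are the dependent names.

```cpp
#include <cassert>

// Out parameter: `out` is a non-const reference, so the caller sees
// the value written here after the call returns.
bool parse_digit(char c, int& out) {
    if (c < '0' || c > '9') {
        return false;
    }
    out = c - '0';
    return true;
}

// Hypothetical type that happens to provide foo() and bar.
struct Widget {
    using bar = int;
    int foo() const { return 42; }
};

// Inside this template, `foo` and `bar` are dependent names: what they
// refer to is only known once T is fixed at instantiation.
template <typename T>
typename T::bar call_foo(const T& t) {
    return t.foo();
}

int demo() {
    int value = 0;
    bool ok = parse_digit('7', value);  // `value` is modified through the reference
    assert(ok);
    return value + call_foo(Widget{});
}
```

In demo, value is the spot where highlighting a mutable-reference argument helps: nothing at the call site otherwise hints that the call changes it.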

For now the excess is kind of the point.

Fine. Your project so you choose the direction.

Name based highlighting is absolutely a promising approach and it actually is possible in VSCode, however it requires completely hijacking the SemanticTokenProvider and for that you need something more powerful than a color theme; so while it's outside the scope of this particular project, there are already other extensions that go into that direction like Color Identifiers or ColorMate.

In theory, VS Code could deliver such functionality already by basing it on LSP. The references and similar LSP calls can inform the editor where a given object is being used.

Thanks for the response but overall, I think I can only provide more value once I push my own clangd-highlight-for-code-on-website project forward and come back with experience about specific colors and styles for clangd-custom token types/modifiers. Right now it's too early to present any theme forks/modifications. Feel free to ask me for more info/explanations though.

Overall I think this A-B problem is hard to fix. The main problem is that we can't stylize per language so if a given language uses both A and B at least one of them has to be non-standard: non-standard names are sorta language-specific as they are defined by the given language LSP. Therefore, I think the best approach would be to duplicate color uses only for tokens that are outside the standard set of names.

It just so happened that right after my last reply I re-read the documentation for the VSCode semantic tokens. I must have either missed it or it was added within the last few months, but the token provider does actually provide the language name, and tokens can be targeted towards a specific language by appending its designation with a colon, like variable.readonly:python. So yeah, it's totally possible to be specific. You could even go as far as having a completely different style for each language within a single theme. Not that this would be sensible, but it sure would fix that particular problem, again, once it's established which special features of which languages would benefit from more exhaustive special treatment.

Thanks for the response but overall, I think I can only provide more value once I push my own clangd-highlight-for-code-on-website project forward and come back with experience about specific colors and styles for clangd-custom token types/modifiers. Right now it's too early to present any theme forks/modifications. Feel free to ask me for more info/explanations though.

Sure, I'm looking forward to that. In the meantime there are certainly enough possibilities with language-specific highlighting in general to keep me busy.

IMO declaration is just unnecessary as it's typically a single place in code and having a clearly visible body of the class/function already indicates it's a declaration/definition, not usage.

In that instance you are correct, but again it's part of the general nature of modifiers that they are sometimes more useful in some contexts or languages than in others.
Having a differently styled declaration is for example useful for variables in languages like Python and Lua that simply initialize them by assignment without any keywords, so if you see a line like a = 5, you can't tell if this is the first declaration of a variable or if an existing variable is being reassigned. So in those cases the additional hint comes in very handy.

If you would like to push the "maximalist approach" further, then FYI: a significant portion of "baseline" C also comes from POSIX. ISO defines some, POSIX defines some, some (e.g. malloc) are defined by both standards, and some are defined incompatibly by both (e.g. thrd_start_t). So in theory it could be beneficial to know if one is using POSIX or ISO definitions when writing code. Unfortunately, clangd has no posixLibrary or systemLibrary token mod so right now it remains only a very hypothetical feature.

The server might not provide tokens for them, but it seems the underlying TextMate scope identifies various POSIX-related features, which can then be mapped back onto the semantic token scopes. So I might check that out and potentially designate those as defaultLibrary.

I'm against "remapping solutions". I dislike tools that are trying to be too smart. Part of the tooling goals should be predictability and consistency. I'm already annoyed that some editors color NULL as a keyword even though it's a macro in both C and C++. Yes, it's frequently used and used like a keyword (which is so in many languages) but it's still technically incorrect to color something as keyword when it is not one.

So in short, I prefer the lack of a feature to something wonky with false positives.
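
The NULL point is easy to check with the preprocessor itself. This is just a sketch with illustrative variable names: #ifdef sees NULL because it is a macro from the standard headers, while nullptr is a real keyword and is not a defined macro.

```cpp
#include <cassert>
#include <cstddef>  // defines the NULL macro

// NULL comes from the preprocessor, so #ifdef can see it.
#ifdef NULL
constexpr bool null_is_macro = true;
#else
constexpr bool null_is_macro = false;
#endif

// nullptr is a keyword of the language, not a macro definition.
#ifdef nullptr
constexpr bool nullptr_is_macro = true;
#else
constexpr bool nullptr_is_macro = false;
#endif

// Small self-check of both observations.
void check_null() {
    assert(null_is_macro);
    assert(!nullptr_is_macro);
}
```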

Thertzlor / semantic-rainbow
