eclipse-langium / langium

Next-gen language engineering / DSL framework
https://langium.org/
MIT License

Add the possibility to easily define custom Semantic Tokens #1515

Open matheuslenke opened 1 month ago

matheuslenke commented 1 month ago

I have a proposal for making it easier to define custom semantic tokens, a feature that I needed for my own language.

Current Behavior

When using SemanticTokenAcceptor, I can only use the semantic tokens that are currently defined. This happens because the highlightToken function maps the string token id to an int, so a new token that is not in the SemanticTokenTypes list does not work.
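To illustrate the limitation described above, here is a minimal sketch of a name-to-index lookup. It is a simplified stand-in, not Langium's actual code: the functions `buildLegend` and `resolveTokenType` and the token name `tontoKind` are hypothetical and exist only for this example.

```typescript
// Simplified stand-in for a fixed list of known token types (hypothetical subset).
const builtinTokenTypes = ['class', 'enum', 'keyword', 'variable'];

/** Builds a name -> index table from an ordered list of token type names. */
function buildLegend(tokenTypes: readonly string[]): Map<string, number> {
    const legend = new Map<string, number>();
    tokenTypes.forEach((name, index) => legend.set(name, index));
    return legend;
}

/** Resolves a token type name to its numeric id, or undefined if the name is unknown. */
function resolveTokenType(legend: Map<string, number>, name: string): number | undefined {
    return legend.get(name);
}

// With only the built-in list, a custom name cannot be mapped to an int.
const builtinLegend = buildLegend(builtinTokenTypes);

// Once the custom name is part of the list, the lookup succeeds.
const extendedLegend = buildLegend([...builtinTokenTypes, 'tontoKind']);
```

In this sketch, the custom token only becomes usable once the list that feeds the lookup can be extended, which is what the proposal asks for.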

Desired Behavior

As a User, I want to be able to easily define custom Semantic Tokens for my language using Langium methods.

Background

I have been using Langium since version 0.1 and have a language project that I developed during a scientific initiation and my graduation project in Computer Science at university. I am now doing an M.Sc. in Computer Science and am still developing my language, called Tonto. I recently got my first paper showcasing it accepted! If you want to take a look, here is the website and here is the GitHub project.

My use case was that the language needed different colors for different ontological categories, which makes them easier to tell apart in a model: [image]

I can take care of this issue myself. I am reporting it here to ask whether anyone has suggestions on how to implement it and whether including this feature would be approved.

msujew commented 1 month ago

Hey @matheuslenke, congrats on your paper!

Thanks for the proposal. However, I'm not quite sure how feasible this actually is; not because it would make the feature/API more complicated, but because each language client only supports a very limited list of token types, see also the spec. While Langium could attempt to return more token types in its legend, the language client (I assume you're using vscode-languageclient) would need to know how to deal with them.

More specifically, the language client sends the language server a list of all the semantic token types it supports, and the server answers with its own list of supported semantic token types. The token types that are actually usable are the intersection of these two lists. AFAIK there is no way for users of the vscode-languageclient library (or any language client, for that matter) to add more semantic token types.
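The intersection described above can be sketched as follows. This only illustrates the idea from this comment; it is not code from vscode-languageclient or Langium, and the function name `usableTokenTypes` and the token name `customKind` are hypothetical.

```typescript
/** Returns the server token types that the client also announced, in server order. */
function usableTokenTypes(
    clientTypes: readonly string[],
    serverTypes: readonly string[]
): string[] {
    const supportedByClient = new Set(clientTypes);
    return serverTypes.filter(type => supportedByClient.has(type));
}

// Hypothetical lists: the client does not know about 'customKind'.
const clientTypes = ['class', 'keyword', 'variable'];
const serverTypes = ['keyword', 'variable', 'customKind'];

// Only the types present in both lists remain.
const usable = usableTokenTypes(clientTypes, serverTypes);
```

Under this model, a custom type offered only by the server would drop out unless the client is also told about it.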

I could be wrong with this, and there might be an undocumented way of adding these token types to vscode/vscode-languageclient. Do you know more about this @matheuslenke?

matheuslenke commented 1 month ago

Hey @msujew, thank you!

I actually already managed to do it using Langium; however, I needed to redefine SemanticTokenTypes, SemanticTokenOptions and the highlightToken method. I also needed to add some elements to the package.json file of the extension. Most of my implementation is in this commit.

I didn't know about those limitations. I think that this is not well documented in the library; however, it worked when I tried. I've used some of this spec as well, as my extension now provides a theme. I can try to implement it directly in Langium and open a PR to make it clearer how I did it.

msujew commented 1 month ago

I didn't know about those limitations. I think that this is not well documented in the library; however, it worked when I tried.

I see, it could always be that vscode just doesn't adhere to its own spec limitations, wouldn't be the first time :)

Either way, I think it's reasonable to support more semantic token types. I think the way to go would be to integrate the legend into the SemanticTokenProvider service API. Note that as multiple languages within the same language server can define their own SemanticTokenProvider, there needs to exist a way to merge all of their defined legends and propagate this merge back into the service so that it will generate the correct semantic token id. Feel free to contribute something in that direction and we can iterate on that if necessary.
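One possible shape for the merge step mentioned above, as a minimal sketch: each language contributes its own list of token type names, the lists are merged without duplicates, and the merged list yields the numeric ids. The helper names `mergeLegends` and `indexLegend` and the token names `kindA` and `kindB` are hypothetical; none of this is an existing Langium API.

```typescript
/** Merges several per-language legends into one list without duplicates, keeping first-seen order. */
function mergeLegends(legends: readonly (readonly string[])[]): string[] {
    const merged: string[] = [];
    const seen = new Set<string>();
    for (const legend of legends) {
        for (const tokenType of legend) {
            if (!seen.has(tokenType)) {
                seen.add(tokenType);
                merged.push(tokenType);
            }
        }
    }
    return merged;
}

/** Builds a lookup from token type name to its index in the merged legend. */
function indexLegend(merged: readonly string[]): Map<string, number> {
    return new Map(merged.map((name, index): [string, number] => [name, index]));
}

// Two hypothetical languages served by the same language server.
const languageA = ['keyword', 'class', 'kindA'];
const languageB = ['keyword', 'variable', 'kindB'];

const mergedLegend = mergeLegends([languageA, languageB]);
const ids = indexLegend(mergedLegend);
```

Each provider would then need to look up its ids in the merged table rather than in its own list, so that the same name always resolves to the same number across languages.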

Yokozuna59 commented 1 month ago

...but because each language client only supports a very limited list of token types, see also the spec.

@msujew I think this may help:

If necessary, extensions can declare new types and modifiers or create sub types of existing types through the semanticTokenTypes and semanticTokenModifiers contribution points in their extension's package.json...

from https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#custom-token-types-and-modifiers
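As a rough illustration of the contribution points named in that quote, the sketch below builds the corresponding package.json fragment as a TypeScript object. The keys `semanticTokenTypes`, `semanticTokenModifiers`, `id`, `superType` and `description` follow the linked guide; the concrete ids and descriptions are hypothetical and are not taken from the Tonto extension.

```typescript
// Shape of the entries under "contributes" in an extension's package.json.
interface SemanticTokenTypeContribution {
    id: string;
    superType?: string; // optional: an existing type to fall back to for styling
    description: string;
}

interface SemanticTokenModifierContribution {
    id: string;
    description: string;
}

// Hypothetical custom type and modifier, for illustration only.
const contributes: {
    semanticTokenTypes: SemanticTokenTypeContribution[];
    semanticTokenModifiers: SemanticTokenModifierContribution[];
} = {
    semanticTokenTypes: [
        { id: 'ontologyKind', superType: 'class', description: 'A hypothetical ontological category.' }
    ],
    semanticTokenModifiers: [
        { id: 'abstractCategory', description: 'A hypothetical modifier for abstract categories.' }
    ]
};

// Serialized form, as it would appear inside package.json.
const packageJsonFragment = JSON.stringify({ contributes }, null, 2);
```

The server-side legend would still need to include the same ids for the client and server to agree on them.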