dotnet / vscode-csharp

Official C# support for Visual Studio Code
MIT License
2.87k stars 675 forks source link

Many issues with token classification for syntactic and semantic highlighting #5587

Closed akbyrd closed 3 days ago

akbyrd commented 1 year ago

Issue Description

The C# extension does not provide semantic token information in a variety of circumstances. This prevents themes from implementing accurate and precise semantic highlighting.

Issue 1 - No textmate scope exists for the value keyword in properties, events, or indexers.

image

Typically semantic token types are mapped to appropriate textmate scopes. plainKeyword is not mapped to any textmate scopes. And doing so would substantially harm the ability to theme C# in vscode. Most tokens have more specific textmate scopes already. The problem here is that plainKeyword is so broad that it would actually remove useful information if it were mapped to a textmate scope.

Desired Behavior value is mapped to a more specific and useful semantic type like keyword.value or keyword.value.property. Then that should be mapped to an appropriate textmate scope, probably with the same name in this case.

Issue 2 - Inaccurate textmate scope for notnull, unmanaged, and default generic type constraints. They are classified as entity.name.type.cs. Other generic type constraints are able to piggy back off of their normal classification of such as keyword.other.struct.cs.

image

Desired Behavior notnull, unmanaged, and default are classified as keyword.other.*.cs or another appropriate, specific scope. default in this case should not be mapped to keyword.control.default.cs to allow it to be distinguished from other uses of default.

Issue 3 - Semantic token type string breaks ability to highlight escape sequences.

image

In this case, the extension is categorizing the entire string as just string, then mapping that to a textmate scope. But the textmate grammar already did a better job of categorizing the string. It correctly noticed escape sequences and categorized them as constant.character.escape.cs. Because of the extension it's not possible to highlight escape sequences differently from the rest of the string.

Desired Behavior The extension does not override string tokens with less precise scopes so that escape sequences may be highlighted.

image

Issue 4 - No textmate scope exists for formatting specifiers in strings.

image

This is similar to the above issue, except there's no existing textmate scope that can be used. Instead the plugin needs to emit a semantic token type, such as string.placeholer and then map that to an appropriate textmate scope such as constant.other.placeholder (this is what the C++ grammar uses).

Desired Behavior The ability to highlight format placeholders differently that the rest of the string.

image

Issue 5 - No textmate scope exists for scoped, sizeof, fixed, with, unsafe, delegate, managed, unmanaged, and, null, file, or global.

image image

Issue 6 - Different uses of using cannot be distinguished

Both using statements and using directives receive the same textmate scope: keyword.other.using.cs. This makes it impossible to highlight the different semantic meanings.

Issue 7 - Types in await statements incorrectly classified as variables

image

Issue 8 - Calling conventions in function pointers incorrectly classified as variables (Cdecl, Stdcall, Fastcall, Thiscall).

image

Issue 9 - Defined preprocessor symbols incorrectly classified as variables.

image

Issue 10 - Label names incorrectly classified as variables.

Only happens when the label doesn't exist. But it's correct to assume it's a label name when used in a goto statement (C++ does this, for example). The textmate grammar gets this right, but the extension overrules it.

image

This only happens once the file is saved. It does not happen in an untitled editor set to the C# language.

Issue 11 - global incorrect classified as entity.name.label.cs.

image

Issue 12 - Semantic token type namespace is not mapped to a textmate scope

image

Additional Thoughts There's a general pattern here where applying extremely broad and generic semantic types to tokens is actually washing out useful information. The mapping from semantic token types to syntactic textmate scopes takes priority over the scopes generated by the textmate grammar. Once the extension decides to override the textmate scopes with something less precise there is no way to recover that information and control over highlighting is lost.

I think there's probably a change in philosophy warranted: semantic types should not be assigned unless they at least as precise as the textmate grammar.

This wouldn't be a problem if semantic scopes and textmate scopes were additive, but that's not the case in vscode. Mapping semantic scopes to textmate scopes is a destructive operation.

Environment information

VSCode version: 1.75.1 C# Extension: 1.25.4

Dotnet Information .NET SDK: Version: 7.0.100 Commit: e12b7af219 Runtime Environment: OS Name: Windows OS Version: 10.0.19045 OS Platform: Windows RID: win10-x64 Base Path: C:\Program Files\dotnet\sdk\7.0.100\ Host: Version: 7.0.0 Architecture: x64 Commit: d099f075e4 .NET SDKs installed: 7.0.100 [C:\Program Files\dotnet\sdk] .NET runtimes installed: Microsoft.AspNetCore.App 3.1.31 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 6.0.11 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.AspNetCore.App 7.0.0 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App] Microsoft.NETCore.App 3.1.31 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 5.0.14 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 6.0.10 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 6.0.11 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.NETCore.App 7.0.0 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App] Microsoft.WindowsDesktop.App 3.1.31 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 5.0.14 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 6.0.10 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 6.0.11 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Microsoft.WindowsDesktop.App 7.0.0 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App] Other architectures found: x86 [C:\Program Files (x86)\dotnet] registered at [HKLM\SOFTWARE\dotnet\Setup\InstalledVersions\x86\InstallLocation] Environment variables: Not set global.json file: Not found Learn more: https://aka.ms/dotnet/info Download .NET: https://aka.ms/dotnet/download
Visual Studio Code Extensions |Extension|Author|Version| |---|---|---| |akbyrd-vsc-extension|akbyrd|0.2.1| |akbyrd-vsc-theme|akbyrd|0.6.0| |auto-align|bladnman|0.0.13| |code-settings-sync|Shan|3.4.3| |cpptools|ms-vscode|1.14.3| |csharp|ms-dotnettools|1.25.4| |EditorConfig|EditorConfig|0.16.4| |multi-command|ryuta46|1.6.0| |perforce|mjcrouch|4.15.5| |rewrap|stkb|1.16.3| |vsc-material-theme-icons|equinusocio|2.5.0| |vscode-gcode-syntax|appliedengdesign|0.7.7| |vscode-todo-highlight|wayou|1.0.5|;
akbyrd commented 1 year ago

For additional context here are my goals with theming:

I've been able to successfully do this in 3 other languages. This is the first one that has given me trouble. The biggest issue is that plainKeyword is applied so heavily that I can't use that semantic token type. Some keywords are also types (int, object, var), some are also literals (null, false), and some are variables (args, this).

If I assign a color to plainKeyword it obliterates my ability to highlight a large swath of things more precisely.

wise0704 commented 1 year ago

Related: #5993

Basically for now, imo, the solution for now is to let textmate scopes handle keywords and not assign anything for plainKeyword.

So that, issue "keyword X is has no/wrong scope" -> textmate fix in csharp-tmLanguage

Issue 1 - textmate fix (disable plainKeyword = [ keyword.cs ] fallback) Issue 2 - textmate fix (disable plainKeyword = [ keyword.cs ] fallback) Issue 3 - seems already fixed with stringEscapeCharacter semantic token Issue 4 - disabling semantic token fallback string = [ string ] and letting textmate handle it Issue 5 - textmate fix (disable plainKeyword = [ keyword.cs ] fallback) Issue 6 - textmate fix (disable plainKeyword = [ keyword.cs ] fallback) Issue 7 - seems like a bug in semantic Issue 8 - also seems like a bug in semantic Issue 9 - perhaps also disable fallback variable = [ variable.other.readwrite ]? variables are detected quite well with textmate anyway Issue 10 - same as issue 9 Issue 11 - textmate fix (disable plainKeyword = [ keyword.cs ] fallback) Issue 12 - seems fixed in semantic

akbyrd commented 1 year ago

I can confirm Issue 3 is fixed. Issue 12 is not fixed and is in fact worse now. It's not classified as a namespace at all, but rather a variable.

image

@dibarbet In light of the recent changes to semantic classification that have made these problems much worse, I'd love to know if this issue is realistically going to be addressed any time soon or whether I should give up on the idea of supporting C# in my theme. I'm not being hyperbolic when I say semantic highlighting is quite literally impossible to do well using this extension when trying to achieve these goals: https://github.com/dotnet/vscode-csharp/issues/5587#issuecomment-1437943751

dibarbet commented 1 year ago

@akbyrd apologies this is the first time I'm seeing this issue (the label changes under my name were done by an automatic tool).

So there are a couple of notes I'd like to make.

  1. In the 2.x version of the extension we've switched over the default server implementation to be backed by Roslyn LSP instead of O#. O# used a custom protocol that was converted to semantic tokens, and there may have been issues there.
  2. Additionally, we've adjusted which semantic token types are reported and how they map to textmate scopes, so there definitely more changes there.

So to start with, lets go through the original set of issues to see what the behavior is in 2.x. We can then use that as the starting point to decide if there are changes we want to make.

  1. value is now mapped to the keyword semantic token type. We do not currently implement custom semantic tokens types for more detailed keyword information.

    image
  2. Similar to above, this is classified with the keyword semantic token type. We don't currently implement custom semantic token types for more detailed scopes.

image
  1. This is now classified with a custom semantic token type, stringEscapeCharacter, which maps to constant.character.escape.cs

    image
  2. As written, the {0} in your string isn't a placeholder, it is just the literal {0}. It would be a placeholder in a call to string.Format(<string>, value), but we currently do not support classifying that case differently. You could also put a $ in front of the string to turn it into an interpolated string, which we should be classifying correctly

    image
  3. These are all currently classified as keyword. As before we do not currently have support for further semantic sub classification.

    image
  4. Same as above.

  5. This is correctly classified for me. Is your project loaded correctly?

    image
  6. What semantic token type are you expecting here?

  7. This looks correct to me - what semantic token type are you expecting here?

  8. This is fairly unlikely to change - it might be possible to infer but it isn't a common enough case that would justify it.

  9. This is now keyword

    image
  10. I think this is working, though note that we don't define this mapping (we return the namespace token type and vscode defines the fallback).

    image

The mapping from semantic token types to syntactic textmate scopes takes priority over the scopes generated by the textmate grammar. Once the extension decides to override the textmate scopes with something less precise there is no way to recover that information and control over highlighting is lost.

I think there's probably a change in philosophy warranted: semantic types should not be assigned unless they at least as precise as the textmate grammar.

This is not going to happen. Textmate cannot ever override semantic classifications - a regex based classifier will never correctly be able to 'parse' C#.

The only path forward here is for the C# server to return more specific semantic classifications. While I'm open to considering that, it also isn't the simplest of tasks, and we haven't really had many requests for it (we use the same classifications in VS for example, so it doesn't offer more customization there either)

wise0704 commented 1 year ago

@dibarbet The problem here is that the semantic token is grouping everything into one textmate scope.

The problem is even bigger now in the 2.x version. Previously, in 1.x, every keyword was assigned to plainKeyword = [ keyword.cs ]. But now, in 2.x, every keyword is assigned to keyword = [ keyword.cs, keyword.control ].

Removing themes for the broad scope keyword was possible to allow using much more specific textmate scopes for themes, but now since VSCode hardcodes keyword.control into the scope, it's not possible to unassign the scope, and it's no longer possible to not group every single keyword from having the same colour.

The textmate rules for C# have been improving a lot in https://github.com/dotnet/csharp-tmLanguage. For keywords, many languages don't rely on semantic highlighting, since regular expressions are usually enough to detect them.

The semantic token names at least should be changed to something other than keyword (e.g. roll back to plainKeyword) so themes can "opt out" of using one colour for all keywords.

Preferably,

  1. It should be assigned to a semantic token name not built into VSCode. See: source
  2. It should not have a fallback textmate scope (keyword.cs) assigned. Themes should instead use the semantic token colourisation configuration to use semantic highlighting. plainKeyword: #xxxxxx is all that's needed.
akbyrd commented 1 year ago

@dibarbet Thanks for the thorough look! Indeed lots of improvements have been made to the extension.

Issue 1 - No textmate scope exists for the value keyword in properties, events, or indexers.

value is now mapped to the keyword semantic token type. We do not currently implement custom semantic tokens types for more detailed keyword information.

Confirmed it's set to keyword. The textmate scope has also been improved to variable.other.value.cs. The semantic token is now preventing the more accurate textmate scope from working.

Issue 2 - Inaccurate textmate scope for notnull, unmanaged, and default generic type constraints.

Similar to above, this is classified with the keyword semantic token type. We don't currently implement custom semantic token types for more detailed scopes.

Confirmed.

Issue 3 - Semantic token type string breaks ability to highlight escape sequences.

This is now classified with a custom semantic token type, stringEscapeCharacter, which maps to constant.character.escape.cs

Confirmed.

Issue 4 - No textmate scope exists for formatting specifiers in strings.

As written, the {0} in your string isn't a placeholder, it is just the literal {0}. It would be a placeholder in a call to string.Format(, value), but we currently do not support classifying that case differently. You could also put a $ in front of the string to turn it into an interpolated string, which we should be classifying correctly

Textmate scopes can be anything you wish. Start with one of the existing categories, and append as much useful information as you have to it. The C++ extension uses constant.other.placeholder, which seems very reasonable. One Python extension uses constant.character.format.placeholder.other.python.

TextMate is free-form in the sense that you can assign basically any name you wish to any part of the document that you can markup with the grammar system and then use that name in scope selectors.

Issue 5 - No textmate scope exists for scoped, sizeof, fixed, with, unsafe, delegate, managed, unmanaged, and, null, file, or global.

These are all currently classified as keyword. As before we do not currently have support for further semantic sub classification.

scoped - Confirmed sizeof - Confirmed, but worse than textmate scope keyword.operator.expression.sizeof.cs fixed - Confirmed, but worse than textmate scope keyword.control.context.fixed.cs with - Confirmed, but worse than textmate scope keyword.operator.expression.with.cs unsafe - Confirmed, but worse than textmate scope keyword.control.context.unsafe.cs delegate - Confirmed, but worse than textmate scope in some cases and better in others managed - Confirmed unmanaged - Confirmed and - Confirmed, but worse than textmate scope keyword.operator.expression.pattern.combinator.and.cs null - Confirmed, but worse than textmate scope constant.language.null.cs file - Confirmed, but worse than textmate scope storage.modifier.file.cs global - Confirmed

Issue 6 - Different uses of using cannot be distinguished

Same as above.

using directive - Confirmed, but worse than keyword.other.directive.using.cs using statement - Confirmed, but worse than keyword.control.context.using.cs

Issue 7 - Types in await statements incorrectly classified as variables

This is correctly classified for me. Is your project loaded correctly?

Confirmed with a project, worse than entity.name.type.cs without one.

Issue 8 - Calling conventions in function pointers incorrectly classified as variables (Cdecl, Stdcall, Fastcall, Thiscall).

What semantic token type are you expecting here?

Anything reasonable and accurate. It's not clear to me whether the calling convention for a function pointer maps to System.Runtime.CompilerServices.CallConvCdecl, System.Runtime.InteropServices.CallingConvention.Cdecl, or is a unique entity. In the first two cases it would probably match the fully qualified thing: entity.name.type.class or variable.other.enummember.

If it's a unique thing then I can see arguments for a couple different options using the textmate docs as a guide

Issue 9 - Defined preprocessor symbols incorrectly classified as variables.

This looks correct to me - what semantic token type are you expecting here?

It's not a variable. It cannot be assigned to, passed by reference, etc. It should be distinguishable from variables and classified as a preprocessor entity. Taking inspiration from the C++ extension I'll recommend entity.name.other.preprocessor.predefined.

Issue 10 - Label names incorrectly classified as variables.

This is fairly unlikely to change - it might be possible to infer but it isn't a common enough case that would justify it.

I'm not sure what makes this different from any other case we're considering. The textmate grammar gets this right so it would be more beneficial for the purposes of theming to not classify this at all.

Issue 11 - global incorrect classified as entity.name.label.cs.

This is now keyword

Confirmed, but worse than textmate scope in some cases and better in others.

Issue 12 - Semantic token type namespace is not mapped to a textmate scope

I think this is working, though note that we don't define this mapping (we return the namespace token type and vscode defines the fallback).

It's mapped to entity.name.namespace. I might have missed it before. It's unfortunately impossible to tell which fallback scope is assigned unless your theme happens to have a selector for it.

There are a few options within reach of the extension here:

Currently the only way to reasonably theme C# code in vscode using this extension is to disable semantic highlighting for the entire theme or ask users to disable semantic highlighting for the C# language in their own settings.

JoeRobich commented 3 days ago

The textmate grammar has seen a lot of improvement since this issue was opened. If you are still running into problems, please open an issue at https://github.com/dotnet/csharp-tmLanguage/issues