PowerShell / vscode-powershell

Provides PowerShell language and debugging support for Visual Studio Code
https://marketplace.visualstudio.com/items/ms-vscode.PowerShell
MIT License
1.7k stars 490 forks source link

Add support for all token types with the new semantic highlighting #2852

Open MartinGC94 opened 4 years ago

MartinGC94 commented 4 years ago

Summary of the new feature

Powershell has 20 tokentypes and it would be nice if the semantic highlighting supported all of them. See: https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.pstokentype

The current semantic highlighting implementation is currently missing the following types:

If/When these gets added I would suggest changing the following mappings:

The following screenshot hopefully shows what I mean: Mapping issue

TylerLeonhardt commented 4 years ago

PSTokenType is the old type... It's a Windows PowerShell v2 relic...

Token is the new type: https://docs.microsoft.com/dotnet/api/system.management.automation.language.token

Which has a TokenKind: https://docs.microsoft.com/dotnet/api/system.management.automation.language.tokenkind

MartinGC94 commented 4 years ago

Interesting, I didn't know that. I still think it's worth taking inspiration from PSTokenType when mapping TokenKind to the VS code tokens because 20 tokens is a lot more manageable than 100+ and the old TokenType is basically logical groupings of the newer more granular TokenKind. Personally I'm mostly interested in getting separate tokens for Attributes and unquoted parameters so I can make them stick out a bit more from everything else.

TylerLeonhardt commented 4 years ago

The best way to see if we can do anything about this is to run it through the PowerShell parser/tokenizer:

$tokens = $null
$errs = $null
[System.management.automation.language.Parser]::ParseInput($myScriptAsAStr, [ref] $tokens, [ref] $errs)

Then look at $tokens. If there's something distinguishing about Attributes or un-quoted parameters, we can do something!... But if there isn't, then we can't, I'm afraid.

(I didn't test that code, but it should be something along those lines... The docs: https://docs.microsoft.com/en-us/dotnet/api/system.management.automation.language.parser.parseinput?view=powershellsdk-7.0.0#System_Management_Automation_Language_Parser_ParseInput_System_String_System_Management_Automation_Language_Token____System_Management_Automation_Language_ParseError____)

MartinGC94 commented 4 years ago

Then you can do something :) Attributes have the "AttributeName" token flag which as far as I can tell is completely unique to attributes. Unquoted parameters have the kind+TokenFlag combination of "Generic" + "None" which would affect the same things I suggested "CommandArgument" would affect in the original post above.

For convenience here's a table showing the relevant types and their Kind+TokenFlag combinations:

Token Kind TokenFlags
Command Generic CommandName
Unquoted Parameter Generic None
Type Identifier TypeName
Attribute Identifier TypeName,AttributeName
TylerLeonhardt commented 4 years ago

Nice find! Interested in contributing? This is really nicely scoped to just this method: https://github.com/PowerShell/PowerShellEditorServices/blob/master/src/PowerShellEditorServices/Services/TextDocument/Handlers/PsesSemanticTokensHandler.cs#L102-L160

MartinGC94 commented 4 years ago

I don't know how I would do it. Updating the if statements and/or switch would be simple enough, but none of the standard semantic token types seem to really fit with the Powershell tokens so I would have to define a new type/sub type like it describes here: https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide#semantic-token-classification

TylerLeonhardt commented 4 years ago

Yeah I'd rather not add a new token type... Maybe use one of the token types we don't actually use.

TylerLeonhardt commented 4 years ago

Here are the possibilities: https://github.com/OmniSharp/csharp-language-server-protocol/blob/f93f973ec96d895c0c6056f5f5942c4646071ab9/src/Protocol/Models/Proposals/SemanticTokenTypes.cs#L35-L57

I like macro or label personally.

SeeminglyScience commented 4 years ago

Type is the most accurate (it is a type, but it is also an attribute). If there's a way to do inheritance or something where you have a new token type called Attribute that falls back to Type if the theme doesn't support it that might be cool. Otherwise I'm not sure it's a great idea to use a classification that isn't accurate (when an accurate one exists).

rjmholt commented 4 years ago

If there's a way to do inheritance or something where you have a new token type called Attribute that falls back to Type if the theme doesn't support it that might be cool

The problem is that no popular theme will support it, and we've just increased complexity without fixing the issue. Most people will continue using their Dark+/Monokai/Solarized theme and see attributes and types the same. So it'll end up being like in https://github.com/PowerShell/vscode-powershell/issues/2856#issuecomment-669240917, where we correctly categorise the token differently and the colour is the same.

Otherwise I'm not sure it's a great idea to use a classification that isn't accurate (when an accurate one exists).

This is the real problem with themes. The themes themselves tend to be hacks designed to achieve nice coloration in the face of bad token categorisation. Which in turns forces the token classifiers to lie. I seem to recall there was a contentious change in the EditorSyntax repo where tokens were categorised more correctly and people complained because highlighting got worse.

I think the ideal here would be for us to define a correct token type, and a hack fallback, like: attribute -> label. That way the ISE theme could colour things with high specificity but users of popular themes will still get the color differentiation they're looking for.

TylerLeonhardt commented 4 years ago

@rjmholt if that's possible, I like that plan. I'll probably open an issue on vscode to start this discussion.

Also, I was skimming through the LSP docs and I noticed there's also a section for modifiers:

https://microsoft.github.io/language-server-protocol/specifications/specification-3-16/#textDocument_semanticTokens

export enum SemanticTokenModifiers {
    declaration = 'declaration',
    definition = 'definition',
    readonly = 'readonly',
    static = 'static',
    deprecated = 'deprecated',
    abstract = 'abstract',
    async = 'async',
    modification = 'modification',
    documentation = 'documentation',
    defaultLibrary = 'defaultLibrary'
}

I believe these can be added on to the TokenType to provide additional context. At first glance nothing jumps at me.