dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

[API Proposal]: System.Diagnostics.CodeAnalysis.StringSyntaxAttribute #62505

Closed: stephentoub closed this issue 2 years ago.

stephentoub commented 2 years ago

Background and motivation

VS provides nice colorization of regular expressions when they are passed to known methods on Regex, such as the Regex constructor or Regex.IsMatch.

It also supports putting a comment before any string literal to mark the language used in that string.
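
A minimal sketch of the comment-based marking, assuming Roslyn's "lang=" comment convention (here the inline /*lang=regex*/ form placed directly before the literal). The CommentMarkedStrings class and its members are hypothetical; the point is that the pattern is defined in a constant, away from any Regex call, so only the comment tells the IDE what it contains.

```csharp
using System.Text.RegularExpressions;

public static class CommentMarkedStrings
{
    // The inline lang=regex comment marks the following literal as a regular expression,
    // so an IDE that understands the convention can colorize it even though the constant
    // is not passed directly to a Regex API at this point.
    public const string IsoDatePattern = /*lang=regex*/ @"^\d{4}-\d{2}-\d{2}$";

    // The constant is only consumed by Regex here, far from where it is declared.
    public static bool IsIsoDate(string s) => Regex.IsMatch(s, IsoDatePattern);
}
```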

But there's currently no way to define an arbitrary API and annotate a string parameter with the language it expects. For example, you don't get this colorization today for RegexGenerator, because VS hasn't been taught about that API specifically.

While these examples are about regex, more generally this applies to arbitrary domain-specific languages. If VS adds colorization for JSON, for example, it'd be nice if every API that accepts JSON could be annotated appropriately.

API Proposal

namespace System.Diagnostics.CodeAnalysis
{
    [AttributeUsage(AttributeTargets.Parameter | AttributeTargets.Field | AttributeTargets.Property, AllowMultiple = false, Inherited = false)]
    public sealed class StringLanguageAttribute : Attribute
    {
        public StringLanguageAttribute(string language);
        public string Language { get; }
    }
}

API Usage

public sealed class RegexGeneratorAttribute : Attribute
{
    public RegexGeneratorAttribute([StringLanguage("regex")] string pattern);
}

Alternative Designs

No response

Risks

No response

madelson commented 2 years ago

Will/can we add a constant for sql? I realize that there are multiple SQL dialects out there, but there is enough commonality to make IDE syntax highlighting helpful. This would be great on DbCommand.CommandText as well as for libraries like Dapper.

dotMorten commented 2 years ago

@stephentoub Thanks for the demo of this today. You said it's available in VS 17.2; I'm trying preview 1 and not seeing it work. Is it supposed to work and I'm doing something wrong, or is it not in the first preview? (I included the attribute code myself, which you mentioned would still work, so that I get it in versions prior to .NET 7.)

dotMorten commented 2 years ago

D'oh! I just made the argument for why the "Json" text should be a constant: I had spelled it all lowercase. It works once properly cased.

CyrusNajmabadi commented 2 years ago

> Will/can we add a constant for sql? I realize that there are multiple SQL dialects out there, but there is enough commonality to make IDE syntax highlighting helpful. This would be great on DbCommand.CommandText as well as for libraries like Dapper.

We have no plans for this currently, primarily because there's no real way to estimate the cost of this work. For one thing, we have no tech available to us to suitably lex/parse even one dialect of SQL, let alone the myriad dialects actually used in practice. We'd need a strong proposal on how to actually achieve this.

JSON has the benefit of being a staggeringly simple language to add support for. And for regex we have the canonical implementation in the runtime, which we were able to ape in order to add this support.

madelson commented 2 years ago

> we have no tech available to us to suitably lex/parse even one dialect of SQL, let alone the myriad dialects actually used in practice

@CyrusNajmabadi is it a requirement that there be full lexing/parsing capabilities just to annotate this? Even very trivial support in an IDE (think highlighting common keywords like SELECT, WHERE, and JOIN plus operators) would make blocks of SQL text a lot easier on the eyes.

terrajobst commented 2 years ago

@stephentoub @bartonjs did we add any other constants for well-known languages? The approved shape only had one for Regex.

Here is my understanding of the final shape:

namespace System.Diagnostics.CodeAnalysis
{
    [AttributeUsage(AttributeTargets.Parameter |
                    AttributeTargets.Field |
                    AttributeTargets.Property,
                    AllowMultiple = false,
                    Inherited = false)]
    public sealed class StringSyntaxAttribute : Attribute
    {
        public StringSyntaxAttribute(string syntax);
        public StringSyntaxAttribute(string syntax, params object[] arguments);

        public string Syntax { get; }
        public object[] Arguments { get; }

        public const string Regex = "regex";

        // As we're adding more support we can add new languages like:
        // public const string Xml = "xml";
        // public const string Json = "json";
    }
}
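
A sketch of how an API can be annotated with the shipped attribute and how a reflection-based tool could read the annotation back. It assumes .NET 7 or later, where the constants are defined with nameof, so StringSyntaxAttribute.Regex is "Regex" rather than the lowercase "regex" listed above. Passing "options" through the params arguments is modeled on how the .NET 7 Regex constructors point tools at their RegexOptions parameter; the Patterns class and its methods are hypothetical.

```csharp
#nullable enable
using System.Diagnostics.CodeAnalysis;
using System.Reflection;
using System.Text.RegularExpressions;

public static class Patterns
{
    // The pattern parameter is annotated with the built-in Regex constant; the extra
    // argument names the parameter holding RegexOptions. (nameof(options) would need
    // C# 11's extended nameof scope here, so a literal is used instead.)
    public static bool Matches(
        string input,
        [StringSyntax(StringSyntaxAttribute.Regex, "options")] string pattern,
        RegexOptions options = RegexOptions.None)
        => Regex.IsMatch(input, pattern, options);

    // Reads the annotation back from a parameter, as a reflection-based tool might.
    public static StringSyntaxAttribute? GetSyntax(string methodName, int parameterIndex) =>
        typeof(Patterns).GetMethod(methodName)!
            .GetParameters()[parameterIndex]
            .GetCustomAttribute<StringSyntaxAttribute>();
}
```

IDEs such as Roslyn work from compiler symbols rather than reflection, but the information they see (Syntax plus Arguments) is the same.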

dotMorten commented 2 years ago

The string is "Json" (uppercase J). Lowercase, as above, doesn't work (at least in preview 1) and threw me off for a while.

https://github.com/dotnet/runtime/blob/785f81814f873112b2469694524d26c0592aed96/src/libraries/System.Private.CoreLib/src/System/Diagnostics/CodeAnalysis/StringSyntaxAttribute.cs#L40-L47
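
Since the linked source defines the values as constants, referencing them instead of hand-typed strings sidesteps the casing problem described above. A minimal sketch assuming .NET 7+ and System.Text.Json; JsonHelpers and CountTopLevelProperties are hypothetical names.

```csharp
using System.Diagnostics.CodeAnalysis;
using System.Text.Json;

public static class JsonHelpers
{
    // StringSyntaxAttribute.Json is the constant "Json"; using it avoids
    // the "json" vs "Json" casing mistake.
    public static int CountTopLevelProperties([StringSyntax(StringSyntaxAttribute.Json)] string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        int count = 0;
        foreach (JsonProperty property in doc.RootElement.EnumerateObject())
        {
            count++;
        }
        return count;
    }
}
```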

terrajobst commented 2 years ago

@madelson I'm not opposed to adding annotations for languages that Roslyn has no plans to support; this would allow other tools, such as Rider, to add support. However, I think we'll want at least one party that does something useful with an annotation, to ensure it is sensible. For example, if all you ever want is syntax highlighting, then "SQL" might be good enough, combined with a large dictionary of the keywords used across SQL. However, if an IDE wants to do more, such as code completion, then the string "SQL" might no longer be sufficient. In the context of the BCL, "regex" basically means System.Text.RegularExpressions, so in practice it's sufficiently unique. In .NET today, however, various SQL dialects are in use (MS SQL Server, Oracle, PostgreSQL, etc.). I can be convinced that this isn't a problem, but I'd like to hear that from an implementing party rather than from a consumer, because otherwise we'd likely make faulty assumptions.

stephentoub commented 2 years ago

> did we add any other constants for well-known languages? The approved shape only had one for Regex.

We added Json, which the final shape above already listed as fine to add once there was support, and we added DateTimeFormat, as you proposed and as was discussed in https://github.com/dotnet/runtime/issues/62505#issuecomment-1009346603
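
A minimal sketch of the DateTimeFormat constant in use, assuming .NET 7+. The Timestamps class is hypothetical; annotating the format parameter lets an IDE that supports this syntax assist with the format string at call sites.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.Globalization;

public static class Timestamps
{
    // The format parameter is marked as a DateTime format string, so callers
    // passing literals such as "yyyy-MM-dd" can get IDE assistance.
    public static string Format(
        DateTime value,
        [StringSyntax(StringSyntaxAttribute.DateTimeFormat)] string format)
        => value.ToString(format, CultureInfo.InvariantCulture);
}
```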

jnyrup commented 2 years ago

@stephentoub in the Community Standup you mentioned keeping StringSyntaxAttribute internal if we implement it for codebases not targeting .NET 7.

For a multi-targeting public API (think nuget package) the best workaround I can think of without risking assembly conflicts is

public class Class2
{
    public void MyMethod([StringSyntax("Regex")] string pattern) { }
}

#if !NET7_0_OR_GREATER
internal sealed class StringSyntaxAttribute : Attribute { ... }
#endif

Is there a way to expose StringSyntaxAttribute to older TFMs without assembly conflicts or is this a technical limitation?

stephentoub commented 2 years ago

> For a multi-targeting public API (think nuget package) the best workaround I can think of without risking assembly conflicts is

What's wrong with that approach? I don't think that's so much a workaround as it is the recommended course of action if you need to use the attribute downlevel.
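
For reference, a downlevel polyfill along those lines might look like the sketch below. It assumes tools recognize the attribute by its full name, so the copy lives in System.Diagnostics.CodeAnalysis and mirrors the shipped constructors and properties; only the Regex constant is copied. Because it is internal and compiled only when NET7_0_OR_GREATER is not defined, it cannot collide with the public .NET 7 type or with other libraries' copies.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;

public class Class2
{
    public void MyMethod([StringSyntax(StringSyntaxAttribute.Regex)] string pattern) { }
}

#if !NET7_0_OR_GREATER
namespace System.Diagnostics.CodeAnalysis
{
    // Internal copy for older target frameworks; on .NET 7+ the public BCL type is used instead.
    [AttributeUsage(AttributeTargets.Parameter | AttributeTargets.Field | AttributeTargets.Property,
        AllowMultiple = false, Inherited = false)]
    internal sealed class StringSyntaxAttribute : Attribute
    {
        // Same value as the shipped constant (nameof(Regex) == "Regex").
        public const string Regex = nameof(Regex);

        public StringSyntaxAttribute(string syntax)
        {
            Syntax = syntax;
            Arguments = Array.Empty<object>();
        }

        public StringSyntaxAttribute(string syntax, params object[] arguments)
        {
            Syntax = syntax;
            Arguments = arguments;
        }

        public string Syntax { get; }
        public object[] Arguments { get; }
    }
}
#endif
```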

dotMorten commented 2 years ago

Making it internal works beautifully for me in my multi-targeted project.

jnyrup commented 2 years ago

Sorry, I thought I had tested this and that the attribute had to be public to take effect. It works brilliantly; this is awesome!

lsoft commented 2 years ago

Sorry for the dumb question, but about SQL: if I know which dialect our project uses, is there any way to colorize SQL keywords? I don't mean having VS parse the SQL itself. I already have code that extracts SQL strings from the source (via Roslyn) and tokenizes them, so I have the token information and could, in theory, provide it to the VS visualization layer somehow. Is that feasible now?

madelson commented 2 years ago

@terrajobst I hear what you're saying about there being different dialects, but in many if not most cases SQL strings are being passed to a more generic SQL library (DbCommand.CommandText, EF, Dapper, etc.), so even if the IDE had the ability to handle specific flavors of SQL, in most cases it would probably have to fall back to the lowest common denominator anyway.

Waiting to get buy-in from an implementer makes sense. I'm just offering that, as a consumer, this would be valuable, definitely more so for my use cases than JSON or date formats. I also suppose there doesn't need to be an official constant here, as long as library authors and IDEs agree on a value to use.

roji commented 2 years ago

For SQL, see this comment: https://github.com/dotnet/runtime/issues/65634#issuecomment-1058524593

> [...] even if the IDE had the ability to handle specific flavors of SQL in most cases it would probably have to fall back to lowest common denominator anyway.

I'm not sure what "lowest common denominator" means in SQL. Sure, there's a certain subset of statements which work everywhere, but it's really quite restricted. The moment you e.g. want to get only X rows, SQL Server has TOP, PG/SQLite have LIMIT/OFFSET, etc. IMHO for anything to be useful around SQL, there needs to be some setting somewhere that says what the dialect is, which the analyzer or IDE would pick up.

CyrusNajmabadi commented 2 years ago

We are making this extensible at the Roslyn layer. But it will be up to a different party entirely to provide any level of SQL support.

madelson commented 2 years ago

> I'm not sure what "lowest common denominator" means in SQL. Sure, there's a certain subset of statements which work everywhere, but it's really quite restricted.

I imagine that if VS/VSCode/Rider were to implement this then it would start with just the base keywords (SELECT, JOIN, WHERE, etc) and maybe some operators. I've worked with SQL in editors with this behavior and a little syntax highlighting goes a long way towards making the code easier to read.

IDEs can of course go beyond this by offering configurable dialect highlighting like JetBrains does.
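
To make the "base keywords" idea concrete, here is a deliberately naive sketch. It flags a fixed, hypothetical list of common keywords case-insensitively and returns their spans. Unlike a real highlighter, it does not skip string literals or comments and knows nothing about dialect-specific syntax such as TOP or LIMIT.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NaiveSqlHighlighter
{
    // Hypothetical keyword list covering only common, largely dialect-neutral keywords.
    private static readonly HashSet<string> Keywords = new(StringComparer.OrdinalIgnoreCase)
    {
        "SELECT", "FROM", "WHERE", "JOIN", "ON", "AND", "OR",
        "GROUP", "ORDER", "BY", "INSERT", "UPDATE", "DELETE",
    };

    // Returns (start, length) spans of keyword tokens in the SQL text.
    // Tokens are split on word boundaries only; quoted strings and comments are not skipped.
    public static List<(int Start, int Length)> FindKeywords(string sql)
    {
        var spans = new List<(int Start, int Length)>();
        foreach (Match m in Regex.Matches(sql, @"\b[A-Za-z_]+\b"))
        {
            if (Keywords.Contains(m.Value))
            {
                spans.Add((m.Index, m.Length));
            }
        }
        return spans;
    }
}
```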

deeprobin commented 2 years ago

> > I'm not sure what "lowest common denominator" means in SQL. Sure, there's a certain subset of statements which work everywhere, but it's really quite restricted.

> I imagine that if VS/VSCode/Rider were to implement this then it would start with just the base keywords (SELECT, JOIN, WHERE, etc) and maybe some operators. I've worked with SQL in editors with this behavior and a little syntax highlighting goes a long way towards making the code easier to read.

> IDEs can of course go beyond this by offering configurable dialect highlighting like JetBrains does.

We could start by supporting only ANSI/ISO SQL (ISO/IEC 9075-1) under the identifier "sql", and in the long run build on it with "sql+mysql", "sql+mssql"/"sql+tsql" (perhaps with a version suffix, e.g. "sql+mysql8.0"). That would be my approach, anyway. Feel free to leave feedback.
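
A small sketch of how a tool might split identifiers in the "sql+dialect" scheme proposed above into a base language and an optional dialect. The scheme is only a proposal in this thread; the SqlSyntaxId and QueryText helpers and the "sql+mssql" annotation are hypothetical, and no IDE is assumed to understand them.

```csharp
#nullable enable
using System.Diagnostics.CodeAnalysis;

public static class SqlSyntaxId
{
    // "sql+mysql8.0" -> ("sql", "mysql8.0"); "sql" -> ("sql", null).
    public static (string Language, string? Dialect) Parse(string syntax)
    {
        int plus = syntax.IndexOf('+');
        if (plus < 0)
        {
            return (syntax, null);
        }
        return (syntax.Substring(0, plus), syntax.Substring(plus + 1));
    }
}

public static class QueryText
{
    // StringSyntaxAttribute accepts any string, so a library could opt into the
    // proposed identifier today; whether any tool acts on it is a separate question.
    public static string Normalize([StringSyntax("sql+mssql")] string commandText) =>
        commandText.Trim();
}
```

A tool that only understands plain "sql" could still fall back to the base language when it does not recognize the dialect part.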

madelson commented 2 years ago

> build on it in the long run with sql+mysql, sql+mssql/sql+tsql...

As mentioned previously, most applications of this would likely be SQL-dialect-agnostic members like DbCommand.CommandText and parameters to various methods in Dapper, EF, etc. The value is in the enhanced readability of SQL blocks in an IDE, as opposed to perfect parsing. Many text editors, IDEs, wikis, etc. offer this sort of dialect-agnostic SQL syntax highlighting, which indicates that it is indeed useful despite the differences between SQL dialects.