dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.95k stars 4.02k forks source link

Allow the language version for generated files to 'float' #61094

Open chsienki opened 2 years ago

chsienki commented 2 years ago

Rationale

Today we support the langVersion flag which allows a newer compiler to disallow the usage of language features that were introduced after the selected version. This was primarily so that teams that are split across multiple compiler versions do not accidentally introduce build failures by checking in code that isn't supported by earlier versions of the compiler used by other team members.

Today, a correctly implemented source generator is required to look at the language version flag itself, and change it's generation strategy to emit only supported code. In practice however, this is extremely difficult to do correctly, and requires a very deep understanding of the language to even know what is or isn't supported for a given language version.

Solution

In order to make generation strategies simpler, I propose that we allow the language version of generated files to 'float' to a higher version than may be specified by the compilation options.

It should be noted that generators already have an explicit API level that they support, based on the version of Microsoft.CodeAnalysis that they are compiled against. This means a given generator is already limited to a Roslyn compiler of that version or higher.

Instead of allowing the generator to specify the language version, or floating it to the highest supported language version in the compiler, we should float to the highest version that was supported in the compiler when Microsoft.CodeAnalysis shipped. In this way the generator implicitly declares both the API and language version it generates for as part of its contract.

This strategy ensures that if a generator is able to be loaded by a compiler, it can be guaranteed that the language version it will float to will also be supported by that compiler, and no breaks can be introduce across teams (other than the generator not loading which is already the case today).

Implementation

We would need to maintain a mapping table inside Microsoft.CodeAnalysis{.CSharp,.VisualBasic} that gives the maximum supported language version for a given API version. The generator driver would use this to lookup the language version to use when parsing trees for a given generator.

Note that different generators could have different chosen language version when compiled against different APIs.

Open questions

Today in the compilers, we enforce that all syntax trees must have the same language version within a compilation (https://sourceroslyn.io/#Microsoft.CodeAnalysis.CSharp/Compilation/CSharpCompilation.cs,512 , https://sourceroslyn.io/#Microsoft.CodeAnalysis.VisualBasic/Compilation/VisualBasicCompilation.vb,480) we would need to relax this restriction for generated files.

We would also need to consider how we report language version violations within the generated code: if a user has selected langVer 9, but we float the generator version to 10, what error should be emitted if it tries to generate something from langVer 11. We could:

Are there any other ramifications to consider if we allow trees with mixed language versions?

Example

Consider the following hypothetical version table

Microsoft.CodeAnalysis version Highest supported language version
4.0 10
4.1 10
4.2 11
4.3 11
5.0 12

A generator that is compiled against 4.0 is free to use any language features from language version 10 or below. A generator compiled against 4.3 can use features from language version 11 or below. This is true even when the generators are loaded into a 5.0 compiler, which supports a higher language version.

Youssef1313 commented 2 years ago

We would also need to consider how we report language version violations within the generated code: if a user has selected langVer 9, but we float the generator version to 10, what error should be emitted if it tries to generate something from langVer 11.

I'm confused. If the compiler already supports C# 11, it would float to 11 to 10.

Based on my understanding, the purpose of the proposal is a way to never have language version errors in generated code. How can this scenario happen?

Youssef1313 commented 2 years ago

A common analyzer pattern is:

            context.RegisterCompilationStartAction(context =>
            {
                if (((CSharpCompilation)context.Compilation).LanguageVersion < LanguageVersion.CSharpX)
                    return;
                context.RegisterSomeAction(....);
            });

Now what should analyzer authors do if this analyzer is needed to analyze and report diagnostics in generated code?

svick commented 2 years ago

I propose that we allow the language version of generated files to 'float' to a higher version than may be specified by the compilation options.

Would it be useful if the version was also allowed to float to a lower version? Breaking changes in the compiler guarded by language version are rare, but AFAIK, they do happen. So if the version was allowed to float down, a generator written for C# 10 won't be affected by a breaking change in C# 11.

Today in the compilers, we enforce that all syntax trees must have the same language version within a compilation we would need to relax this restriction for generated files.

How risky is this? For example, what are the chances that using a C# 11 feature in one part of a partial type will crash the compiler when another part is set to C# 10?

jaredpar commented 2 years ago

Based on my understanding, the purpose of the proposal is a way to never have language version errors in generated code. How can this scenario happen?

This happens when customers do one of the following:

  1. Use a generator, say the RegexGenerator, on a down level target framework like net5 or net472
  2. Deliberately move the <LangVersion> below the default for the target framework they are using. For example if a customer had the following project file data
<TargetFramework>net7</TargetFramework>
<LangVersion>8</LangVersion>

As generators are increasing in use in the ecosystem these scenarios are starting to really hamper progress. The easiest example to discuss is the Razor generator. There are several new features, like enhanced #line directives, that are necessary to create a good IDE experience. Yet it's a new language feature so Razor can't use them when the project targets an older target framework. This means essentially we can't leverage language changes to create a better development experience.

There are a couple of ways we could approach fixing this. One is to relax which features we enforce with /langVersion. The other is effectively this proposal, to give generators a way to have some control over language version.

chsienki commented 2 years ago

@Youssef1313 It would float to the highest language version that is supported by the API it was compiled against. That ensures that any compiler than can load the generator will be able to compile the code produced. As an author you choose the language version you are targeting by choosing the API you compile against.

The analyzer pattern is interesting. In the case above do you mean > LanguageVersion.CSharpX? That would be the case we need to think about.

Youssef1313 commented 2 years ago

@chsienko Both > and < seems interesting (one would cause false negatives while the other would cause false positives)

Case >

if (Version > CSharp9) return;

If Version is CSharp8 but generated code is CSharp10, then generated code will be analyzed while it's not supposed to.


Case <

if (Version < CSharp9) return;

If Version is CSharp8 but generated code is CSharp10, then generated code will not be analyzed, while it's supposed to be analyzed.

CyrusNajmabadi commented 2 years ago

I'm not sure how this simplifies things. Shouldn't generators just listen to whatever the actual language version they are generating code against and generate legal code for that lang version? How does hte 'floating' make thigns clearer or easier?

CyrusNajmabadi commented 2 years ago

we would need to relax this restriction for generated files.

This would likely have deep implications for IDE. We also view the world as all files in a compilation must agree on lang version. TBH, this genuinely feels super confusing. Now any feature that needs to check language now needs to know the origination point of the semantic question that was asked. That's non trivial for lots of hte IDE apis, meaning we'd likely get a lot of this stuff wrong.

--

Aside, can someone give a clear example of the problem that exists today and how this solves it?

jaredpar commented 2 years ago

@CyrusNajmabadi

Consider a concrete example. The #line directives are gated on /langVersion. That means while they are necessary to have a successful editing experience in Razor (HotReload, ENC, etc ...) the IDE cannot depend on them being available. If a user develops razor against say net5 then that maps to /langVersion:10 where #line enhanced directives are not supported. Effectively it means experiences like Razor cannot reliably benefit from any language or tooling improvements that we make.

CyrusNajmabadi commented 2 years ago

the IDE cannot depend on them being available.

Can you explain what is meant by that?

the IDE cannot depend on them being available.

If a user develops razor against say net5 then that maps to /langVersion:10 where #line enhanced directives are not supported

Can Razor then fallback to whatever is supported in that version of .net? I'm not seeing the problem. Isn't that a necessary fallout of targeting older platforms? You get less support for new and valuable features.

--

Note: IDE already handles thsi today with all of our features. We either disable them if targeting versions that don't support it at all. Or we figure out how to generate down-level versions of our features that support whatever is at that particular lang version.

Effectively it means experiences like Razor cannot reliably benefit from any language or tooling improvements that we make.

I'm not sure how they can't reliability benefit from language/tooling improvements? My expectation is not that we add "enhanced #line" and now that makes all prior versions better. My expectatoin is that things get better if you then target that new thing. Right?

CyrusNajmabadi commented 2 years ago

I'm really struggling with this statement:

If a user develops razor against say net5 then that maps to /langVersion:10 where #line enhanced directives are not supported

Isn't this exactly the point of langVersion? It's a way of saying: here are the features of the language i want to be usable in my program. If i say, for example, that i should target a version of C# that doesn't support index/range... then i expect my source generated code to respect that. If that source generated code uses index/range... then it's not compliant with the explicit decision i made as a user as to what features of hte language are available to it.

jaredpar commented 2 years ago

the IDE cannot depend on them being available. Can you explain what is meant by that?

The Razor source generator can't control what target framework, and by extension language version, a user chooses. That means it's code generation experience is dictated by user decisions and there are several target frameworks we support that don't have features beneficial to the IDE experience.

Can Razor then fallback to whatever is supported in that version of .net? I'm not seeing the problem. Isn't that a necessary fallout of targeting older platforms? You get less support for new and valuable features.

The issue isn't about the efficiency of the generated code. It's not about enabling perf with Span, or using new features to make code gen prettier, etc ... The issue is the IDE experience, basically our ability to properly do ENC, Hot Reload, etc ... are predicated on new features in the language / compiler. Consider that #line enhancements were added specifically because Razor could not do ENC / Hot Reload properly without it. The entire feature was an acknowledgement that we needed language / compiler features to make the IDE experience complete.

So yes Razor can fall back but when it does it means the IDE experience suffers. This can run the gambit from features like ENC stop working all the way to we cannot move on from the runtime / design time code gen fracturing that is holding back the entire experience.

My expectatoin is that things get better if you then target that new thing. Right?

This reasoning though means the IDE team is okay with statements like

ENC / Hot Reload for Razor works when targeting net7 but not any other target framework

CyrusNajmabadi commented 2 years ago

This reasoning though means the IDE team is okay with statements like

Yes. That seems ok to me. New features and improvements may involve targetting newer frameworks/platforms/IDEs/etc. If anything, I honestly think that fits divisional desire to get things moving forward and not have an enormous investment in trying to take and improve these older scenarios.

Youssef1313 commented 2 years ago

Is #line the most important thing about this proposal? Could it be un-gated to a language version (like CallerArgumentExpression)?

CyrusNajmabadi commented 2 years ago

@Youssef1313 i was talking to @chsienki about this. I do feel like if this is just about #line that we can likely trivially do something just for that, without having a very complex versioning system introduced that we'd then have to thread through all the IDE code. Perhaps even a #pragma would work here (which would likely be fine since razor is generating all this code for this fiel itself).

jaredpar commented 2 years ago

This reasoning though means the IDE team is okay with statements like

Yes. That seems ok to me

I disagree. To me it creates a fractured IDE experience. It means that even though customers are working on platforms that we fully support we are effectively not improving their experience. We're saying whatever features and experience were available X years ago is what they get now. Improvement is not possible.

It also means that as a team we're committing to maintaining infrastructure and features that are buggy, difficult to maintain and simply not keeping up with the pace of VS development. It's a tax on all the teams involved because we are maintaining entire systems simply because we can't use #line reliably in our codegen. The cost is way out of proportion here.

jaredpar commented 2 years ago

The intent of /langversion is to be a guard rail for customers who are interested in mixed development scenarios. It's not meant to be a mechanism for reducing progress in VS which it has become.

CyrusNajmabadi commented 2 years ago

We're saying whatever features and experience were available X years ago is what they get now. Improvement is not possible.

Improvement is possible, if you take on new versions of things. If you want to lock to an older version, then you are locked to that. That seems fine to me. We do this, for example, with our analyzers and whatnot. If you want new anlayzers or improvements to our analyzers, you install new versions.

If you want better codegen in our IDE features, you need to target a newer lang version. If you target older versions i am totally ok with teh idea that you get less capabilities/features. That's the acceptable price for locking to thingsl ike that.

jaredpar commented 2 years ago

If you want better codegen in our IDE features, you need to target a newer lang version. If you target older versions i am totally ok with teh idea that you get less capabilities/features. That's the acceptable price for locking to thingsl ike that.

I fundamentally disagree here. We are creating an artificial tax on VS progress here. That tax is expensive and it's holding the customer experience back for reasons that are extremely hard to justify.

CyrusNajmabadi commented 2 years ago

The intent of /langversion is to be a guard rail for customers who are interested in mixed development scenarios

The intent of /langversion for me is ot say what features of hte language are allowed or not. If i say that i only want features up to X and i get them up to Y (where Y > X) then my desires are overridden here.

Note that the impl details of an IDE feature shoudln't matter her.e The end thing that is generated should respect the user settings. How the IDE feature works under the covers shouldn't matter (we could have totally different systems for IDE versus command line).

CyrusNajmabadi commented 2 years ago

for reasons that are extremely hard to justify.

The reason seems really easy to justify. You asked for a particular language version. These features depend on a higher language version, ergo you don't get those enhanced features.

If we do want this to work, we can also just trivially do things like just flat out ignore lang version in this file for #line. Or give razor a hidden way to suppress it. It doesn't need to be part of complicated versioning system.

CyrusNajmabadi commented 2 years ago

I think we should have a meeting on this.

jaredpar commented 2 years ago

The C# IDE team regularly relies on new compiler features in order to get work done. This runs the gambit from APIs, to project settings to command line switches. These are only available using the new compiler. Yet the IDE team has never expressed that they feel these IDE improvements should be predicated on target framework.

I don't see the Razor scenario as any different here. These are compiler features that the IDE team, not the compiler team, have determined are critical to the IDE experience. Artificially limiting them is just reducing the user experience and creating enormous cost on the razor compiler and ide teams.

Neme12 commented 2 years ago

@Youssef1313 It would float to the highest language version that is supported by the API it was compiled against. That ensures that any compiler than can load the generator will be able to compile the code produced. As an author you choose the language version you are targeting by choosing the API you compile against.

Hm, if a compiler can only load a generator if the generator targets that version of the compiler or a lower one, that means that noone else on the team will use a compiler version lower than one that can load that generator, otherwise they couldn't compile the code, right? It seems that if a generator targets a compiler API that supports, let's say, C# 10, the generator could just say "for the generator to work correctly, please upgrade your language version to C# 10." That should be fine because if someone had a lower compiler version that doesn't support C# 10, they couldn't compile the code anyway because they aren't able to load the generator. So this wouldn't break the team scenario and there's no reason to keep the language version lower if they're required to have a certain minimum compiler version anyway to load the generator. No need for stuff like different language versions across the project.

Neme12 commented 2 years ago

Oh, in my previous comment, I forgot that while they can upgrade the language version, it's not actually a supported scenario to upgrade it on a lower version of .NET :(

The intent of /langversion is to be a guard rail for customers who are interested in mixed development scenarios. It's not meant to be a mechanism for reducing progress in VS which it has become.

It seems to that the new concept of tying the C# version to a .NET version and saying that the new C# version is only supported on this version of .NET basically already is a mechanism for reducing progress (even though it still works to use it with a lower version of .NET, but isn't officially "supported" and nobody could complain if it stopped working). It's a bit ironic that this proposal would actually go against that and make using a newer C# language version with eg. .NET 5 a supported scenario because of the version floating.

I feel like if VS/.NET didn't restrict the language version based on the target framework by default, this wouldn't be an issue as the users would already have the new line directives available if they're using the newer toolset that can take advantage of it.

Neme12 commented 2 years ago

If you want better codegen in our IDE features, you need to target a newer lang version. If you target older versions i am totally ok with teh idea that you get less capabilities/features. That's the acceptable price for locking to thingsl ike that.

But upgrading to a newer language version is easy and unlikely to break users. That's not the case with updating .NET though. Every time I did that on a project, it broke so many things, I had to rewrite parts of it and then still later encountered features that had broken and I didn't notice at the time. So yes, I think saying "you have to use a newer lang version to have the improvements" is reasonable, but it's a lot more problematic with .NET versions. By this logic, the Roslyn codebase would be stuck with an old C# version because it has to target older platforms (.NET Standard). Sometimes, people have to support older platforms and it's not just a case of "well, if you want the new features, just upgrade, it's easy".

KathleenDollard commented 2 years ago

Summarizing the thread (partly for me):

When I first read this I thought "its nuts to float the LangVer". However, on further thought, I currently think we can and should, unless we have a scenario for freezing it that I do not yet see. The known case is served by locking the lowest LangVer to the one the generator was created with. However, I have questions:

Do we have a lock on the LangVer was created with or are we assuming that the author tested with the same LangVer?

We still have a problem I would like us to consider. How would the person adding the generator package that they will break their friends if they work with tools that cannot handle the generators LangVer? I might propose a warning for generators when they are used with downlevel LangVer, which would help avoid generators that are not tested at all downlevel. However, us providing generators that come along with warnings may not be a good choice. Can we allow generators to specify the lowest LangVer they can work with?

jaredpar commented 2 years ago

Supporting downlevel language versions in generators can cause unique problems in the IDE experience. The current known case is #line enhancements

A couple of us chatted about this last week and I forgot to post an update. The conclusion was that #line is a compiler feature, not a language feature. Compiler features are not gated behind LangVersion and thus neither should #line.

jcouv commented 2 years ago

The LangVer requirement for line-span directives was now relaxed (PR https://github.com/dotnet/roslyn/pull/61686)