dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.42k stars 4.76k forks source link

Update .Net source generators to use new SyntaxValueProvider.ForAttributeWithMetadataName api. #70754

Closed CyrusNajmabadi closed 2 years ago

CyrusNajmabadi commented 2 years ago

Roslyn recently added a new API for greatly speeding up SGs that use the pattern of looking for nodes that have an attribute on it with a particular fully qualified name for the attribute. For example, the json serialization generator looks for this attribute: https://github.com/dotnet/runtime/blob/978df67ced885aeca5f7e75379451bc1a57cc219/src/libraries/System.Text.Json/gen/JsonSourceGenerator.Parser.cs#L46

Currently, this is extremely expensive as on every edit, the generator must hit every attributed class in every file and semantically check if it has an attribute that binds to this fully qualified name. This may mean thousands of files bound, with expensive semantic models and binding checks performed, all for nodes+attributes completely unrelated to this SG.

The added API provides a new entry point to get the same functionality as previously possible, but with much better performance. Now, the generator says what attributes they care about (passing in teh fully qualified metadata name) and roslyn can smartly examine the source code and not even bother doing any binding or semantics when it would be pointless. Intuitively, you can think of it that when roslyn sees [FooBar] it goes "well, FooBar (and FooBarAttribute) could not ever bind to JsonSerializableAttribute, so it's pointless to even try". This massively reduces the amount of work done (>40x in roslyn itself) especially when the attribute is never, or almost never, used.

Once this api is available in a consumable roslyn package, we should update all the runtime's SGs to use it. That includes the generators for

  1. Regex. Done in https://github.com/dotnet/runtime/pull/70911. This also added the main polyfill code.
  2. Json. Done in https://github.com/dotnet/runtime/pull/71653
  3. Logging. Done in https://github.com/dotnet/runtime/pull/71651
  4. eventsource. Done in https://github.com/dotnet/runtime/pull/71662
  5. libraryimport. in progress: https://github.com/dotnet/runtime/pull/71652
dotnet-issue-labeler[bot] commented 2 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

CyrusNajmabadi commented 2 years ago

Tagging @stephentoub @ViktorHofer for visibility. I'm happy to contribute the changes on the runtime side for this if that can help out.

jkoritzinsky commented 2 years ago

We need to make sure that we not only keep compatibility for downlevel generators (so introduce another build variant of the out-of-band packaged generators that uses the new roslyn API version) and ensure that we don't break uplevel repos like aspnetcore by adopting a Roslyn version that is newer than what many of them use (as we turn on in-box generators by default, which causes build failures when the Roslyn version the upstream repo uses is too old).

Generally, we try to keep the version of Roslyn that the "new" generators build against to at newest the version that ships with the SDK that dotnet/arcade is using to ensure we won't cause compile errors upstack. https://github.com/dotnet/runtime/blob/042fdf43dedd58dc3e6cdc118df5a7198792a9e7/eng/Versions.props#L34-L39

For generators that don't ship outside of dotnet/runtime, we can update them as soon as we have a NuGet package (so that includes the EventSource generator and the XUnitWrapperGenerator)

CyrusNajmabadi commented 2 years ago

Note: teh major offenders i'm seeing in traces are Json and Logging. So it woudl be great to be able to update these asap and get these into a VS release that has the new updated API.

jkoritzinsky commented 2 years ago

Since logging ships out-of-band, we could add another build dimension to it now against the newer Roslyn API version since we can put it into a different path in the NuGet package based on Roslyn version.

JSON ships in-box too, so we can't update the in-box version without breaking upstream repos, but we could update the package version for downlevel consumers immediately if we wanted to.

Basically we can update every generator project that isn't in the following list in some manner now (either by introducing new projects for the out-of-band generators or updating the in-repo ones directly), and then we can update the ones in the list after we know we won't break upstream repos.

https://github.com/dotnet/runtime/blob/6896e1bb4aae0877a6cb154710b6936e4d69716e/src/libraries/NetCoreAppLibrary.props#L187-L191

danmoseley commented 2 years ago

cc @joperezr who I believe is looking at source generator perf at the moment.

ghost commented 2 years ago

Tagging subscribers to this area: @dotnet/area-meta See info in area-owners.md if you want to be subscribed.

Issue Details
Roslyn recently added a new API for greatly speeding up SGs that use the pattern of looking for nodes that have an attribute on it with a particular fully qualified name for the attribute. For example, the json serialization generator looks for this attribute: https://github.com/dotnet/runtime/blob/978df67ced885aeca5f7e75379451bc1a57cc219/src/libraries/System.Text.Json/gen/JsonSourceGenerator.Parser.cs#L46 Currently, this is extremely expensive as on every edit, the generator must hit every attributed class in every file and semantically check if it has an attribute that binds to this fully qualified name. This may mean thousands of files bound, with expensive semantic models and binding checks performed, all for nodes+attributes completely unrelated to this SG. The [added API](https://github.com/dotnet/roslyn/issues/54725) provides a new entry point to get the same functionality as previously possible, but with much better performance. Now, the generator says what attributes they care about (passing in teh fully qualified metadata name) and roslyn can smartly examine the source code and not even bother doing any binding or semantics when it would be pointless. Intuitively, you can think of it that when roslyn sees `[FooBar]` it goes "well, FooBar (and FooBarAttribute) could not ever bind to JsonSerializableAttribute, so it's pointless to even try". This massively reduces the amount of work done (>40x in roslyn itself) especially when the attribute is never, or almost never, used. Once this api is available in a consumable roslyn package, we should update all the runtime's SGs to use it. That includes the generators for 1. Json 2. Regex 3. Logging 4. libraryimport 5. eventsource 6. Client.SourceGenerator (asp.net: https://github.com/dotnet/aspnetcore/tree/3ea008c80d5cc63de7f90ddfd6823b7b006251ff/src/SignalR/clients/csharp/Client.SourceGenerator/src) @jaredpar has prototyped this for a couple here: https://github.com/dotnet/runtime/commit/909709afc558ae370078514d05f553e5dfe8e4e5 https://github.com/dotnet/runtime/commit/c22809d61bad7616204e43e49923af63926c4c9a The changes are straightforward and easy to adopt.
Author: CyrusNajmabadi
Assignees: -
Labels: `area-Meta`, `tenet-performance`, `untriaged`, `source-generator`
Milestone: -
CyrusNajmabadi commented 2 years ago

@stephentoub @jkoritzinsky

Basically we can update every generator project that isn't in the following list in some manner now (either by introducing new projects for the out-of-band generators or updating the in-repo ones directly), and then we can update the ones in the list after we know we won't break upstream repos.

There's a potential hybrid option we can take.

  1. use the new roslyn API in all the generators where it is safe to do so immediately.
  2. port most** of the implementation of hte new API over to the runtime repo for the generators that cannot move over immediately. This would allow us to get the perf benefit without having to wait a very long.

So what does most** mean? Well, currently the API depends on some internal functionality in order to support a very unlikely, but useful full correctness case. Specifically, the API will understand if you do the following:

File1:

global alias FooBarAttribute = System.Text.Json.Serialization.JsonSerializableAttribute;

File2:

[FooBar]
class Whatever { }

In other words, it knows that that global alias exists, and as such the json-generator should run on Whatever. If you were willing to say that that was a currently unsupported scenario, we could port over the entirety of the new API except teh part that supports global aliases. Practically speaking, i think all users would still be fine. It would be up to you if this functionality break would be worth the massive speedup we're seeing (over 40x in roslyn itself).

To me, given just how bad the perf problem is here, my preference is that we take the perf gain for all the generators, and we accept a highly unlikely functionality break. Then, when you can finally move all generators over to the new api we'll have teh best of both worlds. High performance, and complete correctness.

Any thoughts from your perspective?

ViktorHofer commented 2 years ago

Here are my thoughts on this:

Based on that I would propose the following:

cc @eerhardt @jaredpar

eerhardt commented 2 years ago

It might make sense to change the roslyn4.0 folder in our packages in which the source generators reside to the actual roslyn version that we will depend on when shipping .NET 7. If I'm not mistaken, that would be roslyn4.4 with this change. Presumably the SDK falls back to use the roslyn3.11 targeting source generators if a Visual Studio version is used that doesn't support roslyn4.4.

The issue with this proposal is that projects building with the 6.0.x00 SDK will now see degraded performance when they use the 7.0.0 System.Text.Json nupkg. With the 6.0.x System.Text.Json nupkg, they will use the roslyn4.0 version, which has incremental build support. When the project upgrades to the 7.0.0 nupkg, there is no roslyn4.0 built version, so they will fall back to roslyn3.11 and now all the incremental support we took advantage of in .NET 6 is removed.

It is pretty easy to get into this situation AFAICT. You just need to be on a VS 2022 version that doesn't include .NET 7 - so let's say VS 17.0. You are using the 6.0.0 System.Text.Json nupkg, and NuGet Manager says a new version is out in the "Updates" tab, so you update to STJ 7.0.0.

ViktorHofer commented 2 years ago

Agreed, that's a problem and needs to be solved, thanks for pointing that out. What makes this problematic is that we currently don't have a lifecycle policy for source generators when they ship inside packages. I just recently had this discussion with @eerhardt.

Visual Studio 2019, which only works with the roslyn3.11 source generators is supported until April 2029. Based on that, we would need to build and package our roslyn3.11 source generators for the next 6-7 years.

Visual Studio 2022 17.0 is supported until July 2023, 17.2 until January 2024 and the support date for 17.3 isn't yet announced. Because of these support dates, we would need to keep our roslyn4.0 targeted source generators around until at least .NET 9.

It's very likely that we will again depend on newer roslyn features going forward and with that build and package even more versions of our source generators. Packages will grow unbounded and the main branch hosts all different versions of the roslyn source generators.

I don't think there's a good solution to mitigate the increasing build complexity, package growth and ever expanding matrix of source generators. One option that we might want to discuss is to couple our package support policy with toolset versions. Just theoretically speaking, we could mark the System.Text.Json 7.0 package to not support Visual Studio 2019 or older SDKs that don't have a specific roslyn version available. I don't suggest that we should to that but wanted mention it.

CyrusNajmabadi commented 2 years ago

We should upgrade non-shipping roslyn4.x source generators to leverage the new API when publicly released tooling supports them. My assumption is that this will be VS 17.4 but I could be mistaken.

This would be a pretty bitter pill @ViktorHofer . That's likely 6 months out. Can you clarify why we could not get this in for the 17.3 line?

CyrusNajmabadi commented 2 years ago

@ViktorHofer @eerhardt any thoughts on: https://github.com/dotnet/runtime/issues/70754#issuecomment-1155946743

At least with the idea there, we could get performance benefits to users immediately, albeit with a potential (though hopefully unlikely) functionality regression. This would be a stop-gap for the immediate future until you could actually adopt this new API (which has all hte performance benefit, without any functionality regression).

I'm coming from this from a perspective of trying to address the extremely bad performance regressions we're seeing in VS, which we are already getting complaints on from users. Having to wait until 17.4 is something i'd really like to avoid.

ViktorHofer commented 2 years ago

This would be a pretty bitter pill @ViktorHofer . That's likely 6 months out. Can you clarify why we could not get this in for the 17.3 line?

Your change was merged into main and https://github.com/dotnet/roslyn/commits/release/dev17.4 but not https://github.com/dotnet/roslyn/commits/release/dev17.3. Based on that, I don't think that your change will be part of the roslyn that ships with 17.3. @jaredpar is my assumption correct? Would it be possible and make sense to backport @CyrusNajmabadi's change into dev17.3?

CyrusNajmabadi commented 2 years ago

pulling in @vatsalyaagrawal . I'm def a little confused. I was under the impression that our main branch was 17.3, not 17.4. @vatsalyaagrawal can you clarify?

vatsalyaagrawal commented 2 years ago

Yes, Roslyn main branch is currently pointing to 17.3.P3.

ViktorHofer commented 2 years ago

That's great to hear. That would mean that we can update our internal non-shipping source generators to make use of the new API when VS 17.3 P3 publicly releases.

CyrusNajmabadi commented 2 years ago

@ViktorHofer Ok great. I think that's a much more reasonable time frame :)

stephentoub commented 2 years ago

That would mean that we can update our internal non-shipping source generators to make use of the new API when VS 17.3 P3 publicly releases.

Ok great. I think that's a much more reasonable time frame :)

To make sure there's no disconnect, @CyrusNajmabadi, I believe Viktor isn't talking about the two you actually care about today (json and logging). "internal non-shipping source generators" sounds to me like our generators that only support the building of runtime. @ViktorHofer, can you clarify?

ViktorHofer commented 2 years ago

Correct. Let me update my above proposal:

  1. We should upgrade non-shipping roslyn4.x source generators to leverage the new API when VS 17.3 Preview 3 publicly releases.
  2. We should update our inbox shipping roslyn4.x source generators to leverage the new API when dependent repositories upgraded to the newer code analysis version. If I'm not mistaken, the roslyn version that this depends on, will ship with .NET 7 Preview 6. Based on that, such changes would need to target .NET 7 Preview 7.
  3. We should add roslyn4.3 targeting source generators to our out-of-band shipping packages to leverage the new API. These would co-exist with the already shipping roslyn3.11 and roslyn4.0 source generators in the package. This can be done in combination with the non-shipping internal source generators right when VS 17.3 Preview 3 publicly releases.
  4. It might make sense to change the roslyn4.0 folder in our packages in which the source generators reside to the actual roslyn version that we will depend on when shipping .NET 7. If I'm not mistaken, that would be roslyn4.3 with this change. Presumably the SDK falls back to use the roslyn3.11 targeting source generators if a Visual Studio version is used that doesn't support roslyn4.3. @eerhardt pointed out above that this would regress the performance for customers that use VS 2022 < 17.3. .NET 8 is probably a better target to remove the roslyn4.0 source generators from our packages.
CyrusNajmabadi commented 2 years ago

@ViktorHofer Can you clarify: in what VS release woudl customers get the important performance fixes for all .net generators? Right now, the perf pain is extremely significant, so we really need a plan/solution that can address this in the near term.

Thanks!

stephentoub commented 2 years ago

I didn't see an answer to @CyrusNajmabadi proposal. As I understand it, he's proposing adding code for a polyfill into runtime. Generators built against the latest sdk get all the benefits; generators built against older sdks get most of the benefits. If that's an accurate summary, that sounds good to me both from a perspective of helping all customers soon and from a perspective of keeping the diffs between our builds minimal. Is this feasible?

CyrusNajmabadi commented 2 years ago

@stephentoub That is correct with the caveat that a currently supported case may temporarily not be supported. Specifically, the case where a user defines a global alias to the attribute the generator needs, and then references that aliases in the attributes they have on their types.

ViktorHofer commented 2 years ago

@ViktorHofer Can you clarify: in what VS release woudl customers get the important performance fixes for all .net generators? Right now, the perf pain is extremely significant, so we really need a plan/solution that can address this in the near term.

If we follow my above proposal, that would be .NET 7 Preview 7.

I didn't see an answer to @CyrusNajmabadi proposal. As I understand it, he's proposing adding code for a polyfill into runtime. Generators built against the latest sdk get all the benefits; generators built against older sdks get most of the benefits. If that's an accurate summary, that sounds good to me both from a perspective of helping all customers soon and from a perspective of keeping the diffs between our builds minimal. Is this feasible?

If I understand @CyrusNajmabadi's proposal correctly, this would just be temporary until all source generators can target the new API and we would then remove the polyfill. Is that right? If so, would we able to do so in time for Preview 6 which I believe closes next week?

stephentoub commented 2 years ago

If we follow my above proposal, that would be .NET 7 Preview 7.

For all customers? Above you wrote "These would co-exist with the already shipping roslyn3.11 and roslyn4.0 source generators in the package"...who would still be using those?

CyrusNajmabadi commented 2 years ago

If we follow my above proposal, that would be .NET 7 Preview 7.

Sorry, i'm asking which VS release, not which .net release. My strong preference is that we have some perf fix in place for 17.3.

ViktorHofer commented 2 years ago

who would still be using those?

Anyone using older versions of Visual Studio but the very latest packages:

Sorry, i'm asking which VS release, not which .net release. My strong preference is that we have some perf fix in place for 17.3.

.NET and Visual Studio don't release in sync and afaik we haven't yet publicly announced when VS 17.3 releases. I'm following-up on dates offline with @CyrusNajmabadi.

In general, we can update anything out-of-band faster than anything inbox because we just add another variant of the source generators there without requiring customers to update their toolset.

stephentoub commented 2 years ago

Anyone using older versions of Visual Studio but the very latest packages:

What about .NET 6 apps targeting the .NET 6 SDK?

CyrusNajmabadi commented 2 years ago

As an example: https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1555650 is one of many customer reported cases of this. In this case basic IDE features are being delayed by 10s of seconds, and the servicehub process is churning through CPU and tons of ram. In this case 30+GB just from the jsongenerator and logging generator.

Note that these are features which we generally want to run in 100s of ms worst case. So we're several orders of magnitude impacted here. This also other OOP semantic operations (like diagnostics and whatnot).

jkoritzinsky commented 2 years ago

Blocked for adopting the official API on https://github.com/dotnet/roslyn/issues/63318

joperezr commented 2 years ago

We do still intend to make this change for 7.0, even though it will likely miss RC1 given that we are currently blocked on that Roslyn bug getting fixed. Once it is, I will go ahead and make this change. Re-adjusting the milestone appropriately.

danmoseley commented 2 years ago

Roslyn bug was fixed over a week ago. Are we still waiting on the flow?

jkoritzinsky commented 2 years ago

If we update the generators to use the new Roslyn here, we need to update all consuming repositories as well.

I think the best solution will be to update the SDK in Arcade to use an SDK with the fix so we don’t need to manually upgrade the Roslyn version used in each repo.

danmoseley commented 2 years ago

I think the best solution will be to update the SDK in Arcade to use an SDK with the fix so we don’t need to manually upgrade the Roslyn version used in each repo.

Who owns doing that -- is that you @jkoritzinsky , @joperezr or someone else?

joperezr commented 2 years ago

I'm planning to take care of it this week.

CyrusNajmabadi commented 2 years ago

@joperezr Anything left to do with this item? Or can we close it out?

joperezr commented 2 years ago

PR isn't merged yet 😄. PR should be ready to go but I'm still waiting on making sure that all upstack repos have ingested an arcade update that will bump their compiler and SDK versions so that they'll be able to ingest the new source generators.

CyrusNajmabadi commented 2 years ago

@joperezr Anything left to do with this item? Or can we close it out?

joperezr commented 2 years ago

This can be closed now.

CyrusNajmabadi commented 2 years ago

Thanks!