dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License

Source generators have exploded the costs of making skeleton references. #63891

Open CyrusNajmabadi opened 2 years ago

CyrusNajmabadi commented 2 years ago

Investigating perf issues has commonly shown that we are spending the vast majority of our time in skeleton reference generation.


This has become a major cost for us ever since the introduction of source generators. Specifically, before source generators, skeleton generation just needed to do the following:

  1. bind the top-level signatures of all types in the compilation.
  2. emit those to a 'ref assembly'.

This was comparatively cheap, as an assembly often has relatively few of these symbols in total, and little work (just basic name binding) is needed to produce them.

With source generators, this has now become:

  1. generate the initial compilation.
  2. run all source generators across this entire compilation. This binds all members across all files.
  3. use the output of the source-generator run to produce the new compilation.
  4. bind the top-level signatures of all types in the new compilation.
  5. emit those to a 'ref assembly'.
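To make the difference between the two pipelines concrete, here is a toy cost model. It is not Roslyn's API: `SourceFile` and both cost functions are hypothetical, and "cost" is just a count of declarations or bodies bound. It assumes, as step 2 above states, that running the generators binds every member in every file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceFile:
    # Hypothetical stand-in for one syntax tree: how many top-level
    # signatures it declares and how many member bodies it contains.
    top_level_signatures: int
    member_bodies: int


def skeleton_cost_before_generators(files: List[SourceFile]) -> int:
    # Old pipeline: bind only the top-level signatures, then emit them.
    return sum(f.top_level_signatures for f in files)


def skeleton_cost_with_generators(files: List[SourceFile], generated_signatures: int = 0) -> int:
    # Step 2: running the generators binds every member in every file.
    full_bind = sum(f.top_level_signatures + f.member_bodies for f in files)
    # Steps 4-5: bind and emit the signatures of the new compilation,
    # including whatever the generators added.
    skeleton = sum(f.top_level_signatures for f in files) + generated_signatures
    return full_bind + skeleton


files = [SourceFile(top_level_signatures=10, member_bodies=200),
         SourceFile(top_level_signatures=5, member_bodies=150)]
print(skeleton_cost_before_generators(files))  # 15
print(skeleton_cost_with_generators(files))    # 380
```

In this model the old cost scales with the number of declarations, while the new cost scales with the size of every method body in the project.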

What was previously just a walk of the top level now costs a full compilation. This is particularly exacerbated by every cross-language jump in a compilation chain, e.g. a C# project depending on a VB helper project, which in turn depends on another C# project. This needs two intermediary source-generator compilation steps, which effectively means a full compile at each step in the chain.
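A small sketch of why language transitions matter, assuming (as a simplification) that a project consumed by a project in the same language can be shared directly, while a project consumed across a language boundary must be turned into a skeleton reference. The function names are hypothetical, and generators are treated as all-or-nothing for the whole chain.

```python
from typing import List


def skeleton_boundaries(chain: List[str]) -> List[int]:
    """chain lists project languages from the top-level consumer down to the
    leaf, e.g. ["C#", "VB", "C#"]. Returns the indices of projects that must be
    converted to a skeleton reference because their consumer (the previous
    entry) is written in a different language."""
    return [i for i in range(1, len(chain)) if chain[i] != chain[i - 1]]


def full_compiles_for_skeletons(chain: List[str], has_generators: bool) -> int:
    # Before generators, each skeleton was a cheap top-level signature walk.
    # With generators, each one first needs a generator run over the whole
    # project, i.e. roughly a full compile.
    return len(skeleton_boundaries(chain)) if has_generators else 0


chain = ["C#", "VB", "C#"]
print(skeleton_boundaries(chain))                # [1, 2]
print(full_compiles_for_skeletons(chain, True))  # 2
```

For the C# → VB → C# example this yields two skeletons, and therefore two extra full compiles once generators are involved.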

The impact of SGs on ref-assembly generation (and on how the IDE uses ref assemblies to quickly and efficiently produce cross-language information) seems to have been missed. We need some solution that makes this generation quick again, as we do not want every edit to effectively cost a full compile at each language-transition layer.

jaredpar commented 2 years ago

Have you all given any thought to how you're going to fix this problem?

Been thinking about this for a few days. Most of the generators out there don't actually contribute API surface area. Particularly as they begin to adopt `file`-local types, they don't really participate in signatures at all (at least for the purpose of skeleton assemblies).

Could we leverage this in some way to help this process? For example, could we have generators identify themselves as impl-only and skip them for skeleton assemblies? Or, given that is the more natural default, could we have generators opt into participating in skeleton assemblies by providing fast signatures only?

CyrusNajmabadi commented 2 years ago

> as impl-only and skip them for skeleton assemblies?

I like this idea, because it means we may be able to avoid running the generators entirely.

> Or, given that is the more natural default, could we have generators opt into participating in skeleton assemblies by providing fast signatures only?

This seems less viable, as we'd still have to run the entire pipeline just to see whether they generated something.

--

Note: I think we would also need to do this in the reverse direction. Namely, instead of having generators opt out of skeleton generation, have generators opt into them. Otherwise, we'll have the issue that you add a single generator that hasn't thought about this, and now we have to pay the cost of building the nascent compilation, which is then needed for skeleton generation.

Ideally, by making it opt-in, nearly all projects will just say "I don't have any relevant generators" and fast-path to producing the final compilation. Only the handful that truly have a generator which says it impacts skeletons will then pay that cost.
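A minimal sketch of the opt-in fast path described above. Roslyn has no such flag today as far as this thread shows; `Generator`, `affects_skeleton`, `build_skeleton`, and the two callbacks are all hypothetical names used only to illustrate the decision.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Generator:
    name: str
    # Hypothetical opt-in flag: set only by generators that may add types or
    # members to the project's public signatures. Defaults to "impl only".
    affects_skeleton: bool = False


def build_skeleton(
    sources: List[str],
    generators: List[Generator],
    run_generators: Callable[[List[Generator], List[str]], List[str]],
    emit_signatures: Callable[[List[str]], List[str]],
) -> List[str]:
    opted_in = [g for g in generators if g.affects_skeleton]
    if not opted_in:
        # Fast path: nothing opted in, so skip the generator run entirely
        # and emit signatures from the hand-written sources only.
        return emit_signatures(sources)
    # Slow path: run only the opted-in generators, then emit signatures
    # for the hand-written plus generated sources.
    return emit_signatures(sources + run_generators(opted_in, sources))
```

With only default (impl-only) generators, `run_generators` is never called; adding a single generator with `affects_skeleton=True` makes just that generator run, which matches the "only the handful pay the cost" goal.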

jaredpar commented 2 years ago

> Note: I think we would also need to do this in the reverse direction. Namely, instead of having generators opt out of skeleton generation, have generators opt into them.

Agree. That is what I was (poorly) trying to suggest we do. 😄

> Only the handful that truly have a generator which says it impacts skeletons will then pay that cost.

I'm wondering if we can make this faster or more cacheable. The problem is I'm not aware of enough generators that actually produce signatures to derive a pattern from. A bonus of making skeletons opt-in is that we can find and examine the generators that do want to do this. That should give us a better data set to start looking for a pattern.

CyrusNajmabadi commented 2 years ago

A good example of one that has to do this is our 'Syntax' generators. These very much produce our public surface area, so they'd need to run as part of skeleton generation so that we can actually see these types downstream :)

jaredpar commented 2 years ago

That example is very cacheable, as it comes purely from additional files. I would expect it to contribute really nothing for skeleton assemblies. Yes, the first time we need to run it, but after that it's fully cached. Is that mental model correct?
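A sketch of that mental model: a generator whose output depends only on additional files can be memoized on their contents, so edits to ordinary C# files never trigger a rerun. This loosely mirrors the idea behind Roslyn's incremental generators (reuse cached output when inputs compare equal), but `CachedGenerator` is a hypothetical single-entry cache, not that API, and the file name below is made up.

```python
from typing import Callable, Dict, Optional, Tuple


class CachedGenerator:
    """Memoizes a generator whose output depends only on additional files
    (path -> text). Edits to ordinary source files never reach the cache key."""

    def __init__(self, generate: Callable[[Dict[str, str]], str]) -> None:
        self._generate = generate
        self._key: Optional[Tuple[Tuple[str, str], ...]] = None
        self._output: Optional[str] = None
        self.runs = 0  # how many times the real generator actually executed

    def get(self, additional_files: Dict[str, str]) -> str:
        # Key on the exact (path, text) pairs; sorting makes order irrelevant.
        key = tuple(sorted(additional_files.items()))
        if key != self._key:
            self._output = self._generate(additional_files)
            self._key = key
            self.runs += 1
        return self._output


gen = CachedGenerator(lambda files: "// generated from " + ", ".join(sorted(files)))
gen.get({"Grammar.xml": "<Tree/>"})  # first call runs the generator
gen.get({"Grammar.xml": "<Tree/>"})  # unchanged input: served from the cache
print(gen.runs)  # 1
```

Only a change to the additional file itself (for example, editing the grammar) causes another run.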

CyrusNajmabadi commented 2 years ago

> That example is very cacheable, as it comes purely from additional files. I would expect it to contribute really nothing for skeleton assemblies. Yes, the first time we need to run it, but after that it's fully cached. Is that mental model correct?

Yes, I believe so.