Analyzer suggestion: Improper incremental pipeline

Youssef1313 commented 1 year ago

Describe the problem you are trying to solve

Incremental source generators are sometimes designed in a non-cache-friendly way. This happens when pipeline contains types that are not equatable across different compilations. These types can be ISymbols, or Compilations. Maybe also SyntaxNode? Not sure.

Describe suggestions on how to achieve the rule

The analyzer will look for incremental generator pipelines that are not cache-friendly, and issue a diagnostic.

More detailed, the analyzer will look for Register[Implementation]SourceOutput<T> invocations where T violates incrementality. Violation is defined as follows:

Type T doesn't implement IEquatable<T> (this includes Compilation)
Type T is ISymbol.
Type T isn't ImmutableArray<T>
Type T is a tuple where one of the tuple elements violates incrementality.
Type T is a user-defined model where one of the members violates incrementality.

Alternatively, and more strongly, we can detect any part of the pipeline that violates incrementality, not only Register[Implementation]SourceOutput. This should exclude the root context.CompilationProvider, etc.

Additional context

I have seen this a lot, and in the dotnet organization. e.g, https://github.com/dotnet/runtime/issues/76119 and https://github.com/dotnet/winforms/blob/117451b053c2fbda2f9caf9e1ef280f412b5aad1/src/System.Windows.Forms.Analyzers.CSharp/src/System/Windows/Forms/Generators/ApplicationConfigurationGenerator.cs#L74-L81

jkoritzinsky commented 1 year ago

I think checking every step is too strong of a constraint (some pipelines need to combine syntax w/symbol + compilation + analyzer options, etc), but having some mechanism to say "this step should be incremental" for a given intermediate step would be good. Maybe introduce an attribute that could be placed on a constant value? Then when that value is used through the constant in a call to WithTrackingName(), the analyzer checks the output for that step?

Youssef1313 commented 1 year ago

From previous discussions with @333fred, it's bad to have symbols as part of the whole pipeline (any step, if I understand correctly).

jkoritzinsky commented 1 year ago

It's only bad to have symbols if you provide a custom comparer that compares symbols from different compilations as equal. Without the custom comparer, they don't root the Compilation instance across multiple runs of the generator. With a custom comparer, it can cause the Compilation objects to be rooted for an indeterminate amount of time.

I think having an info-level diagnostic for symbols anywhere in the pipeline is okay but having it as a warning would cause a number of warnings in cases where it's currently impossible to fix (and handled at later steps that actually produce a cache-friendly state object).

333fred commented 1 year ago

@jkoritzinsky this is absolutely not true. You should not have symbols in the pipeline, as that can root previous compilations and prevent large swaths of memory from being freed.

CyrusNajmabadi commented 1 year ago

Yeah, symbols in any part of teh pipeline as values in the tables just won't work. They will always compare unequal, causing all entries to be recomputed, and all further steps to happen. So, in effect, you can have them in the tables... it just means you have no incrementality there. :)

Note: source symbols root compilations. And metadata symbols root metadata assemblies. The latter is not great, but the former is bad baddy bad bad :)

would cause a number of warnings in cases where it's currently impossible to fix

Can you give examples?

CyrusNajmabadi commented 1 year ago

and handled at later steps that actually produce a cache-friendly state object

@jkoritzinsky here's the thing though. That intermediary table you're generating ends up serving no purpose. It literally will be 100% dumped/recomputed on every run. So if you have a later step that produces an actual cache-friendly state object, you should just have that be the result of this stage instead of having an intermediary stage that does nothing :)

jkoritzinsky commented 1 year ago

That brings me to the same question I've asked multiple times and never gotten an answer for. How should I write a source generator such that I combine the information from an IMethodSymbol, a Compilation, and an AnalyzerConfigurationProvider where I have multiple intermediate steps where I need to calculate intermediate diagnostics?

This is the case in the LibraryImportGenerator, JSImportGenerator, JSExportGenerator, and VTableIndexStubGenerator (all of the interop generators).

I've run this specifically past both of you and you've both said that it's fine.

I understand that the intermediate tables are thrown away and I'm fine with that. They're literally only there because I can't get all of the information into a single step at once. I have a step that produced the cache-friendly object as soon as I have all of the necessary information available in a single step.

I can't have it all in the "select" function in ForAttributeWithMetadataName unless I recalculate the same compilation-wide information from the SemanticModel's Compilation object every time the generator sees a MethodDeclarationSyntax with the right attribute, and I can't get the symbol again later unless I re-create the SemanticModel, both of which are bad options and things we've been explicitly told not to do.

If there's a solution here that doesn't cause any problems, please tell me. I've asked literally every time this has come up and we go through this same conversation, whether on Discord or GitHub.

jkoritzinsky commented 1 year ago

Also, please explain to me how putting a symbol that points at source into a table that will be thrown away on the next execution (due to the fact that we're not using a custom comparer) is any worse for keeping Compilations alive than having a table for a node constructed with node.Combine(context.CompilationProvider) without a custom comparer? And if it isn't, why is it okay for ForAttributeWithMetadataName to use .Combine(context.CompilationProvider) and not for me to use an equivalent for a scenario that has no alternative today?

I want to follow best practices with source generators, but it's getting increasingly difficult when I'm told that what we're doing is "fine" one day and "not fine" the next, especially when the same person tells me it's fine again a few hours later only for it to come up again in a few months.

CyrusNajmabadi commented 1 year ago

That brings me to the same question I've asked multiple times and never gotten an answer for. How should I write a source generator such that I combine the information from an IMethodSymbol, a Compilation, and an AnalyzerConfigurationProvider where I have multiple intermediate steps where I need to calculate intermediate diagnostics?

First, teh answer may always be: there may be no way to accomplish what you want. Generators are not a fully flexible system for all purposes. They have limitations, and some things are def out of scope of them. In those cases, the best path forward is to make an official request for a suitable API (if one can even be built), have is scheduled into Roslyn's upcoming schedule, and then wait for it to be ready before writing the generator.

Now, wrt to the above. First, i'm not sure what AnalyzerConfigurationProvider. Can you give some links on that.

Second, you can certainly use IMethodSymbols and the Compilation. But you'd do so in the single step off the context.CompilationProvider that goes from that to hte model objects you want to use later down the pipeline.

This is the case in the LibraryImportGenerator, JSImportGenerator, JSExportGenerator, and VTableIndexStubGenerator (all of the interop generators).

Could you point to where you're doing this?

unless I recalculate the same compilation-wide information from the SemanticModel's Compilation object every time the generator sees a MethodDeclarationSyntax with the right attribute, and I can't get the symbol again later unless I re-create the SemanticModel, both of which are bad options and things we've been explicitly told not to do.

Yup. Seems problematic. Do you have to use generators? It sounds like we shoudl make an explicit feature request here on the generator system to provide something suitable for your needs. Have these generators shipped out to customers? If not, can we postpone them from doing so until we can get to a good place wrt to generators :) Thanks!

CyrusNajmabadi commented 1 year ago

I've asked literally every time this has come up and we go through this same conversation, whether on Discord or GitHub.

So tehre's a difference between convos on Discord (which are basically just the team putting in their spare time to try to help out), versus an official request for support from one team to another through our normal channels. If you have a team need for this, it can be asked for, with a specific timeline, and our leads can hammer out how to actually fit that into the schedule and what needs to be shuffled around to make it happen. Right now, anything else is just us monitoring the situation and going "Welp... no good solutions... hope that's not too much of an issue in practice". :)

I'm personally just tangentially interested here. Me and fred have no official responsibilities otherwise in the generator space. So we're advising, but leaving it to teh SG owners to actually make decisions/adjustments. And that is best served with official requests for specific functionality that is needed by partner teams to unblock scenarios.

jkoritzinsky commented 1 year ago

That brings me to the same question I've asked multiple times and never gotten an answer for. How should I write a source generator such that I combine the information from an IMethodSymbol, a Compilation, and an AnalyzerConfigurationProvider where I have multiple intermediate steps where I need to calculate intermediate diagnostics?

First, teh answer may always be: there may be no way to accomplish what you want. Generators are not a fully flexible system for all purposes. They have limitations, and some things are def out of scope of them. In those cases, the best path forward is to make an official request for a suitable API (if one can even be built), have is scheduled into Roslyn's upcoming schedule, and then wait for it to be ready before writing the generator.

I've been trying to work as closely as possible with the Roslyn team including @chsienki and @sharwell to make sure we're implementing the generators correctly in the .NET interop space (I drove the design and provided the implementation of the "incremental step tracking API"). The last time this came up (links to our conversation below), you specifically said that symbols can be used if they are not part of your model. As stated, they aren't part of our model, but they are passed between steps on the way to making our model.

https://discord.com/channels/143867839282020352/598678594750775301/1016794466952155190 https://discord.com/channels/143867839282020352/598678594750775301/1016794515761270815 https://discord.com/channels/143867839282020352/598678594750775301/1016794585445449798

When we spoke about introducing new APIs the last time, the only idea that came up was to provide new APIs that effectively combine ForAttributeWithMetadataName with each individual provider on IncrementalGeneratorInitializationContext and duplicate the API shape by 2x while still not actually solving the problem.

Now, wrt to the above. First, i'm not sure what AnalyzerConfigurationProvider. Can you give some links on that.

Sorry, I was referring to the AnalyzerConfigOptionsProvider. I should have pulled the link the first time to make sure I got the right name.

Second, you can certainly use IMethodSymbols and the Compilation. But you'd do so in the single step off the context.CompilationProvider that goes from that to hte model objects you want to use later down the pipeline.

How do I get from the Compilation to the symbol object from a SyntaxNode without passing an ISymbol-derived value from one step to another or re-creating the SemanticModel object? That's what I don't get. Are you saying that something like the following is fine even though it produces a node that produces IMethodSymbol and a node that produces (IMethodSymbol, Compilation)?

public void Initialize(IncrementalGeneratorInitializationContext context)
{
      var cachedNode = context.ForAttributeWithMetadataName("MyAttribute",
           static (node, ct) => node is MethodDeclarationSyntax,
           static (context, ct) => context.TargetSymbol is IMethodSymbol methodSymbol
               ? methodSymbol
               : null)
           .Where(
               static modelData => modelData is not null)
           .Combine(context.CompilationProvider)
           .Select((data, ct) => ProduceNonSymbol(data.Left, data.Right))
           .WithTrackingName("ContextWithNoSymbols");
}

private ContextWithNoSymbols ProduceNonSymbol(IMethodSymbol method, Compilation compliation) { /* ... */ }

private sealed record ContextWithNoSymbols {  /* ... */ }

This is the case in the LibraryImportGenerator, JSImportGenerator, JSExportGenerator, and VTableIndexStubGenerator (all of the interop generators).

Could you point to where you're doing this?

unless I recalculate the same compilation-wide information from the SemanticModel's Compilation object every time the generator sees a MethodDeclarationSyntax with the right attribute, and I can't get the symbol again later unless I re-create the SemanticModel, both of which are bad options and things we've been explicitly told not to do.

Yup. Seems problematic. Do you have to use generators? It sounds like we shoudl make an explicit feature request here on the generator system to provide something suitable for your needs. Have these generators shipped out to customers? If not, can we postpone them from doing so until we can get to a good place wrt to generators :) Thanks!

Yes we need to use generators as we make heavy usage of type system information to provide our marshalling model. The entire new direction of .NET interop is heavily built on the Roslyn source generation technology, and we've been involved with Roslyn team since the v1 API was first implemented and along every step of the way. LibraryImportGenerator, JSImportGenerator, and JSExportGenerator all shipped in .NET 7. VtableIndexStubGenerator is likely shipping in .NET 8 (along with the incoming ComInterfaceGenerator).

Here's a snippet of the LibraryImportGenerator:

https://github.com/dotnet/runtime/blob/c0ebf2bdcdcef90f82c67d357750c34bb56c3f4c/src/libraries/System.Runtime.InteropServices/gen/LibraryImportGenerator/LibraryImportGenerator.cs#L42-L101

In lines 50-97, we calculate all of the information for each of our intermediate steps. In the step defined on line 98, we produce our no-symbol model (there is a bug today that in one case we keep symbols, but that is in the process of being fixed in https://github.com/dotnet/runtime/pull/79051). We expect zero incrementality before the "CalculateStubInformation" step, and we have basic tests to validate incrementality of the "CalculateStubInformation" step using the step tracking API to ensure we don't regress (with the test suite expanding with more regression tests whenever we find an issue, including liveness tests to ensure we don't root additional Compilation objects on top of what the GeneratorDriver is already rooting).

It is untenable for us to grab all possible information we could ever need from the symbols as we'd effectively have to build our own copy of the symbol API to ensure we grab all the information we could possibly need.

I've asked literally every time this has come up and we go through this same conversation, whether on Discord or GitHub.

So tehre's a difference between convos on Discord (which are basically just the team putting in their spare time to try to help out), versus an official request for support from one team to another through our normal channels. If you have a team need for this, it can be asked for, with a specific timeline, and our leads can hammer out how to actually fit that into the schedule and what needs to be shuffled around to make it happen. Right now, anything else is just us monitoring the situation and going "Welp... no good solutions... hope that's not too much of an issue in practice". :)

I'm personally just tangentially interested here. Me and fred have no official responsibilities otherwise in the generator space. So we're advising, but leaving it to teh SG owners to actually make decisions/adjustments. And that is best served with official requests for specific functionality that is needed by partner teams to unblock scenarios.

I'm willing to work with the team to improve the space (as I've done before), but when people tell me things are fine and then come back and tell me "You should have made an official request to improve this even though we told you it was fine then but it's not and it never was" it makes it difficult to make sure we're doing the right thing. It also doesn't help that when I contribute fixes to the SG space in Roslyn, sometimes they're just ignored for months until someone on the Roslyn team stumbles on the same bug and fixes it in their own PR.

dotnet / roslyn-analyzers