dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.12k stars 4.04k forks source link

Allow for a way to ensure some source generators run before / after others #57239

Open pranavkm opened 3 years ago

pranavkm commented 3 years ago

We briefly discussed this during 6.0 / C# 10 but ultimately punted it since the ask came in too late. We've had issues where user-written source generators can no longer observe the output of source generated by Razor's:

For 6, the solution we gave our users is to turn off the source generator and fall back to our 2-phase compilation process. We'd made some progress discussing this internally, but filing this for C# 11 consideration.

bugproof commented 2 years ago

I really want to use this feature with Blazor but seems like I have to rely on AOP frameworks instead.

jcouv commented 2 years ago

Another instance of this problem which was reported is using RegexGenerator in Blazor project.

MarkFl12 commented 2 years ago

Yeah, I've been trying to build a source generator to help with Blazor projects, but I think what's happening is that it needs to run before the generator that transforms the .razor files so that when their generator runs the code I've generated that's used in them is available

RyoukoKonpaku commented 2 years ago

+1 for this.

We've been using generators for Blazor to generate a strongly typed class that contains all routes of the application so routing isn't stringly typed and much less prone to human errors. This relies on us having to us finding route attributes on the generated files by the razor compiler. Current workaround for us is to define the RouteAttribute in a codebehind so our source generator can see it, but ideally it'd be nice to keep using the @page directive for this.

Blackclaws commented 2 years ago

I think we might want to get some traction or at least ideas going on how this might be be achieved.

In general I see that there are at least two ways this could work.

  1. Re-run generators on the output of all other generators until a steady state is achieved

This would probably be the easiest way to implement this. However this already sounds like a really bad idea in case you build two generators that always trigger each other, leading to an infinite loop. This would however allow generators to use each other, with the possibility of terminating after a maximum amount of round trips.

  1. Actually order generators properly.

The way this could work is by having introducing for example an interface

public interface IOrderedGenerator
{
   IEnumerable<string> GetBeforeGenerators();
   IEnumerable<string> GetAfterGenerators();
}

With the returned strings understood as assembly names.

This would allow a generator to declare both before and after targets.

This method would also make it easy to detect loops, allow parallel execution of generators that aren't part of the same dependency graph, etc.

sanchez commented 2 years ago

Sorry bit new to the thread but wouldn't it be possible to just take the order of execution as a list, set in the running csproj? Could just be a bunch of namespaces etc. Then the responsibility is on the consumer and developer documentation.

Additionally different libraries could read in this list and "validate" their ordering and return warnings if it's not as expected.

What I don't like about my suggestion is that it's polluting the csproj more, but to me it makes sense; if I wanted to know/manage my source generators one of the first places I would look is the csproj

Youssef1313 commented 2 years ago

Sorry bit new to the thread but wouldn't it be possible to just take the order of execution as a list, set in the running csproj? Could just be a bunch of namespaces etc. Then the responsibility is on the consumer and developer documentation.

Additionally different libraries could read in this list and "validate" their ordering and return warnings if it's not as expected.

What I don't like about my suggestion is that it's polluting the csproj more, but to me it makes sense; if I wanted to know/manage my source generators one of the first places I would look is the csproj

This will prevent possible parallelization, I think

Suppose GeneratorA should run after Generator B, but GeneratorC doesn't have any dependency. It should be possible to start running both GeneratorA and GeneratorC in parallel.

sanchez commented 2 years ago

True. I hadn't considered that. You could group/nest sources thought right?

<Generators>
  <Step>
    <Source Type="GeneratorA" />
    <Source Type="GeneratorB" />
  </Step>
  <Step>
    <Source Type="GeneratorC" />
  </Step>
</Generators>

Each step being parallel within but being sequential adjacent. So A and B parallel together and C after

Edit: sorry realized I messed up your example a bit. Here A and B have no dependencies but C does for example

Blackclaws commented 2 years ago

Sorry bit new to the thread but wouldn't it be possible to just take the order of execution as a list, set in the running csproj? Could just be a bunch of namespaces etc. Then the responsibility is on the consumer and developer documentation.

Additionally different libraries could read in this list and "validate" their ordering and return warnings if it's not as expected.

What I don't like about my suggestion is that it's polluting the csproj more, but to me it makes sense; if I wanted to know/manage my source generators one of the first places I would look is the csproj

The problem I think is that this firmly puts the burden on the developer using the generator to know what it generates and how the dependencies between generators might play out.

I think the pit of success is rather that generators themselves define if they handle output from other generators or generate output that other generators (possibly dependencies) consume. A developer wouldn't really want to worry about these specific details and just expects to be able to use source generators in a way that they can build on one another without needing an extra step.

Also this would introduce another point where this needs to be validated etc.

CyrusNajmabadi commented 2 years ago

I'm not seeing any mechanism for this sort of system to be fast enough. Just having a single level of generators today, is enormously impactful for perf. Even for things that used to be fast (like creating a ref-assembly) we now have to fully bind and create the initial compilation so we can run generators. If we then allow generators to run before/after, and we expect those generators to understand the semantics of that prior generated code, I do not see how we could do that without just huge perf hits.

juarezasjunior commented 2 years ago

I liked the idea of the dev specify the order in .csproj. But, sometimes it could be boring, special if the dev is using a package that he doesn't know that it has code being generated. What about if we have both options? The default behavior is the same that we have now. The projects in .csproj will execute without order. But, if we want to execute it following an order, we may specify the order in .csproj like mentioned by @sanchez https://github.com/dotnet/roslyn/issues/57239#issuecomment-1238160592?

phil-scott-78 commented 2 years ago

I'm not sure ordering would ever be possible. Take the Razor generator. There are a decent amount of generators that would need to look at the code in the razor files while also letting those razor files consume the generated code. I think that leaves multi-pass runs as the only option.

I suspect, maybe with hopeless optimism, that if this multi-pass behavior was restricted to incremental generators that the output could be examined rather quickly to determine if more runs are required, with a failsafe to bail if recursion is detected.

Blackclaws commented 2 years ago

I'm not seeing any mechanism for this sort of system to be fast enough. Just having a single level of generators today, is enormously impactful for perf. Even for things that used to be fast (like creating a ref-assembly) we now have to fully bind and create the initial compilation so we can run generators. If we then allow generators to run before/after, and we expect those generators to understand the semantics of that prior generated code, I do not see how we could do that without just huge perf hits.

Performance is secondary kind of. This is a feature that might have a decent performance hit, but as long as its something that generators themselves have to implement in order for that to matter I think the problem isn't that big.

I liked the idea of the dev specify the order in .csproj. But, sometimes it could be boring, special if the dev is using a package that he doesn't know that it has code being generated. What about if we have both options? The default behavior is the same that we have now. The projects in .csproj will execute without order. But, if we want to execute it following an order, we may specify the order in .csproj like mentioned by @sanchez #57239 (comment)?

I think specifically that the default option being no order doesn't make a lot of sense. If I already know as the author of a generator that it will generate output that another generator has to process I want to be able to declare this in advance.

I'm not sure ordering would ever be possible. Take the Razor generator. There are a decent amount of generators that would need to look at the code in the razor files while also letting those razor files consume the generated code. I think that leaves multi-pass runs as the only option.

I suspect, maybe with hopeless optimism, that if this multi-pass behavior was restricted to incremental generators that the output could be examined rather quickly to determine if more runs are required, with a failsafe to bail if recursion is detected.

I think for razor files specifically this is a difficult situation. Looking at the generated code, I don't think there is a reason why Razor itself would have this sort of circular dependency on the output of the source generator. It might be there and since this is definitely one of the most visible use cases for chaining generators we definitely have to think about it though.

The razor source generator might itself be split into two parts though. The first part generates a file that just has all the attributes and declares a partial class, and the second part implements the actual rendering pipeline. That way you could hook into the middle of the razor generator.

CyrusNajmabadi commented 2 years ago

Performance needs to be primary. The perf here affects literally every semantic task at a higher layer.

Blackclaws commented 2 years ago

Performance needs to be primary. The perf here affects literally every semantic task at a higher layer.

Sure, I'm not saying its completely irrelevant, I'm saying that as long as the behavior is opt in, it shouldn't have any impact on current compile times. When there is a generator that needs this behavior its secondary how performant it is, as long as it isn't abysmal.

CyrusNajmabadi commented 2 years ago

Sure, I'm not saying its completely irrelevant, I'm saying that as long as the behavior is opt in, it shouldn't have any impact on current compile times.

It can't really be opt-in. If this becomes needed for some generator, then customers will have to enable this multi-pass generation. Which means that all performance will be impacted there. This is not a hypothetical. The perf hit of just a single pass of generators is enormous. This just exacerbates it, especially as it will likely become mandatory for some scenarios to work.

Blackclaws commented 2 years ago

Sure, I'm not saying its completely irrelevant, I'm saying that as long as the behavior is opt in, it shouldn't have any impact on current compile times.

It can't really be opt-in. If this becomes needed for some generator, then customers will have to enable this multi-pass generation. Which means that all performance will be impacted there. This is not a hypothetical. The perf hit of just a single pass of generators is enormous. This just exacerbates it, especially as it will likely become mandatory for some scenarios to work.

Ah, the multi pass scenario is what I consider a bad solution though. I'd much rather have a directed acyclic graph of generators so that only generators that actually depend on one another have to be run in sequence. The rest can continue to run in parallel.

This shouldn't impact anything but the affected generators themselves which might be problematic enough but it shouldn't affect existing generators at all when you can still run them in parallel.

CyrusNajmabadi commented 2 years ago

Ah, the multi pass scenario is what I consider a bad solution though. I'd much rather have a directed acyclic graph of generators so that only generators that actually depend on one another have to be run in sequence.

This is still problematic. Just consider the simple case of two generators, where one depends on the other. To actually produce semantics we ahve to do:

  1. Produce initial Compilation-A.
  2. Run generator 'G1' on it to generate Sources-A'
  3. Produce Compilation-B by combining Sources-A' to Compilation-A
  4. Run generator 'G2' on it to generate Sources-B'.
  5. Produce Compilation-C by combining Sources-B' to Compilation-B.
  6. Compile and emit Compilation-C

This is a ton of very expensive work three full compilations are made here.

Blackclaws commented 2 years ago

Ah, the multi pass scenario is what I consider a bad solution though. I'd much rather have a directed acyclic graph of generators so that only generators that actually depend on one another have to be run in sequence.

This is still problematic. Just consider the simple case of two generators, where one depends on the other. To actually produce semantics we ahve to do:

1. Produce initial `Compilation-A`.

2. Run generator 'G1' on it to generate `Sources-A'`

3. Produce `Compilation-B` by combining `Sources-A'` to `Compilation-A`

4. Run generator 'G2' on it to generate `Sources-B'`.

5. Produce `Compilation-C` by combining `Sources-B'` to `Compilation-B`.

6. Compile and emit `Compilation-C`

This is a ton of very expensive work three full compilations are made here.

Right. I'm not saying I have to answer yet on how this can be achieved with good performance, just that it is a feature that's been requested again and again and that, with increasing proliferation of source generators, will only add more value.

I'm wondering whether the compilation that is passed to a second generator really has to be a full compilation or whether there are ways to make this interface more lightweight.

Also my hope would be that incremental generators also reduce the overall amount of calls that need to be made to the source generators. This dependency feature is something I firmly believe shouldn't be added to the old V1 source generators.

CyrusNajmabadi commented 2 years ago

Right. I'm not saying I have to answer yet on how this can be achieved with good performance, just that it is a feature that's been requested again and again and that, with increasing proliferation of source generators, will only add more value.

I get teh value side. But the value is only acceptable if it doesn't come with even more negative value from teh perf hit there. As before, this is not speculative. We're still working on the perf problems introduced by generators (including 'incremental generators). We're not even close to back to where we want to be, and that's the current state of things. We can't layer on even more perf sinks when our perf is not even at an acceptable level today.

Also my hope would be that incremental generators also reduce the overall amount of calls that need to be made to the source generators.

Not really sure hwo that would work. But if you can come up with some mechanism to make that possible, that would be interesting.

Blackclaws commented 2 years ago

Right. I'm not saying I have to answer yet on how this can be achieved with good performance, just that it is a feature that's been requested again and again and that, with increasing proliferation of source generators, will only add more value.

I get teh value side. But the value is only acceptable if it doesn't come with even more negative value from teh perf hit there. As before, this is not speculative. We're still working on the perf problems introduced by generators (including 'incremental generators). We're not even close to back to where we want to be, and that's the current state of things. We can't layer on even more perf sinks when our perf is not even at an acceptable level today.

I think that depends on what you deem an acceptable level. I think we can agree however that we want this feature to be as performant as possible. Introducing this feature would introduce a performance hit on certain critical paths, but I don't really see a way to shorten that path either.

Also my hope would be that incremental generators also reduce the overall amount of calls that need to be made to the source generators.

Not really sure hwo that would work. But if you can come up with some mechanism to make that possible, that would be interesting.

There are a lot of things you can do with incremental generators that would minimize the impact of implementing this feature.

For example you wouldn't rerun dependent generators if their input didn't change, and there is nothing stopping you from including their output from the last compilation if their output won't change.

There's also the point that you could run all generators in parallel in the first iteration, then check if any of the generators would have their pipelines triggered and then selectively rerun those, discarding their previous output and creating a new compilation. My gut feeling is that there are a lot of things you can do, but it won't be easy.

phil-scott-78 commented 2 years ago

What if there was an additional interface that a generator could implement in addition to IIncrementalGenerator that would be given the syntax trees of the code that was added by other source generators for that pass. Given that info the author could then opt-in for additional runs. Ideally the new method would have the ability to cache the pipelines just like the incremental generator, and secondary runs would likewise still have the ability to cache their pipelines too.

Blackclaws commented 2 years ago

What if there was an additional interface that a generator could implement in addition to IIncrementalGenerator that would be given the syntax trees of the code that was added by other source generators for that pass. Given that info the author could then opt-in for additional runs. Ideally the new method would have the ability to cache the pipelines just like the incremental generator, and secondary runs would likewise still have the ability to cache their pipelines too.

Interesting idea, you'd run into the problem that you can't force one generator to run before another one though. So support for handling other generators would have to be built into new generators.

phil-scott-78 commented 2 years ago

Interesting idea, you'd run into the problem that you can't force one generator to run before another one though. So, support for handling other generators would have to be built into new generators.

I'm thinking the generators with the interface would eventually get the output from the other generators in subsequent runs. The method would just keep getting called until generators all agree they don't need to be rerun (or a failsafe tells them all to move it along).

Generators not implementing the method would be at the whims of Roslyn when they get run, which wouldn't be any worse than the current state.

CyrusNajmabadi commented 2 years ago

The method would just keep getting called until generators all agree they don't need to be rerun

I don't see any viable path forward on that. Just running things until a hopeful fixed-point is reached seems like a recipe for disaster (Even with any sort of cap). If things have circular dependencies, they need to be redesigned to not be that way.

phil-scott-78 commented 2 years ago

Kinda feels either the design allows two generators to work off each others outputs with a fail safe to keep people from being ridiculous, or a strictly ordered system with a design that the Razor generator would almost immediately throw a wrench in.

CyrusNajmabadi commented 2 years ago

and creating a new compilation. My gut feeling is that there are a lot of things you can do, but it won't be easy.

Any new compilation creates a linear additional cost on the entire pipeline. e.g. it's as if we're just recompiling the project 'one more time'.

Blackclaws commented 2 years ago

The method would just keep getting called until generators all agree they don't need to be rerun

I don't see any viable path forward on that. Just running things until a hopeful fixed-point is reached seems like a recipe for disaster (Even with any sort of cap). If things have circular dependencies, they need to be redesigned to not be that way.

I agree that circular dependencies are probably not going to work as they are evil (TM) by nature and at least for a first implementation should be avoided. I think running generators until they converge is doomed to fail even if its the naive implementation and probably much easier. I originally proposed it as one of the two solutions just so that it can be discussed, I honestly don't think its a good idea.

Kinda feels either the design allows two generators to work off each others outputs with a fail safe to keep people from being ridiculous, or a strictly ordered system with a design that the Razor generator would almost immediately throw a wrench in.

The razor generator doesn't have to throw a wrench in it necessarily. The only point where the razor generator (I think) needs access to generated code is when it wants to use generated components. So the only non-supported use case I can think of off the top of my hat is generating new components on the fly within .razor files. And even then you could order your generator to run before the razor generator just taking the .razor files directly as input.

and creating a new compilation. My gut feeling is that there are a lot of things you can do, but it won't be easy.

Any new compilation creates a linear additional cost on the entire pipeline. e.g. it's as if we're just recompiling the project 'one more time'.

Right. Worst case scenario is that we're going to have as many compilations as the dependency graphs longest path. This isn't something that can be helped directly. However my hope is that this is an initial penalty on first run and that subsequent runs would be able to shorten that path by making sure to only rerun generators whose inputs have changed, same as how incremental generators cache the results right now.

I think we all agree that we definitely will have to performance test any implementation to make sure that even a short chain of 2-3 generators will not triple compile times right off the bat and keep them tripled forever. Else that might indeed lead to a situation where working with chained generators becomes a chore.

hanxinimm commented 2 years ago

the other way, add Dependency Injection support

`

[Generator]
public class TestCodeSourceGenerator :  IIncrementalGenerator
{
    //the method will be priority operation in all generator.
    public void Install(IServiceCollection services)
    {

    }

    //when all Install finish, do it.
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        context.RegisterSourceOutput(workshopTypes, GenerateCode!);
    }
}

[Generator]
public class _3rdPartyTestCodeSourceGenerator : IIncrementalGenerator
{
    //do register service
    //generator internal sort , remove, overwrite by DI
    //only register service
    public void Install(IServiceCollection services)
    {
        services.AddSingleton<BeforeRazorGenerator>();

    }

    public void Initialize(IncrementalGeneratorInitializationContext context) { }
}

`

veler commented 1 year ago

Hi there,

I'm hitting this issue too with a cross-platform project implying WASDK and some XAML. Basically hitting this more precisely: https://github.com/unoplatform/uno/issues/9865

Using Uno Platform, CommunityToolkit.MVVM and ReswPlus in the same project is pretty much impossible.

All these libraries I mentioned above have generators. Let's say I'm using something from CommunityToolkit.MVVM to generate the details of a property Foo in a ViewModel automatically and bind this property to the UI in the XAML. When doing so, I simply can't build my project because Uno Platform generators won't find Foo because it hasn't been generated by CommunityToolkit.Mvvm yet.

Workaround is to put all the XAML in a project A, all the ViewModels in a project B, all the RESW in a project C. But it unnecessarily complexify the solution architecture.

Now, by reading this thread, I do understand the big performance impact concern of sequentially running generators. I like the suggestion from @sanchez here

True. I hadn't considered that. You could group/nest sources thought right?

<Generators>
  <Step>
    <Source Type="GeneratorA" />
    <Source Type="GeneratorB" />
  </Step>
  <Step>
    <Source Type="GeneratorC" />
  </Step>
</Generators>

Each step being parallel within but being sequential adjacent. So A and B parallel together and C after

Edit: sorry realized I messed up your example a bit. Here A and B have no dependencies but C does for example

I wouldn't expect Roslyn to figure out itself which source generator depends on which other. I wouldn't expect source generator developers to set themselves what's their priority. I would instead expect to let the consumer of the source generator (me) defining in what order generator A B C should run. Documentation about this feature would have to clearly state that build performance may be drastically reduced depending on what we're doing. If it's clearly documented that slower build should be expected, I don't see why this wouldn't be an acceptable option.

b-straub commented 1 year ago

I think it is pretty bad to let such an important issue fizzle out. Whatever the problems that need to be overcome, there should be a solution. The .NET runtime is using more and more generators, e.g. to solve AOT and trimming problems, and such can now not be used within another generator. Good examples are JSON and REGEX generators.

elringus commented 1 year ago

Another +1 here. I'm maintaining a JS interop library which provides various helpful tools absent in stock .NET (eg, generating bindings based on C# interfaces, type script definitions, etc) and it worked fine prior to .NET 7 where the JS interop layer has been changed and is now using source generators itself. My library also uses source generators, so it can't be migrated to the new interop. And the old interop is going to be deprecated in .NET 8. A dead end.

Source generators are spreading fast across the entire .NET runtime and there will be more such issues with community solutions in various areas (and eventually across .NET's own codebase, I believe); please consider giving the issue more attention.

b-straub commented 1 year ago

I like to mention the things you can do with SWIFT macros and property wrappers… I feel the C# source generator approach is falling way behind when an elementary problem such as cascading generators is not solved.

Jinjinov commented 1 year ago

Can we expect this in .NET 8?

jcouv commented 1 year ago

Can we expect this in .NET 8?

No, this is not planned (Backlog milestone).

CyberAndrii commented 1 year ago

I made a helper method to run other source generators manually. In my case it's for [LibraryImport]. However, the code is not the best, and it relies on internals of Roslyn, so it may break in the future.

internal static class GeneratorRunner
{
    public static void Run(
        GeneratorExecutionContext context,
        string hintNamePrefix,
        IEnumerable<ISourceGenerator> generators,
        params SyntaxTree[] syntaxTrees)
    {
        var compilation = context.Compilation
            .RemoveAllSyntaxTrees()
            .AddSyntaxTrees(syntaxTrees);

        GeneratorDriver driver = CSharpGeneratorDriver.Create(generators: generators);

        driver = driver.RunGenerators(compilation);
        var runResult = driver.GetRunResult();

        foreach (var diagnostic in runResult.Diagnostics)
        {
            ReportDiagnostic(diagnostic);
        }

        foreach (var generatedSource in runResult.Results.SelectMany(result => result.GeneratedSources))
        {
            context.AddSource(GetHintName(generatedSource.HintName), generatedSource.SourceText);
        }

        void ReportDiagnostic(Diagnostic diagnostic)
        {
            // There will be an error if we report a diagnostic
            // from a different compilation so we create a new one.
            var newDiagnostic = Diagnostic.Create(
                diagnostic.Descriptor,
                Location.None,
                diagnostic.Severity,
                diagnostic.AdditionalLocations,
                diagnostic.Properties,
                // SimpleDiagnostic class and _messageArgs field are internal.
                // We use Krafs.Publicizer to access them.
                ((Diagnostic.SimpleDiagnostic)diagnostic)._messageArgs
            );

            context.ReportDiagnostic(newDiagnostic);
        }

        string GetHintName(string nestedHintName)
        {
            return hintNamePrefix switch
            {
                _ when hintNamePrefix.EndsWith(".g.cs") => hintNamePrefix[..^".g.cs".Length],
                _ when hintNamePrefix.EndsWith(".cs") => hintNamePrefix[..^".cs".Length],
                _ => hintNamePrefix,
            } + "__" + nestedHintName;
        }
    }
}

Usage example:

public void Execute(GeneratorExecutionContext context)
{
    var hintName = "Bindings.g.cs";

    var syntaxTree = SyntaxFactory.ParseSyntaxTree("""
internal static partial class Bindings
{
    [global::System.Runtime.InteropServices.LibraryImport("_Internal")]
    public static partial void Foo();
}
""",
        encoding: Encoding.UTF8
    );

    var libraryImportGeneratorType = Type.GetType(
        "Microsoft.Interop.LibraryImportGenerator, Microsoft.Interop.LibraryImportGenerator"
    )!;

    var libraryImportGenerator = ((IIncrementalGenerator)Activator.CreateInstance(libraryImportGeneratorType))
        .AsSourceGenerator();

    var generators = new[] { libraryImportGenerator };

    context.AddSource(hintName, syntaxTree.GetText());
    GeneratorRunner.Run(context, hintName, generators, syntaxTree);
}
b-straub commented 1 year ago

Can we expect this in .NET 8?

No, this is not planned (Backlog milestone).

I am quite amazed that when asked (Asp.Net team for AOT) even complex features like interceptors are implemented, but this topic, which is equally or even more important for AOT, remains in limbo for more than two years.

CyrusNajmabadi commented 1 year ago

@b-straub Interceptors are being investigated as they are likely to drive actual language changes, and we have to understand that space in order to move forward on it. SGs are purely in the compiler, and the ability to run before/after is well understood, but still lacks suitable solutions to the significant problems there.

b-straub commented 1 year ago

but still lacks suitable solutions to the significant problems there.

The complexity has been understood, what about the proposed and considered solution?

https://github.com/dotnet/roslyn/issues/68779#issuecomment-1634960714

CyrusNajmabadi commented 1 year ago

The complexity has been understood, what about the proposed and considered solution?

All generators are built off of an idea that they operate on a fully-representative versin of the world (called the 'Compilation'). In either SG v1 or SGv2, this means that in order to:

choose another SG and pass my generated files for one more transform.

we would need to create a full new compilation, containing teh existing code and the code you generated, to then run the other SG and get its outputs. Each of these transforms thus then adds a 'full Compilation' step which is enormously (and currently unacceptably) high cost.

As nothing has changed with the costs, or the acceptability of those costs, this is still not a viable solution :(

elringus commented 1 year ago

As an alternative, have you considered making all the generators used internally by .NET accessible as libraries? This way the users will be able to use them in their own generators, which (with a bit of additional hassle) will solve the issue as well.

The thing is, the more you utilize generators internally in .NET runtime, the less useful generators become for end users. You just keep breaking more and more extension points for the community. This is a really frustrating and worrying trend.

CyrusNajmabadi commented 1 year ago

You could certainly request that of the dotnet/runtime team :-)

biohazard999 commented 1 year ago

I agree with @Elringus I don't think we need a Model where all SG are like in a tree depending on each other, which adds complexity and cost. I think a layered or phase approach is more than enough.

  1. Roslyn built in
  2. MS one's (like razor, json etc)
  3. Community via nuget (they can't depend on others but the earlier ones
  4. User code

Another idea would be to mark the generator in which phase they want to run, but of course this can lead to 'wild west' in the community/nuget space. But that's the same peoblem as with analyzers. And I think the community did a good job making sure the perf overhead is as minimal as possible.

b-straub commented 1 year ago

@Elringus

Would you take on the task of opening an issue with your idea in the runtime repository? Your suggestion would solve my case and I am sure many others as well.

notour commented 1 year ago

are SG executed sequentially or in parrallel on the same code tree ?

If SG are executed sequentially each SG could have an attribute expressing dependency. This way the order could be defined compile time like library dependencies. This method allow more flexibility, order automatically through dependency tree, and error detection like cycling reference, ...

[GeneratorSequenceDependency("....", Order.After | Order.Before)]

If SG are in parallel it is more difficule and i understand the point of @CyrusNajmabadi

elringus commented 1 year ago

@Elringus

Would you take on the task of opening an issue with your idea in the runtime repository? Your suggestion would solve my case and I am sure many others as well.

I've actually already asked this via https://github.com/dotnet/runtime/discussions/87346#discussioncomment-6140606 for the new JS interop, and looks like they actually have plans on something like this, but they are long-term. I'm currently looking for alternative solutions, including switching out of C#/.NET entirely.

b-straub commented 1 year ago

You could certainly request that of the dotnet/runtime team :-)

I don't think that language, compiler and runtime live on different planets considering that .NET development is largely driven by MS teams. Similar to how other teams can request new features from the language/compiler, it should be possible the other way around.

Sure, as a developer I can make a request to the runtime, but that won't be the same as a coordinated internal effort to address the problem that more and more AOT enhancements (JSON, RegEx, JSInteropt,...) are inaccessible to SG usage. This is not a small problem. Either the whole SG concept is not well enough thought out, or the negative effects that are now coming to light are not addressed actively enough.

Blackclaws commented 1 year ago

The complexity has been understood, what about the proposed and considered solution?

All generators are built off of an idea that they operate on a fully-representative versin of the world (called the 'Compilation'). In either SG v1 or SGv2, this means that in order to:

choose another SG and pass my generated files for one more transform.

we would need to create a full new compilation, containing teh existing code and the code you generated, to then run the other SG and get its outputs. Each of these transforms thus then adds a 'full Compilation' step which is enormously (and currently unacceptably) high cost.

As nothing has changed with the costs, or the acceptability of those costs, this is still not a viable solution :(

@CyrusNajmabadi

I think as mentioned many times before the solution is to make dependencies explicit. That means that if one source generator has a dependency on another for processing their output it will run before that one. Obviously this means there can't be any loops and it needs to be a strictly acyclic graph.

Compilation would then require as many passes as the longest chain in that tree, which while unfortunate because it would increase compilation times is a fair trade off for the added utility in my opinion and apparently in the opinion of many others as well.

Is there a specific reason why a compilation can't/isn't amended by the new source files instead of having to be done from scratch? I don't understand where the extreme overhead is coming from here in the first place, but I haven't dug too deep into compiler internals.

I know that you can't just run the source generators that come after you only on your output (as a dependent SG) because some source generators need the whole compilation (see SGs for DI frameworks).

Have you done some actual exploration into how it would affect compile times? In the beginning it shouldn't affect any compile times as there is no defined dependency between any existing source generator so far and even once a few come out I don't expect there to be more than 2-3 passes needed in order to run the chain to its end. You can parallelize all SGs in the same stage after all.

CyrusNajmabadi commented 1 year ago

are SG executed sequentially or in parrallel on the same code tree ?

They all operate concurrently on the same input, producing independent outputs.

CyrusNajmabadi commented 1 year ago

I think as mentioned many times before the solution is to make dependencies explicit. That means that if one source generator has a dependency on another for processing their output it will run before that one. Obviously this means there can't be any loops and it needs to be a strictly acyclic graph.

This doesn't solve the primary concern that has been brought up since the beginning (and which i reiterated in my post). Doing that approach fundamentally means producing N full compilations for any operation. It's already problematic where N=2. Extending that model further isn't currently tenable.

Is there a specific reason why a compilation can't/isn't amended by the new source files instead of having to be done from scratch?

There is no such concept as 'amended by the new source file'. It's unclear what that would even mean or do. If you add new stuff, presumably the intent is for it to be used, so it needs to be visible by the actual rest of the compilation being compiled.

Furthermore, roslyn is a big, immutable, graph. You can't have a fully realized, immutable graph, then suddenly have more items in it.

Finally, a core principle of SGs is that if you took all the code generated, wrote it out to files, then recompiled those files with the original sources, that you would have the same output. If 'amended by the new source files' doesn't impact the rest of hte code, this principle is broken.

I don't expect there to be more than 2-3 passes needed in order to run the chain to its end.

Each pass is already too expensive. Even the single pass we have today is teh source of major problems. :-/