dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.04k stars 4.03k forks source link

Tracking item for IDE experience changes around source generators. #67441

Open CyrusNajmabadi opened 1 year ago

CyrusNajmabadi commented 1 year ago

This item exists just to keep track of hte discussions behing had about SG integration in the IDE and what potential options there are for changing that to help alleviate the significant performance problems we have seen coming up in numerous scenarios, especially as more partners use generators in more and more extensive ways.

We have not comitted to any specific path. But we are monitoring and evaluating different potential options. Including, but not limited to:

  1. Doing the work to support RegisterImplementationSourceOutput at the IDE level. This would entail running those callsbacks the majority of the time for most features. But then would have to create a system that ensured that if features (like goto-def, etc.) wanted to navigate to the item, that the full source would be generated. This would also necessitate partners and the ecosystem then also adopt this method. it would not be helpful for partners that would need to constantly be generating non-impl outputs.
  2. Changing the model for generation from an implicit one (e.g. a feature just 'pulls' for a compilation, and gets it) to a more user-triggered one. i.e. "explicitly building", or "saving", or "switching tabs", etc. This would then push costs out to those scenarios, but at the potential cost of having stale information for other times. Might require thought in the UI as to how to make this clear to users. Or, it might just be acceptable given that this is similar to the model already employed by things like resx. Completed in https://github.com/dotnet/roslyn/pull/72494
Youssef1313 commented 1 year ago

Lack of support for RegisterImplementationSourceOutput is currently very problematic :'( I'd love to see it being implemented!

CyrusNajmabadi commented 1 year ago

@Youssef1313 can you give an example of when you would use this api?

Youssef1313 commented 1 year ago

@CyrusNajmabadi https://github.com/unoplatform/uno.extensions/blob/8518e11ce1a90c7b966ffca25f4a9bd7c46c50d1/src/Uno.Extensions.Core.Generators/PropertySelectorGenerator.cs#L45

In short, this generator only generates a module initializer, so the generated code isn't intended to be called by the developer, hence generating it on every key stroke is a performance killer.

Worse, by the design of this specific generator, it's expensive by nature. We need are calling GetSymbolInfo for every invocation. That is taking 44% of CPU time per a trace I captured. (https://github.com/unoplatform/uno.extensions/issues/1625)

If RegisterImplementationSourceOutput was implemented properly, all this wouldn't be problematic at all.

Also, at this point, we can't give up using a source generator.

I'm thinking of converting this generator to a regular ISourceGenerator and detect design-time builds and then do nothing, but I'm not sure about scenarios like HotReload if they will remain working properly, and not sure if the move to ISourceGenerator is a good idea.

CyrusNajmabadi commented 1 year ago

What goes into the module initializer?

Can you give an example of what code this looks for, and what it generates in response?

Youssef1313 commented 1 year ago

Here is a sample of the generated code

https://github.com/unoplatform/uno.extensions/blob/8518e11ce1a90c7b966ffca25f4a9bd7c46c50d1/src/Uno.Extensions.Core.Tests/PropertySelector/Given_PS0001.cs#L45-L87

CyrusNajmabadi commented 11 months ago

Have had further discussions with @jasonmalinowski on this topic and about the work we envision doing in the near term.

First, we considered, but ultimately rejected a model whereby Project instances would expose a GetCompilationAsync method and something morally akin to a GetCompilationsWithFullyCompleteGeneratorsAsync. The reason for this is that we felt it would be trivial to get into scenarios where a feature was working with Document/Projects and wanted one of the above. However, after getting the right compilation, they would then pass those same Document/Projects instances to other helpers which would then operate on a different compilation, thus leading to inconsistencies. Perhaps the feature used GetCompilationsWithFullyCompleteGeneratorsAsync but then called a helper with a Document instance that then called GetSemanticModelAsync (thus not seeing the results of generators), or vice versa. These inconsistencies would likely lead to painful and subtle bugs.

So, instead, we felt we should provide an api much more in line with the forking approach that frozen partial system uses today. There you start with some Solution snapshot, and you get a forked snapshot which has different behaviors, but which is entirely self consistent. The consistency point is critical as it allows everything to operate on that snapshot without having to know or care that this happened.

Breaking things down, we think we would make the following changes:

  1. A Workspace could opt into a world where its Solution snapshot did not run generators when producing Compilations. This would be opt-in, and many of our normal workspaces (MSBuildWorkspace, AdhocWorkspace, etc.) would not opt-in. That way they would still present a CurrentSolution instance that seemed fully complete to all existing consumers, who likely want those semantics in the domain they are running in. For example, a console app that wants to examine the CurrentSolution properly, likely wants to see that fully complete view. It's only rich hosts (like VS/VSCode) where we would have the more fine-grained behavior.
  2. For the richer workspaces (VS/VSCode) the normal Workspace.CurrentSolution would produce Compilations whose SourceGeneratorDocuments were whatever was previously generated when generators last ran to completion.
  3. Feature that would need fully-up-to-date compilations (like EnC) would then use a method like solution.WithFullyCompleteGenerators() which would return a solution-fork that would now give fully complete Compilations when asked.
  4. Critically we expect that this forked solution actually shares practically everything with the solution it was forked from, including:

    • All syntax trees
    • All references
    • The SolutionState/ProjectState/DocumentState instances
    • The exact same CompilationTracker instances.

    These last two bullets are very important. It means that when sg-complete-compilations are computed in this fork they can be stored in those CompilationTracker instances. The CompilationTrackerswould point at up to two compilations then. The "best effort compilation" and the "fully complete with generators" compilation. This also means that multiple features asking for the WithFullyCompleteGenerators fork would share the computation and caching for the fully generated compilations. So, while this was a 'fork', it would not be a throwaway one that would cause lots of redone, wasted computation.

    Because the fully-complete-Compilation would be computed in the same green nodes that the normal solution was pointing at, further normal mutations of the solution (like for document text changes) would continue to pass the data from those nodes forward. And, at the point where the fully-complete-Compilation was finished computing for a project, it's SourceGeneratorDocuments would then be available to the next normal-solution-snapshot when producing a normal compilation for one of its projects.

Now, alongside the above, we would also have a mechanism to possible auto-run generators based on certain triggers (save, build, etc.), as well as allowing them to be manually run by the user (likely with some UI affordance in the host). To see how this would work, let's imagine that we're at some point in time where SGs are out of date, having only been run in the past:

  1. The user wants up to date generator results. So, they click on the UI affordance to 'run all generators'.
  2. This would immediately fork the solution with WithFullyCompleteGenerators, and would start requesting fully-complete-compilations for all projects in that solution.
  3. The UI would indicate that this work was going on up to the point that all fully-complete-compilations had been computed (like with a spinner/progress bar).

Now, consider how this would manifest itself from a feature perspective. Say, 'completion'.

  1. After step '2' above, all project compilation trackers would have created their forked generator-drivers to produce the fully-complete-compilation. Those forked-generator-drivers would also now be part of the 'input generator' set to any further normal workspace solution updates.
  2. Until that forked-generator-driver completed work, we would still produce normal compilations using the last generated docs, so completion would still show stale info.
  3. Once that forked-generator-driver completed, then any future solution-snapshots will see the docs generated by it in the normal compilations, and features like completion would see the results.

Note: this does mean that a user can say they want to rerun generators, and still see stale results. We might want individual features to become aware of this and provide a slightly different experience. For example, a feature like 'find refs' might say "the user explicitly kicked off generator work. if they want to find references on a symbol, i will wait for all that work to complete prior to actually kicking off the find-refs work. that way the find ref results are up to date.". This would be likely what we want for features that can convey to users what they're doing so that users don't just think the feature is slow.

Note: This does reveal a slight problem with the above formalization. Specifically, that once fully-complete-generation is done only future solution snapshots from the workspace would see those results. For a feature like find-refs, it might not ever see that if no more edits happen. Once way this can be addressed is that find-refs would call WithFullyCompleteGenerators if it saw that the user had explicitly requested SGs to run. That way it would see a world with that information. The question would be when would be the right times for it to do this? And when should it stop and use the normal solution? We do not want a complex state machine here.

As such, i think we will actually want it to be the case that after this step:

  1. This would immediately fork the solution with WithFullyCompleteGenerators, and would start requesting fully-complete-compilations for all projects in that solution.

then that solution can be 'pushed back' into the workspace, making it so that any features that run afterwards simply pay the cost for a generator run for a project and see the results. The user asked for this to happen, so it's worthwhile for them to see it. Because this cost would only be paid rarely (when the user explicitly asks for it, or on rare verbs like 'build') this should hopefully be ok, and still much better than our current state when you always pay.

jasonmalinowski commented 11 months ago

@CyrusNajmabadi I think this does match what I think we all dicsussed. A few things to highlight:

And, at the point where the fully-complete-Compilation was finished computing for a project, it's SourceGeneratorDocuments would then be available to the next normal-solution-snapshot when producing a normal compilation for one of its projects.

This does mean that we're updating CurrentSolution asynchronously, and at times that aren't triggered by some external edit (like a text file edit or a project system operation.) This was a general concern with going down this model from day 1 of source generators (and one reason we went the way we did) since it made it a bit murky then when things happen like applying code fixes and we're trying to figure out what was a generator output and what is regular output. But the formalization here:

Critically we expect that this forked solution actually shares practically everything with the solution it was forked from, including:

Stemmed from us realizing that we can really consider these to be sharing (after a bit of refactoring) a SolutionState, which means it becomes clear that things like TryApplyChanges then operate on the SolutionState half of the Solution, with the other "half" then being this source generator state that's being carried along with updates to CurrentSolution.

The question would be when would be the right times for it to do this? And when should it stop and use the normal solution? We do not want a complex state machine here.

It occurred to me now there might be a bit (but emphasis on just "a bit") more complexity here. In my initial mind this would be as something as simple as an operation morally like Workspace.SetCurrentSolution(Workspace.CurrentSolution.WithFullyCompleteGenerators()). But there's then a fun question of what happens if a workspace edit happens right after that. If a user presses a key, does the new snapshot still have the "with full complete" bit set, or is it reset back to normal? What if the workspace edit isn't the user pressing a key, but some file being reloaded due to an unrelated background operation they didn't trigger?

Thinking back to our earlier formalization of Solution perhaps being more cleanly split into two halves, the first half is the existing ProjectStates being held by the SolutionState. And the other half is the compilation/generator state. In the case of workspace edits like text buffer edits or project edits, only the SolutionState side matters to the project system code -- the compilation/generator stuff can do whatever it wants. So if our project system code comes along and calls SetCurrentSolution passing in a new solution, for any of these "fancy" workspaces, it can take the SolutionState, but reuse the generator state (or update it, depending on whatever it wants.) So if the compilation/generator state has the "we want fully complete" bit set, that'll just get carried forward. And presumably once the generators are actually computed CurrentSolution is updated again with the final result of the generators, and the bit cleared. The SolutionState stays the same throughout that change.

jasonmalinowski commented 11 months ago

The one other thing here of course is "what about Razor?" since I know that's on the Razor team's mind. Right now we don't use the Razor generator for design-time operations, but we have strong desire to do so since right now we have endless problems where the Razor design time state is updated asynchronously and this causes race conditions. Mentally I'm thinking of the implementation of a method like WithFullyCompleteGenerators() isn't "just setting a flag" but something a bit more sophisticated. For example, the generator state for a Solution really has a list of generators that "must" be up to date before a call for GetCompilationAsync() to be done, and it just happens that in most times, that list is empty. (To be very clear I imagine the implementation might be different, just using that as a mental model.) But in this mental model WithFullyCompleteGenerators() just adds all generators to that list. But for something like Razor, we can pick a policy like one of these:

  1. It's just always in the list, which means we are blocking on it like today, but it's a very special generator from a team we interact regularly with and understands the performance ramifications of being on this magic list.
  2. We add it to the list any time there's an edit to a .cshtml file, it goes into that list. This could apply a similar for other generators clearly related to specific additional documents -- like say a .resx generator -- if we wanted to make this a more general mechansim.
  3. Our LSP system recognizes that for Razor related requests (i.e. requests in the direction of a .cshtml file) need it, and then can call some more limited WithSpecificGeneratorFullyComplete() that adds in the Razor generator to the list since we know by that moment we better run it or else the LSP request will potentially fail.
  4. Something else!

And I don't imagine this would necessarily require much specific knowledge of Razor at any deep layer, or at least not a layer already doing special magic for Razor. Which of these policies we'd have to go with we'd decide with the Razor team, since there's a lot of perf vs. correctness tradeoffs to consider. And of course something like option 3 presumes we're running in the same LSP server as them which is something we have generally agreed we want to do, but is still months off. So the policy may change over time.

CyrusNajmabadi commented 10 months ago

Ok. After another round of discussions with @jasonmalinowski we've broken things down into the following steps:

  1. First, we break up the existing SolutionState, pulling out all code relates to source-generators and compilation-trackers. SolutionState/ProjectState/DocumentState now are solely used to represent the "textual state" and structure of the world. This of course still includes "options" and "references" (which can be thought of as the reified in-memory values for the textual information in project files).
  2. Compilation-Trackers and SG state will either move to Solution itself, or perhaps a new type that owns that info.

The benefits of the above are that we decouple concepts taht can then update/change on their own cadences without updating the other. For example, in a world where SGs man run more rarely, we would want to be able to change them, without the 'solution structure' representation changing. This will also likely clean things up in our current impl where a lot of forking/transformations has to happen to disparate objects unnecessarily.

Next:

  1. We will change Workspaces to allow them to opt into stating that their 'SG related state' can 'slip back'. Basically, that they will not move SGs forward in lockstep with compilations.
  2. Most workspaces (like AdhocWorkspace/MSBuildWorkspace) will not opt into this. And, as such, will see the exact same behavior today. Specifically, anyone asking for compilations will see them fully up to date, with all accurate SG documents in it.
  3. Our VS/VSCode/Remote workspaces will opt into the behavior where their SG state can "slip back".

So what does this mean? Imagine the world today as have a representation for a project where:

  1. a project has documents
  2. a project can make it's "primordial" compilation using those documents.
  3. a project can be asked for its "full" compilation, where it takes its "primordial" compilation, and runs the generator pass (either full or incremental, it's not relevant), to get the generator docs.
  4. those generator docs are combined with the "primordial" compilation to make the "full" compilation.

In today's world, as a project 'forks' forward, it effectively always starts with the "i will have to run generators on my primordial compilation" to get my generated documents.

In the 'slip back' world we can tweak the above in a hopefully easy fashion. Specifically, when we fork a project, instead of starting it in a 'blank state' wrt to producing its SG documents, instead it is passed in that "computation" from the project it forks from.

In this world, we can literally use a 'null' Task/AsyncLazy to then represent "you were passed nothing from the prior state, and you should compute SG documents accurately against your current 'primordial' compilation". And non-null Task/AsyncLazy means "you get your SG documents from this computation, which comes from the past".

An important aspect of this is that if you then want to get up-to-date compilations, all you need to do is fork the solution forward, setting those "computation tasks" to null. This then puts that solution snapshot in the state where asking for any project's compilation will compute its up to date compilation. Note that this still can be incremental. A project can still use the generator-driver from any prior state, which will allow the incremental tables to still incrementally compute themselves against the current primordial compilation.

This ability to 'fork and clear SG state' would also be valuable for certain features which want up-to-date-SG semantics. All a feature has to do is get the solution snapshot and call this forking function. Their fork will now be completely up to date. Of course, performing that fork off of such a fork would be a no-op.

Next:

Once we have the above, we will opt VSWorkspace into using this 'slip back' model. When the initial solution is created, all of those "compute SG document" tasks will be null. So asking for compilations for any of the projects in the starting solution, or anything forked off from that, will get compilations with the initial set of generated docs.

However, from that point on, for normal solution changes (like user edits), those same SG documents will be passed along.

External to the workspace though there will be a new component that is then responsible for deciding when the Workspace should now recompute compilations. This space is still up in the air, but it will most likely at least decide to recompute compilations once a 'build' completes. It may also allow the user an explicit gesture to 'recompute' compilations.

With the above formalizations, this external service/component is now extremely trivial to write. All it does it take the existing .CurrentSolution of the workspace, and fork it. The fork is simple.

  1. We pass in the existing solution-state as is. After all, the "textual representation" of the world didn't change.
  2. We fork all compilation-trackign state for every project, returning it to the 'null/non-computed sg' state.

This is nice as the above is both easy and effectively free. It's really just forking and starting with a null value for a particular computation.

--

Overall, this seems to break up the work into a tiny set of very reasonable, very comprehensible changes. Importantly, it keeps the workspace model itself simple and clear.

CyrusNajmabadi commented 10 months ago

Additional refactoring identified as part of https://github.com/dotnet/roslyn/pull/71257

It would likely be good to have separate concepts of:

  1. SolutionState. Pure green tree effectively just representing the contents of documents, proejct-files and solution-files. https://github.com/dotnet/roslyn/pull/71257 accomplishes this.
  2. It should be possible to make a call to oop only using SolutionState for syncing. An example of such a call would be the actual call to OOP to generarate SG documents for a particular snapshot.
  3. Those generated docs along with the SolutionState should be exposed through some new construct (possibly internal only). Perhaps with a name like SolutionStateWithGeneratedDocuments. This would still be a green tree, but would include the documents for generated docs.
  4. We should then layer SolutionCompilationState on top of SolutionStateWithGeneratedDoucments. With the compilations it produces appropriately from that info (potentially usign stale info as per this entire tracking issue).

This split will help make it clear the separation of responsibilities, as well as what information can be accessed where.