Open CyrusNajmabadi opened 1 year ago
Lack of support for `RegisterImplementationSourceOutput` is currently very problematic :'( I'd love to see it implemented!
@Youssef1313 can you give an example of when you would use this api?
In short, this generator only generates a module initializer, so the generated code isn't intended to be called by the developer; regenerating it on every keystroke is a performance killer.
Worse, by design this specific generator is expensive by nature: we call `GetSymbolInfo` for every invocation, which takes 44% of CPU time per a trace I captured (https://github.com/unoplatform/uno.extensions/issues/1625).
If `RegisterImplementationSourceOutput` were implemented properly, none of this would be a problem.
Also, at this point, we can't give up using a source generator.
I'm thinking of converting this generator to a regular `ISourceGenerator`, detecting design-time builds, and doing nothing in them, but I'm not sure whether scenarios like Hot Reload would keep working properly, and I'm not sure the move to `ISourceGenerator` is a good idea.
What goes into the module initializer?
Can you give an example of what code this looks for, and what it generates in response?
I've had further discussions with @jasonmalinowski on this topic and about the work we envision doing in the near term.
First, we considered, but ultimately rejected, a model whereby `Project` instances would expose a `GetCompilationAsync` method and something morally akin to a `GetCompilationsWithFullyCompleteGeneratorsAsync`. The reason for this is that we felt it would be trivial to get into scenarios where a feature was working with `Document`/`Project` instances and wanted one of the above. However, after getting the right compilation, it would then pass those same `Document`/`Project` instances to other helpers, which would then operate on a different compilation, thus leading to inconsistencies. Perhaps the feature used `GetCompilationsWithFullyCompleteGeneratorsAsync` but then called a helper with a `Document` instance that then called `GetSemanticModelAsync` (thus not seeing the results of generators), or vice versa. These inconsistencies would likely lead to painful and subtle bugs.
So, instead, we felt we should provide an API much more in line with the forking approach that the frozen-partial system uses today. There you start with some `Solution` snapshot, and you get a forked snapshot which has different behaviors, but which is entirely self-consistent. The consistency point is critical, as it allows everything to operate on that snapshot without having to know or care that this happened.
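To make the consistency point concrete, here is a deliberately simplified sketch (in Python, not the actual Roslyn types; all names here are illustrative) of what "fork a snapshot that shares state but changes behavior" means: every query against the fork is answered from the same underlying snapshot state, so the fork is self-consistent without copying anything.

```python
# Hypothetical model of the forking idea, NOT the Roslyn API: a fork shares
# the underlying snapshot state and only changes behavior, so everything
# operating on the fork sees one self-consistent view.

class Solution:
    def __init__(self, documents, run_generators=True):
        self._documents = documents          # shared, immutable snapshot state
        self.run_generators = run_generators # behavioral knob on this snapshot

    def with_fully_complete_generators(self):
        # Fork: same underlying state, different compilation behavior.
        return Solution(self._documents, run_generators=True)

    def get_document_text(self, name):
        # Both the original and the fork answer from the same snapshot.
        return self._documents[name]

original = Solution({"a.cs": "class A {}"}, run_generators=False)
fork = original.with_fully_complete_generators()
assert fork.get_document_text("a.cs") == original.get_document_text("a.cs")
assert fork._documents is original._documents  # state is shared, not copied
```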
Breaking things down, we think we would make the following changes:

- A `Workspace` could opt into a world where its `Solution` snapshot did not run generators when producing `Compilation`s. This would be opt-in, and many of our normal workspaces (`MSBuildWorkspace`, `AdhocWorkspace`, etc.) would not opt in. That way they would still present a `CurrentSolution` instance that seemed fully complete to all existing consumers, who likely want those semantics in the domain they are running in. For example, a console app that wants to examine the `CurrentSolution` properly likely wants to see that fully complete view. It's only rich hosts (like VS/VSCode) where we would have the more fine-grained behavior.
- `Workspace.CurrentSolution` would produce `Compilation`s whose `SourceGeneratorDocument`s were whatever was previously generated when generators last ran to completion.
- A `WithFullyCompleteGenerators` fork of the solution would produce fully-generator-complete `Compilation`s when asked.
- Critically, we expect that this forked solution actually shares practically everything with the solution it was forked from, including:
  - `SolutionState`/`ProjectState`/`DocumentState` instances
  - `CompilationTracker` instances.

These last two bullets are very important. It means that when sg-complete compilations are computed in this fork, they can be stored in those `CompilationTracker` instances. The `CompilationTracker`s would then point at up to two compilations: the "best effort" compilation and the "fully complete with generators" compilation. This also means that multiple features asking for the `WithFullyCompleteGenerators` fork would share the computation and caching for the fully generated compilations. So, while this would be a 'fork', it would not be a throwaway one that caused lots of redone, wasted computation.
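A minimal sketch of the two-compilation caching idea (again a hypothetical Python model, not Roslyn's actual `CompilationTracker`): one shared tracker caches both views, so a second feature asking for the fully-generated view reuses the first feature's expensive generator run.

```python
# Hypothetical sketch of a CompilationTracker-like object shared between the
# normal solution and its fork. It caches up to two compilations, so the
# expensive generator pass runs at most once per snapshot.

class CompilationTracker:
    def __init__(self, source):
        self.source = source
        self._best_effort = None      # compilation with stale generated docs
        self._fully_complete = None   # compilation with generators fully run
        self.generator_runs = 0       # counts the expensive passes

    def get_best_effort(self):
        if self._best_effort is None:
            self._best_effort = f"compile({self.source})"
        return self._best_effort

    def get_fully_complete(self):
        if self._fully_complete is None:
            self.generator_runs += 1  # expensive generator pass, done once
            self._fully_complete = f"compile({self.source} + generated)"
        return self._fully_complete

tracker = CompilationTracker("a.cs")
tracker.get_fully_complete()   # feature 1 (e.g. completion) pays the cost
tracker.get_fully_complete()   # feature 2 (e.g. find-refs) reuses the cache
assert tracker.generator_runs == 1
```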
Because the fully-complete `Compilation` would be computed in the same green nodes that the normal solution was pointing at, further normal mutations of the solution (like document text changes) would continue to pass the data from those nodes forward. And, at the point where the fully-complete `Compilation` finished computing for a project, its `SourceGeneratorDocument`s would then be available to the next normal solution snapshot when producing a normal compilation for one of its projects.
Now, alongside the above, we would also have a mechanism to possibly auto-run generators based on certain triggers (save, build, etc.), as well as allowing them to be manually run by the user (likely with some UI affordance in the host). To see how this would work, let's imagine that we're at some point in time where SGs are out of date, having only been run in the past:

- This would immediately fork the solution with `WithFullyCompleteGenerators`, and would start requesting fully-complete compilations for all projects in that solution.

Now, consider how this would manifest itself from a feature perspective. Say, 'completion'.
Note: this does mean that a user can say they want to rerun generators, and still see stale results. We might want individual features to become aware of this and provide a slightly different experience. For example, a feature like 'find refs' might say: "The user explicitly kicked off generator work. If they want to find references on a symbol, I will wait for all that work to complete prior to actually kicking off the find-refs work; that way the find-refs results are up to date." This is likely what we want for features that can convey to users what they're doing, so that users don't just think the feature is slow.
Note: this does reveal a slight problem with the above formalization. Specifically, once fully-complete generation is done, only future solution snapshots from the workspace would see those results. A feature like find-refs might never see them if no more edits happen. One way this can be addressed is that find-refs would itself call `WithFullyCompleteGenerators` if it saw that the user had explicitly requested SGs to run. That way it would see a world with that information. The question would be: when would be the right times for it to do this? And when should it stop and use the normal solution? We do not want a complex state machine here.
As such, I think we will actually want it to be the case that after this step:

> - This would immediately fork the solution with `WithFullyCompleteGenerators`, and would start requesting fully-complete compilations for all projects in that solution.
then that solution can be 'pushed back' into the workspace, making it so that any features that run afterwards simply pay the cost of a generator run for a project and see the results. The user asked for this to happen, so it's worthwhile for them to see it. Because this cost would only be paid rarely (when the user explicitly asks for it, or on rare verbs like 'build'), this should hopefully be OK, and still much better than our current state, where you always pay.
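The 'push back' step above can be sketched as follows (a hypothetical Python model; `Workspace` and `on_user_requested_generators` are illustrative names, not Roslyn's API): after the explicit gesture, the fully-generated fork simply becomes the workspace's current solution, so every later feature sees it for free.

```python
# Hypothetical sketch of 'pushing back' the fully-generated fork into the
# workspace after an explicit user gesture (or a rare verb like 'build').

class Workspace:
    def __init__(self, solution):
        self.current_solution = solution

    def on_user_requested_generators(self):
        fork = dict(self.current_solution)    # fork the current snapshot
        fork["generated_docs_fresh"] = True   # run generators to completion
        self.current_solution = fork          # push the fork back

# Features running after the push-back simply see the up-to-date results.
ws = Workspace({"docs": ["a.cs"], "generated_docs_fresh": False})
ws.on_user_requested_generators()
assert ws.current_solution["generated_docs_fresh"] is True
```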
@CyrusNajmabadi I think this does match what we all discussed. A few things to highlight:
> And, at the point where the fully-complete `Compilation` finished computing for a project, its `SourceGeneratorDocument`s would then be available to the next normal solution snapshot when producing a normal compilation for one of its projects.
This does mean that we're updating CurrentSolution asynchronously, and at times that aren't triggered by some external edit (like a text file edit or a project system operation.) This was a general concern with going down this model from day 1 of source generators (and one reason we went the way we did) since it made it a bit murky then when things happen like applying code fixes and we're trying to figure out what was a generator output and what is regular output. But the formalization here:
> Critically we expect that this forked solution actually shares practically everything with the solution it was forked from, including:
stemmed from us realizing that we can really consider these to be sharing (after a bit of refactoring) a `SolutionState`, which means it becomes clear that things like `TryApplyChanges` then operate on the `SolutionState` half of the `Solution`, with the other "half" being this source-generator state that's carried along with updates to `CurrentSolution`.
> The question would be: when would be the right times for it to do this? And when should it stop and use the normal solution? We do not want a complex state machine here.
It occurred to me now there might be a bit (but emphasis on just "a bit") more complexity here. In my initial mind this would be something as simple as an operation morally like `Workspace.SetCurrentSolution(Workspace.CurrentSolution.WithFullyCompleteGenerators())`. But there's then a fun question of what happens if a workspace edit happens right after that. If a user presses a key, does the new snapshot still have the "with fully complete" bit set, or is it reset back to normal? What if the workspace edit isn't the user pressing a key, but some file being reloaded due to an unrelated background operation they didn't trigger?
Thinking back to our earlier formalization of `Solution` perhaps being more cleanly split into two halves: the first half is the existing `ProjectState`s being held by the `SolutionState`, and the other half is the compilation/generator state. In the case of workspace edits like text buffer edits or project edits, only the `SolutionState` side matters to the project system code -- the compilation/generator stuff can do whatever it wants. So if our project system code comes along and calls `SetCurrentSolution` passing in a new solution, for any of these "fancy" workspaces, it can take the `SolutionState` but reuse the generator state (or update it, depending on whatever it wants). So if the compilation/generator state has the "we want fully complete" bit set, that'll just get carried forward. And presumably, once the generators are actually computed, `CurrentSolution` is updated again with the final result of the generators, and the bit cleared. The `SolutionState` stays the same throughout that change.
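The two-halves split can be sketched like so (a hypothetical Python model, not the real types): an edit replaces only the `SolutionState` half, while the generator-state half, including its pending "run fully" bit, is carried forward unchanged.

```python
# Hypothetical model of splitting Solution into two halves: the project
# system only replaces the SolutionState half on edits, while the
# generator-state half (including a pending "run fully" bit) is reused.

class Solution:
    def __init__(self, solution_state, generator_state):
        self.solution_state = solution_state    # documents/projects half
        self.generator_state = generator_state  # e.g. {"run_fully": bool}

def set_current_solution(workspace, new_solution_state):
    old = workspace["current"]
    # Take the new SolutionState, but reuse the existing generator state.
    workspace["current"] = Solution(new_solution_state, old.generator_state)

ws = {"current": Solution({"a.cs": "v1"}, {"run_fully": True})}
set_current_solution(ws, {"a.cs": "v2"})         # user pressed a key
assert ws["current"].solution_state == {"a.cs": "v2"}
assert ws["current"].generator_state["run_fully"] is True  # carried forward
```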
The one other thing here of course is "what about Razor?", since I know that's on the Razor team's mind. Right now we don't use the Razor generator for design-time operations, but we have a strong desire to do so, since right now we have endless problems where the Razor design-time state is updated asynchronously and this causes race conditions. Mentally, I'm thinking the implementation of a method like `WithFullyCompleteGenerators()` isn't "just setting a flag" but something a bit more sophisticated. For example, the generator state for a `Solution` really has a list of generators that "must" be up to date before a call to `GetCompilationAsync()` completes, and it just happens that most of the time that list is empty. (To be very clear, I imagine the implementation might be different; I'm just using that as a mental model.) But in this mental model, `WithFullyCompleteGenerators()` just adds all generators to that list. But for something like Razor, we can pick a policy like one of these:
And I don't imagine this would necessarily require much specific knowledge of Razor at any deep layer, or at least not a layer already doing special magic for Razor. Which of these policies we'd go with, we'd decide with the Razor team, since there are a lot of perf vs. correctness tradeoffs to consider. And of course something like option 3 presumes we're running in the same LSP server as them, which is something we have generally agreed we want to do, but is still months off. So the policy may change over time.
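The "must be up to date" mental model above can be sketched as follows (explicitly a mental model, per the comment; all names are illustrative Python, not the real implementation): the generator state holds a set of generators that must be rerun before a compilation is handed out, and `WithFullyCompleteGenerators()` simply adds every generator to that set.

```python
# Sketch of the "must be up to date" mental model: only generators in the
# `must_run` set are (re)run before a compilation is returned. A policy can
# pin a single generator (e.g. Razor) as always-fresh, while
# with_fully_complete_generators() pins them all.

ALL_GENERATORS = {"Razor", "Uno", "Json"}   # illustrative generator names

def get_compilation(sources, must_run, run_generator):
    # Re-run only the generators that must be up to date right now.
    generated = {g: run_generator(g, sources) for g in must_run}
    return (sources, generated)

def with_fully_complete_generators(must_run):
    # Mental model: "fully complete" just means every generator must be fresh.
    return set(ALL_GENERATORS)

must_run = {"Razor"}   # e.g. a policy that always keeps Razor up to date
must_run = with_fully_complete_generators(must_run)
sources, generated = get_compilation("a.cs", must_run, lambda g, s: f"{g}({s})")
assert set(generated) == ALL_GENERATORS
```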
Ok. After another round of discussions with @jasonmalinowski we've broken things down into the following steps:
The benefits of the above are that we decouple concepts that can then update/change on their own cadences without updating the other. For example, in a world where SGs may run more rarely, we would want to be able to change them without the 'solution structure' representation changing. This will also likely clean things up in our current impl, where a lot of forking/transformation has to happen to disparate objects unnecessarily.
Next:
So what does this mean? Imagine the world today as having a representation for a project where:
In today's world, as a project 'forks' forward, it effectively always starts with "I will have to run generators on my primordial compilation to get my generated documents."
In the 'slip back' world, we can tweak the above in a hopefully easy fashion. Specifically, when we fork a project, instead of starting it in a 'blank state' with respect to producing its SG documents, it is passed that "computation" from the project it forks from.
In this world, we can literally use a null `Task`/`AsyncLazy` to represent "you were passed nothing from the prior state, and you should compute SG documents accurately against your current 'primordial' compilation". And a non-null `Task`/`AsyncLazy` means "you get your SG documents from this computation, which comes from the past".
An important aspect of this is that if you then want to get up-to-date compilations, all you need to do is fork the solution forward, setting those "computation tasks" to null. This then puts that solution snapshot in the state where asking for any project's compilation will compute its up to date compilation. Note that this still can be incremental. A project can still use the generator-driver from any prior state, which will allow the incremental tables to still incrementally compute themselves against the current primordial compilation.
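The null-vs-non-null computation model above can be sketched like this (a hypothetical Python model; `ProjectState` and the method names are illustrative, and a string stands in for the `Task`/`AsyncLazy`): a normal edit passes the old computation forward, while clearing it forces an up-to-date run against the current primordial compilation.

```python
# Hypothetical sketch of the null Task/AsyncLazy model: None means "compute
# SG documents fresh against my current primordial compilation"; a non-None
# value means "reuse this computation from the past".

class ProjectState:
    def __init__(self, primordial, inherited_sg=None):
        self.primordial = primordial
        self._sg = inherited_sg   # None => compute fresh; else reuse the past

    def get_generated_docs(self):
        if self._sg is None:
            self._sg = f"generate({self.primordial})"  # up-to-date run
        return self._sg

    def fork_with_edit(self, new_primordial):
        # Normal edits pass the old computation forward (stale docs allowed).
        return ProjectState(new_primordial, inherited_sg=self._sg)

    def fork_with_fresh_generators(self):
        # Clearing the computation forces an up-to-date run on demand.
        return ProjectState(self.primordial, inherited_sg=None)

p1 = ProjectState("v1")
p1.get_generated_docs()
p2 = p1.fork_with_edit("v2")
assert p2.get_generated_docs() == "generate(v1)"   # stale, from the past
p3 = p2.fork_with_fresh_generators()
assert p3.get_generated_docs() == "generate(v2)"   # recomputed against v2
```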
This ability to 'fork and clear SG state' would also be valuable for certain features which want up-to-date-SG semantics. All a feature has to do is get the solution snapshot and call this forking function. Their fork will now be completely up to date. Of course, performing that fork off of such a fork would be a no-op.
Next:
Once we have the above, we will opt VSWorkspace into using this 'slip back' model. When the initial solution is created, all of those "compute SG document" tasks will be null. So asking for compilations for any of the projects in the starting solution, or anything forked off from that, will get compilations with the initial set of generated docs.
However, from that point on, for normal solution changes (like user edits), those same SG documents will be passed along.
External to the workspace though there will be a new component that is then responsible for deciding when the Workspace should now recompute compilations. This space is still up in the air, but it will most likely at least decide to recompute compilations once a 'build' completes. It may also allow the user an explicit gesture to 'recompute' compilations.
With the above formalizations, this external service/component is now extremely trivial to write. All it does is take the existing `.CurrentSolution` of the workspace and fork it. The fork is simple.
This is nice as the above is both easy and effectively free. It's really just forking and starting with a null value for a particular computation.
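A sketch of that external trigger component (names are illustrative Python, not a real Roslyn service): on an event like a completed build, it forks the current solution with the SG computations cleared and pushes the fork back, which is all the forking step amounts to.

```python
# Hypothetical sketch of the external component that decides when to
# recompute compilations: on a trigger (e.g. build completed), fork the
# current solution with its SG computation cleared and push it back.

class GeneratorRefreshService:
    def __init__(self, workspace):
        self.workspace = workspace

    def on_build_completed(self):
        current = self.workspace["current_solution"]
        # The fork is just a copy with the SG computation set back to None,
        # i.e. "compute fresh on demand".
        fresh = {**current, "sg_computation": None}
        self.workspace["current_solution"] = fresh

ws = {"current_solution": {"docs": ["a.cs"], "sg_computation": "stale"}}
GeneratorRefreshService(ws).on_build_completed()
assert ws["current_solution"]["sg_computation"] is None
```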
--
Overall, this seems to break up the work into a tiny set of very reasonable, very comprehensible changes. Importantly, it keeps the workspace model itself simple and clear.
Additional refactoring identified as part of https://github.com/dotnet/roslyn/pull/71257
It would likely be good to have separate concepts of:
This split will help make it clear the separation of responsibilities, as well as what information can be accessed where.
This item exists just to keep track of the discussions being had about SG integration in the IDE, and what potential options there are for changing it to help alleviate the significant performance problems we have seen come up in numerous scenarios, especially as more partners use generators in more and more extensive ways.
We have not committed to any specific path, but we are monitoring and evaluating different potential options, including, but not limited to:
- Changing the model for generation from an implicit one (e.g. a feature just 'pulls' for a compilation and gets it) to a more user-triggered one, i.e. "explicitly building", "saving", "switching tabs", etc. This would then push costs out to those scenarios, but at the potential cost of having stale information at other times. It might require thought in the UI as to how to make this clear to users. Or it might just be acceptable, given that this is similar to the model already employed by things like resx.

Completed in https://github.com/dotnet/roslyn/pull/72494