Understand and improve ConfigurationBinder Source Generator incremental behavior

ericstj commented 11 months ago

I would expect a well-behaved generator to only execute when code is changed that activates it. I would expect that generator to demonstrate significantly different performance characteristics when editing code which doesn't activate it vs code that does activate it (or otherwise influence its outputs).

This is not what I observe with the ConfigurationBinder source generator. It seems to be running and consuming ~ the same amount of CPU regardless of if I make edits to unrelated files vs files which should influence the generator output.

I tested it with https://github.com/chsienki/GeneratorTracer and a project that gives the generator a lot of work. I also added explicit events to measure the bulk of the work in the Parse and Emit phases and they agree with this: https://github.com/ericstj/runtime/commit/c10104cea90ebde28509642d67274d72e90d2867 And hacked @chsienki's tool to measure those: https://github.com/ericstj/GeneratorTracer/commit/c2e33bf713cccf13137bbe6114f645696de11654

After doing this I see lots of executions of the parse and emit phases even when making completely unrelated edits.

I'm not certain if what I'm observing here is normal and expected or not, but it does disagree with what I would expect from a functioning incremental build pipeline (only rerun when changes are needed).

I'd like to have a closer look at this with compiler folks and other generator authors to understand if what I see is normal or a problem. @eiriktsarpalis @CyrusNajmabadi @chsienki @captainsafia

ghost commented 11 months ago

Tagging subscribers to this area: @dotnet/area-extensions-configuration See info in area-owners.md if you want to be subscribed.

Issue Details

I would expect a well-behaved generator to only execute when code is changed that activates it. I would expect that generator to demonstrate significantly different performance characteristics when editing code which doesn't activate it vs code that does activate it (or otherwise influence its outputs). This is not what I observe with the ConfigurationBinder source generator. It seems to be running and consuming ~ the same amount of CPU regardless of if I make edits to unrelated files vs files which should influence the generator output. I tested it with https://github.com/chsienki/GeneratorTracer and a project that gives the generator a lot of work. I also added explicit events to measure the bulk of the work in the Parse and Emit phases and they agree with this: https://github.com/ericstj/runtime/commit/c10104cea90ebde28509642d67274d72e90d2867 And hacked @chsienki's tool to measure those: https://github.com/ericstj/GeneratorTracer/commit/c2e33bf713cccf13137bbe6114f645696de11654 After doing this I see lots of executions of the parse and emit phases even when making completely unrelated edits. I'm not certain if what I'm observing here is normal and expected or not, but it does disagree with what I would expect from a functioning incremental build pipeline (only rerun when changes are needed). I'd like to have a closer look at this with compiler folks and other generator authors to understand if what I see is normal or a problem. @eiriktsarpalis @CyrusNajmabadi @chsienki @captainsafia

Author:	ericstj
Assignees:	-
Labels:	`untriaged`, `area-Extensions-Configuration`, `source-generator`, `needs-area-label`
Milestone:	-

eiriktsarpalis commented 11 months ago

After doing this I see lots of executions of the parse and emit phases even when making completely unrelated edits.

Executions of the parse phase is expected in unrelated edits, given the current structure of the generator. What is more concerning is the emit phase, which suggests to me that there might be false negative equality comparisons in certain model instances that are not covered by unit testing.

FWIW I had experienced similar issues while working on STJ incremental values, and it took a bit of debugging until all equality-related bugs were squashed.

captainsafia commented 11 months ago

Executions of the parse phase is expected in unrelated edits, given the current structure of the generator.

Yes, this meshes with similar things I've seen when profiling RDG with the GeneratorTrace tool as well.

RDG's "parse" phase is particularly expensive because our equality checks need to analyze multiple parts of the invocation syntax (the signature of the delegate parameter, the name of the Map action call being invoked, the name of the parameters being passed to the invocation).

We can certainly resolve this by making our pipeline more incremental, but it comes at the cost of the generator implementation itself being more complex.

I think it would be worthwhile to explore the possibility of having generators that don't run on every edit in the future...

ericstj commented 11 months ago

I'm moving this issue out of 8.0 - I was able to measure simple usage of ConfigurationBinder and in that case the parse phase is the only phase running on unrelated edits.

I did find when testing our the ConfigurationBinder unit tests assembly (lots of Configuration usage) that we regress to running on keypress -- I need to narrow down what's causing that but given it's not in the simple case I don't think it's blocking 8.0.

This testing did reveal that both the Options and Logging generator run the entire pipeline on every edit. This is because they Combine with the compilation without any further filtering: https://github.com/dotnet/runtime/blob/e2c319a19c25b8c77f9d060077a4ea5f91a0214a/src/libraries/Microsoft.Extensions.Options/gen/Generator.cs#L27-L28 https://github.com/dotnet/runtime/blob/e2c319a19c25b8c77f9d060077a4ea5f91a0214a/src/libraries/Microsoft.Extensions.Logging.Abstractions/gen/LoggerMessageGenerator.Roslyn4.0.cs#L34-L35 So whenever the compilation changes, their code-generation step will rerun. They should instead be trying to capture only the relevant portions of the compilation and exposing that in a Select step to better reduce the work from every edit. This doesn't appear to be a regression from previous release for 6.0/7.0 - so it's OK to move that out (while still fixing it in 9.0 with a possibility of backport).

We should fix both of these generators and add tests which leverage tracking names to ensure that they remain well-beahved for incremental compilation.

think it would be worthwhile to explore the possibility of having generators that don't run on every edit in the future...

Agreed. I've raised this question quite a few times during development of the config source generator - what benefit do we really get by regenerating with every edit? Is it just diagnostics that might result from examining updates to the type graph? If so - maybe it's an acceptable tradeoff to defer work until build.

I chatted with @chsienki about adding the ETW events to our generators and it looks like he has something even better in the works in roslyn itself (hooray!) so these changes don't need to go in.

ericstj commented 1 month ago

We discussed improving the incremental characteristics of the runtime source generators and scoped it out of 9.0

dotnet / runtime

Understand and improve ConfigurationBinder Source Generator incremental behavior #92914