KirillOsenkov closed this issue 3 years ago
From @mmitche:
Here's a thought on something I noticed. In that huge log there are a ton of project started nodes, each with a ton of properties and item groups logged. But in quite a few cases, the target being invoked doesn't actually exist...
Should MSBuild just skip the logging?
See related: https://github.com/microsoft/msbuild/issues/3616
I think I have an idea of how to deduplicate these.
We can do it for a list of properties and a list of items.
@mmitche suggests it would be even better if we didn't send the duplicate properties and items across nodes at all. This would cut down on the chattiness for all loggers.
My experiments indicate that only about 10% of all project invocations have a unique set of input items and properties. These sets are gigantic overall (e.g. aspnetcore had about 3 GB of data), so the wins could be significant.
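To make the deduplication idea concrete, here is a minimal sketch in Python (used only for illustration; MSBuild itself is C#). It assumes properties are a name/value dictionary and items are an ordered list of (type, include) pairs. Names such as `DedupingLogger` and `snapshot_key` are hypothetical and do not correspond to anything in MSBuild. Each distinct snapshot is written once, and every later project start only refers to it by id.

```python
import hashlib
import json


def snapshot_key(properties, items):
    """Return a stable content hash for a property dict and an ordered item list."""
    canonical = json.dumps(
        {
            # Property order is irrelevant, so sort by name for a stable hash.
            "properties": sorted(properties.items()),
            # Item order is kept as-is, since it may be significant.
            "items": list(items),
        },
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DedupingLogger:
    """Hypothetical logger that writes each property/item snapshot only once."""

    def __init__(self):
        self.seen = {}     # content hash -> snapshot id
        self.records = []  # stand-in for what would be written to the log

    def project_started(self, project, properties, items):
        key = snapshot_key(properties, items)
        snapshot_id = self.seen.get(key)
        if snapshot_id is None:
            # First time this exact snapshot is seen: log the full payload.
            snapshot_id = len(self.seen)
            self.seen[key] = snapshot_id
            self.records.append(
                ("Snapshot", snapshot_id,
                 {"properties": dict(properties), "items": list(items)})
            )
        # The project start itself carries only a small reference.
        self.records.append(
            ("ProjectStarted", project, {"snapshot_ref": snapshot_id})
        )
```

If only about 10% of invocations carry a unique snapshot, roughly 90% of the project starts in a scheme like this would shrink to a small reference record.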
Avoid logging ProjectStarted items and properties if the target doesn't exist or is empty
I think we should log properties and items at ProjectEvaluationFinished, not at ProjectStarted. This way they will be logged only once when multiple projects start from the same evaluation, which reduces the amount of data that has to flow through the logging system.
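A rough sketch of that shape, again in Python purely for illustration: the evaluation event carries the heavy payload once, and each project start only carries an evaluation id that a log reader can use to look the payload up. The class, method and field names here are hypothetical, not the actual MSBuild event args.

```python
class EvaluationLog:
    """Hypothetical event log where properties/items live on the evaluation event."""

    def __init__(self):
        self.events = []

    def evaluation_finished(self, evaluation_id, properties, items):
        # The heavy payload is logged exactly once per evaluation.
        self.events.append({
            "kind": "ProjectEvaluationFinished",
            "evaluation_id": evaluation_id,
            "properties": dict(properties),
            "items": list(items),
        })

    def project_started(self, project, evaluation_id, targets):
        # Project starts stay lightweight: just a pointer to the evaluation.
        event = {
            "kind": "ProjectStarted",
            "project": project,
            "evaluation_id": evaluation_id,
            "targets": list(targets),
        }
        self.events.append(event)
        return event

    def properties_for(self, started_event):
        """Resolve a project start back to the properties of its evaluation."""
        for event in self.events:
            if (event["kind"] == "ProjectEvaluationFinished"
                    and event["evaluation_id"] == started_event["evaluation_id"]):
                return event["properties"]
        return None
```

With this layout, three invocations of the same evaluated project (for example with different entry targets) produce one payload and three small start events.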
And we can make an escape hatch to only turn on the new mode for BinaryLogger.
What I've learned is that properties and items on ProjectStartedEventArgs aren't sent across nodes at all. This explains why only some projects have properties and items - they were built on the in-proc node.
LoggingService.SerializeAllProperties is interesting
MSBUILDFORWARDALLPROPERTIESFROMCHILD, if set to 1, enables sending all properties from the nodes to the central node (but not items, since there's no logic to serialize items in WriteToStream).
_buildParameters.LogInitialPropertiesAndItems is equivalent
LoggingNodeConfiguration.LogTaskInputs is a place where it feels appropriate to configure whether to send properties and items on ProjectEvaluationFinished instead of ProjectStarted
Yes, it looks like LoggingNodeConfiguration is the proper way to tell the node what to do. Setting a Trait from BinaryLogger will only impact the central node.
ProjectStartedEventArgs is usually very heavyweight, with GlobalProperties, Properties and Items.
For a 130 MB binlog you get 6844 events totalling ~0.5 GB. The most massive event is around 5 MB, with 29 GlobalProperties (including CurrentSolutionConfigurationContents of 40 KB), 8517 items (most with tons of metadata) and 1289 properties.
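For a sense of scale, a quick back-of-the-envelope calculation from those figures (treating ~0.5 GB as 500,000,000 bytes and 5 MB as 5,000,000 bytes, which is an assumption since both numbers are approximate):

```python
# Figures quoted above; byte values assume decimal units and are approximate.
event_count = 6844
total_bytes = 500_000_000    # ~0.5 GB across all ProjectStarted events
largest_bytes = 5_000_000    # ~5 MB for the most massive event

average_bytes = total_bytes / event_count
ratio_to_average = largest_bytes / average_bytes

print(f"average event size: ~{average_bytes / 1000:.0f} KB")
print(f"largest event is ~{ratio_to_average:.0f}x the average")
```

So the average ProjectStarted event is on the order of 73 KB, and the largest one is roughly 68 times that.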
We see projects being called multiple times with the same properties and items. Maybe if those are immutable collections we should see if we've logged this exact snapshot before, and if yes, avoid logging it again? Feels like something along these lines is already in place, since some projects don't get properties and items logged already (and I don't understand when they are or are not logged).
Basically worth looking into. ProjectStarted is the second heaviest of all args types, topped only by BuildMessage, which totals ~1 GB.