dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License
5.23k stars 1.35k forks source link

Consider logging ProjectStarted more sparingly #5316

Closed KirillOsenkov closed 3 years ago

KirillOsenkov commented 4 years ago

ProjectStartedEventArgs is usually very heavyweight, with GlobalProperties, Properties and Items.

For a 130 MB binlog you get 6844 events totalling ~0.5 GB, where the most massive event starts around 5 MB, with 29 GlobalProperties (including CurrentSolutionConfigurationContents of 40 KB), 8517 items (most with tons of metadata) and 1289 properties.

We see projects being called multiple times with the same properties and items. Maybe if those are immutable collections we should see if we've logged this exact snapshot before, and if yes, avoid logging it again? Feels like something along these lines is already in place, since some projects don't get properties and items logged already (and I don't understand when they are or are not logged).

Basically worth looking into. ProjectStarted is the second most heavy of all args types, topped only by BuildMessage totalling ~ 1 GB.

KirillOsenkov commented 4 years ago

From @mmitche:

here's a thought on something i noticed. In that huge log there are a TON of project started nodes, with a ton of total properties and itemgroups that get logged. But in quite a few cases, the target that is being invoked doesn't actually exist...

Should msbuild just be skipping the logging?
KirillOsenkov commented 4 years ago

See related: https://github.com/microsoft/msbuild/issues/3616

KirillOsenkov commented 4 years ago

I think I have an idea of how to deduplicate these.

  1. in the BinaryLogger, when we are about to write a list of properties, or a list of items, we first hash. If we saw the hash before, just write a stub with the hash. Otherwise add the hash to the global hashset and write out the full thing, including the hash.
  2. in the viewer, when we read a property list or item list, we see if it's a hash or a full thing. If it's the full thing, read it and associate the hash with this list, otherwise we must have seen it before so just retrieve from the cache.

We can do it for a list of properties and a list of items.

KirillOsenkov commented 4 years ago

@mmitche suggests it would be even better if we didn't send the duplicate properties and items across nodes at all. This would cut down on the chattiness for all loggers.

mmitche commented 4 years ago

My experiments indicate that only about 10% of the overall project invocations have unique input item + property sets. Overall these properties are gigantic (e.g. aspnetcore had about 3 gigs of data) so the wins could be significant.

KirillOsenkov commented 4 years ago

Avoid logging ProjectStarted items and properties if the target doesn't exist or is empty

KirillOsenkov commented 3 years ago

I think we should log properties and items at the ProjectEvaluationFinished, not at ProjectStarted. This way they will be logged only once if multiple projects start with the same evaluation. This will reduce the amount of data we log that has to flow through the logging system.

And we can make an escape hatch to only turn on the new mode for BinaryLogger.

KirillOsenkov commented 3 years ago

What I've learned is that properties and items on ProjectStartedEventArgs aren't sent across nodes at all. This explains why only some projects have properties and items - they were built on the in-proc node.

See here: https://github.com/dotnet/msbuild/blob/f98579d50f8b08c4efe13ca07327e3e5222b50f8/src/Framework/ProjectStartedEventArgs.cs#L303-L310

KirillOsenkov commented 3 years ago

LoggingService.SerializeAllProperties is interesting

KirillOsenkov commented 3 years ago

MSBUILDFORWARDALLPROPERTIESFROMCHILD if set to one will enable sending all properties from nodes to central node (but not items since there's no logic to serialize items in WriteToStream)

KirillOsenkov commented 3 years ago

_buildParameters.LogInitialPropertiesAndItems is equivalent

KirillOsenkov commented 3 years ago

LoggingNodeConfiguration.LogTaskInputs is a place where it feels appropriate to configure whether to send properties and items on ProjectEvaluationFinished instead of ProjectStarted

KirillOsenkov commented 3 years ago

Yes looks like LoggingNodeConfiguration is the proper way to tell the node what to do. Setting a Trait from BinaryLogger will only impact the central node.