dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License
5.22k stars 1.35k forks source link

Setting PackageLocation causes build to slow considerably #3990

Closed jachin84 closed 5 years ago

jachin84 commented 5 years ago

I'm not really sure where is the best place to ask this question. Feel free to close this issue if this is not the appropriate place.

I have a proj file with something like this:

<ItemGroup>
      <_WebProjectsToBuild Include="@(WebProjectsToBuild)">
        <TargetToBuild>Build</TargetToBuild>
        <AdditionalProperties>DeployOnBuild=true;
          DeployTarget=Package;
          WebPublishMethod=Package;
          PackageAsSingleFile=true;
          AutoParameterizationWebConfigConnectionStrings=false;
          GenerateSampleDeployScript=false;
          SkipInvalidConfigurations=true;
          IncludeSetACLProviderOnDestination=false;
          _PackageTempDir=$(OutputRoot)\%(Filename);
          PackageLocation=$(WebPublishDir)\%(Filename).zip;
          EnableUpdatePacakgePath=True</AdditionalProperties>
      </_WebProjectsToBuild>
    </ItemGroup>

    <MSBuild Projects="@(_WebProjectsToBuild)"
             BuildInParallel="true"
             ContinueOnError="false"
             Targets="%(TargetToBuild)"
             Properties="Configuration=$(Configuration);" />

To my surprise specifying _PackageTempDir or PackageLocation causes the build to run much slower. On my dev machine (which is very, very fast) it takes roughly 45 sec longer. On our build server, which is quite a bit slower, it takes almost 5 minutes longer. Surely specifying an alternative PackageLocation should not add such a delay? Any ideas?

rainersigwald commented 5 years ago

Can you capture diagnostic-level logs of both build scenarios and post the performance summary portions, especially Project Performance Summary?

jachin84 commented 5 years ago

Is there a private mechanism by which I can share the logs?

rainersigwald commented 5 years ago

Yes: file Visual Studio feedback and add attachments there. When I do so, I see this text:

Your attachments will be private to Microsoft. Your user name, title, and description will be public on Developer Community. For more on privacy, see the Microsoft Privacy Statement.

rainersigwald commented 5 years ago

(and then come back here and link to the feedback item so we don't have to wait for routing, please)

jachin84 commented 5 years ago

Done => https://developercommunity.visualstudio.com/content/problem/405703/moving-packagetempdir-cause-build-to-run-slower.html.

I added binlogs for both the slow and fast run.

rainersigwald commented 5 years ago

Thank you! From that log, I can see the problem: you're setting _PackageTempDir as a global property for only a subset of projects. Those projects have ProjectReferences that point to other projects, and when those references are followed, the _PackageTempDir global property is inherited. The MSBuild engine treats projects with distinct global properties as separate things (this is so you can for instance build Debug and Release of the same project simultaneously), which causes many/most of your projects to be built multiple times.

There are a few ways to avoid this:

  1. Set _PackageTempDir within individual projects, rather than as a global property for those projects.
  2. Set the _PackageTempDir property uniformly so that all projects have the same value for it.
  3. Do that by setting a global property on the initial build, for instance with msbuild.exe /p:_PackageTempDir=whatever
  4. Take care to avoid inheriting the global property you wish to set when following ProjectReferences, by setting the UndefineProperties metadata to _PackageTempDir _for all ProjectReference items in your tree (easiest via an ItemDefinitionGroup in a Directory.Build.props).

Also, side note: this build looks big enough that it would be worth spending some time making sure multiproc builds work for you; that could speed up builds significantly.

jachin84 commented 5 years ago

Thanks! I really appreciate your feedback. Let me see if I understand you correctly. By setting _PackageTempDir via the AdditionalProperties metadata those properties become global properties which are in turn inherited by any referenced projects.

I am building Web, Console, Service, Desktop and Test projects all in one go. Most of my project will at some point reference a Global project which I'll use as an example. If I build Web projects with a certain set of AdditionalProperties and then Console projects with a slightly different set of AdditionalProperties this will eventually lead to the Global project being built many times. I am correct in say this is referred to batching?

What I'm finding a little confusing is my Web projects always have a different set of global properties but it's only when the _PackageTempDir property is set that things start slowing down.

All projects that get built get a standard set of metadata like so:

<ItemDefinitionGroup>
    <ProjectsToBuild>
      <Configuration>$(Configuration)</Configuration>
      <Platform>$(Platform)</Platform>
      <TargetToBuild>Build</TargetToBuild>
    </ProjectsToBuild>
  </ItemDefinitionGroup>

Web projects are update with a different Target/AdditionalProperties.

<ItemGroup>
    <ProjectsToBuild Include="@(WebProjectsToBuild)">
      <TargetToBuild>PipelinePreDeployCopyAllFilesToOneFolder</TargetToBuild>
      <AdditionalProperties>AutoParameterizationWebConfigConnectionStrings=false;</AdditionalProperties>
    </ProjectsToBuild>
    <ProjectsToBuild Include="@(ConsoleProjectToBuild);@(ServicesToBuild)" />
  </ItemGroup>

In any case _PackageTempDir is only relevant for web projects so I suppose these is no harm setting it globally. I've also got a mechanism to add it only to the web projects via an extension point that will import ProjectName.target in my build folder automatically. I'll experiment with what works best.

What is UndefineProperties metadata? I can't find any documentation on it anywhere.

rainersigwald commented 5 years ago

If I build Web projects with a certain set of AdditionalProperties and then Console projects with a slightly different set of AdditionalProperties this will eventually lead to the Global project being built many times. I am correct in say this is referred to batching?

No, this is distinct from batching, which is when a single target or task runs multiple times because of distinct metadata.

This is the MSBuild engine's deduplication of projects to build, which isn't super well documented but is mentioned here a bit.

The idea is that a project should only be built once--just because you have references from app1.csproj and app2.csproj to utilities.csproj doesn't mean that utilities.dll should be built twice. But that rule is a bit too strict; it doesn't take into account various types of project configuration, like Configuration (Debug/Release), Platform (x86/x64/anyCPU), or TargetFramework (netstandard2.0/net472). On each of those, you might need to build the same project multiple times with different configurations. MSBuild's approach to this is to treat each (Project path, set of global properties) as distinct.

This approach works but causes the subtle problem you're seeing: if a project is built with distinct global properties but doesn't actually produce different output, it will be built more than once. In /m builds, this is a race condition that can fail builds; in single-proc builds it usually just wastes time.

What I'm finding a little confusing is my Web projects always have a different set of global properties but it's only when the _PackageTempDir property is set that things start slowing down.

I think this is because you're setting this new metadatum to something unique per WebProject with _PackageTempDir=$(OutputRoot)\%(Filename). That means that instead of one additional build for each project (because of the AutoParameterizationWebConfigConnectionStrings property), you now get N, starting in a tree from the N web projects.

Looking through the logs you shared, ShortBuild built a total of 827 projects, while LongBuild built 1658 projects. Both had 138 unique project paths.

What is UndefineProperties metadata? I can't find any documentation on it anywhere.

It is extremely underdocmented at the moment. I'm writing documentation but haven't finished. Current draft is at https://gist.github.com/rainersigwald/1fbd21a24a41b9b7c6204cb9cfcbb1cf.

jachin84 commented 5 years ago

Ah of course, I forgot about the %(Filename) at the end. That’s a really good explanation, thank you.

This also explain why I’ve not been able to get parallel builds to work consistently. Is there any way in the logs to easily identify the duplicate building as a result of deduplication see projects as distinct?

rainersigwald commented 5 years ago

Is there any way in the logs to easily identify the duplicate building as a result of deduplication see projects as distinct?

It's fairly difficult in the standard text logs. I have a logger that can help: https://github.com/rainersigwald/ParallelBuildDebuggingLogger

In its output, lines that say Reentering project are deduplicated, while lines that say Project {} built by project are new instances.

This logger is very useful to me, the person who wrote it. If you try it, I'd appreciate feedback; I know it's not intuitive to use and entirely undocumented.

jachin84 commented 5 years ago

Nice idea. I modified it a bit to produce a file I could drop into Graphziv. Any vertices with more than one edge are project being build twice. I think...

Very surprising how many projects are built a couple of times.

I’m starting to think that using a single msbuild task do build web/console/service/desktop projects just isn’t the way to go. There are too many properties that differ. I’m trying to think of a way to pass properties to the projects without making them global.

jachin84 commented 5 years ago

Ok so this seems a little odd. I am using the following in a Directory.Build.props which fails every time because of concurrency issues.

  <ItemDefinitionGroup>
    <ProjectReference>
      <UndefineProperties>
        _PackageTempDir;AutoParameterizationWebConfigConnectionStrings
      </UndefineProperties>
    </ProjectReference>
  </ItemDefinitionGroup>

If I do the following it work:

  <ItemDefinitionGroup>
    <ProjectReference>
      <UndefineProperties>_PackageTempDir;AutoParameterizationWebConfigConnectionStrings</UndefineProperties>
    </ProjectReference>
  </ItemDefinitionGroup>

Not the only difference is splitting UndefineProperties on different lines.

rainersigwald commented 5 years ago

Not the only difference is splitting UndefineProperties on different lines.

Nice find! I filed Microsoft/msbuild#4014 to track.

I’m trying to think of a way to pass properties to the projects without making them global.

Do you really need to pass them? The usual MSBuild approach to this sort of problem is either:

Of course if you opt for the latter you have to be super careful about global property inheritance again.

jachin84 commented 5 years ago

Thanks again for your help. Can you elaborate at all? I suspect there is a large gap in my understanding when it comes to building many projects at once.

Recompute them identically in each project

What do you mean by recompute? I am assuming you mean set the required properties in each project directly avoiding the need to use Properties/AdditionalProperties that results in global properties.

Compute them in a single project and ask for the results via MSBuild task in other projects.

Again not really sure what you mean here?

jachin84 commented 5 years ago

Feel free to close this issue. I'm still curious about the two questions above if you have a moment to provide more details?

rainersigwald commented 5 years ago

Recompute them identically in each project

What do you mean by recompute? I am assuming you mean set the required properties in each project directly avoiding the need to use Properties/AdditionalProperties that results in global properties.

Yes, that's what I mean. If you set the properties in a common include file (maybe a Directory.Build.props), you can use the same definition for every project without using any global properties.

Things that may help with this approach:

Compute them in a single project and ask for the results via MSBuild task in other projects.

Again not really sure what you mean here?

This is much more complicated, but I'm thinking something along the lines of

<Target Name="GetCentralProperties" BeforeTargets="BeforeBuild">
  <MSBuild Project="CentralComputationLocation.proj"
           Targets="ExpensiveCalculation">
    <Output TaskParameter="TargetOutputs"  
            PropertyName="CentrallyComputedProperty" />
  </MSBuild>
</Target>