rainersigwald opened 7 years ago
One thing of note: MSBuild only calls that AssemblyName constructor because AssemblyName.Clone() isn't available in .NET Standard 1.x. One possible improvement here would be to move to .NET Core 2.0.
To add a little more detail: we really do have a project that's taking this long to build (referring to the 95 sec spent in UnifyAssemblyNameVersions).
You may want to check out https://github.com/aspnet/Mvc/tree/dev/test/Microsoft.AspNetCore.Mvc.FunctionalTests in general as an example of a really complicated .NET Core package that's updated to the bleeding edge.
There's an ask from the ASP.NET team to not even run ResolveAssemblyReference in some cases. See https://github.com/dotnet/sdk/issues/1193
I think you should look at a couple of options:
I think #2 is doable if we look at the right set of inputs/outputs and potentially change AssemblySearchPaths for some project types.
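For illustration, trimming AssemblySearchPaths for some project types could look like the sketch below. This is an assumption about the approach, not something decided in this thread; the default AssemblySearchPaths value is a longer semicolon-delimited list that also includes locations like {GAC} and {TargetFrameworkDirectory}.

```xml
<!-- Sketch: restrict RAR probing to explicit hint paths and raw file names,
     skipping GAC/registry/output-directory probing. -->
<PropertyGroup>
  <AssemblySearchPaths>{HintPathFromItem};{RawFileName}</AssemblySearchPaths>
</PropertyGroup>
```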
There is a cache in RAR, but we can't use it in Core because it depends on binary serialization, which isn't accessible to us.
I'm fairly hesitant to build any assumptions about nuget immutability into RAR.
Isn't binary serialization available in netcoreapp2.0 now, as part of netstandard2.0?
@DamianEdwards Yes, but we're not targeting that. We'd like to but it's a lot of work and we can't do it for 15.3.
@rainersigwald can you not light up on it in the same way you did for the change above? Or is it not really factored to enable that easily right now?
@DamianEdwards I don't think so, because binary serialization requires source attributes, which can't be discovered via reflection.
What other serialization options do we have?
I'm going to spend tomorrow on some prototyping and profiling on this.
@rainersigwald thanks
@DamianEdwards what's the best scenario to hammer on? A dotnet new web project with SDK 2, and incremental dotnet builds? On my machine RAR doesn't seem like a significant part of that (1 ms of the 1 s build) while NuGet operations take the time:
9 ms CheckForImplicitPackageReferenceOverrides 1 calls
20 ms ProduceContentAssets 1 calls
99 ms GenerateRuntimeConfigurationFiles 1 calls
149 ms ResolvePackageFileConflicts 1 calls
239 ms ResolvePackageDependencies 1 calls
@rainersigwald try dotnet new mvc. Here's the result on my machine (all from the CLI using dotnet build):
9 ms CoreGenerateAssemblyInfo 1 calls
32 ms CheckForImplicitPackageReferenceOverrides 1 calls
32 ms _CopyOutOfDateSourceItemsToOutputDirectory 1 calls
33 ms RunProduceContentAssets 1 calls
47 ms _ComputeLockFileReferences 1 calls
67 ms GenerateBuildRuntimeConfigurationFiles 1 calls
81 ms ResolveLockFileReferences 1 calls
108 ms _ComputeLockFileCopyLocal 1 calls
123 ms GenerateBuildDependencyFile 1 calls
172 ms RunResolvePackageDependencies 1 calls
246 ms _HandlePackageFileConflicts 1 calls
905 ms ResolveAssemblyReferences 1 calls
1388 ms CoreCompile 1 calls
8 ms CoreCompile 1 calls
11 ms CheckForImplicitPackageReferenceOverrides 1 calls
33 ms RunProduceContentAssets 1 calls
52 ms _ComputeLockFileReferences 1 calls
69 ms GenerateBuildRuntimeConfigurationFiles 1 calls
80 ms ResolveLockFileReferences 1 calls
108 ms _ComputeLockFileCopyLocal 1 calls
147 ms _HandlePackageFileConflicts 1 calls
186 ms RunResolvePackageDependencies 1 calls
960 ms ResolveAssemblyReferences 1 calls
8 ms CoreCompile 1 calls
12 ms CheckForImplicitPackageReferenceOverrides 1 calls
34 ms RunProduceContentAssets 1 calls
46 ms _ComputeLockFileReferences 1 calls
66 ms GenerateBuildRuntimeConfigurationFiles 1 calls
79 ms ResolveLockFileReferences 1 calls
119 ms _ComputeLockFileCopyLocal 1 calls
140 ms _HandlePackageFileConflicts 1 calls
182 ms RunResolvePackageDependencies 1 calls
950 ms ResolveAssemblyReferences 1 calls
Or use this project if you want a really fun one 😆
https://github.com/aspnet/Mvc/tree/dev/test/Microsoft.AspNetCore.Mvc.FunctionalTests
Here's what I've figured out so far:
A dotnet new mvc project passes 307 assemblies to RAR. That's a lot.
FindDependencies=false reduces overall time in RAR by ~half.
A lot of the remaining time is spent calling AssemblyName.Clone() via reflection (even after #2016).
There are a couple of options to shorten this:
I spoke to a source who was involved with RAR in the past (you can't escape if you're still in DevDiv, but I can keep your name out of the press!) who thought that the best option might be to teach the NuGet-resolution targets to emit directly to the items that are RAR outputs (like @(ReferencePath)
), instead of resolving from NuGet assets file -> items pointing to paths on disk -> RAR -> different set of items pointing to paths on disk.
RAR does basically three things, none of which are super interesting for NuGet references; for NuGet packages, conflict resolution is already handled by ResolvePackageFileConflicts in the SDK.
We can't eliminate RAR altogether, because when targeting a full framework TFM you can use references that don't come from NuGet (like <Reference Include="System.Configuration" /> or a direct reference to a path on disk). But we could filter the list to elide packages that come from NuGet.
I prototyped this with an injected task:
<Target Name="AdjustRAR" BeforeTargets="ResolveAssemblyReferences" Condition="'$(AdjustRAR)' != 'false'">
  <ItemGroup>
    <ReferencePath Include="@(Reference)" Condition="'%(NuGetSourceType)' == 'Package'" />
    <Reference Remove="@(Reference)" Condition="'%(NuGetSourceType)' == 'Package'" />
  </ItemGroup>
</Target>
For an MVC template project, that completely eliminates RAR:
9 ms CoreCompile 1 calls
10 ms AdjustRAR 1 calls
10 ms _CheckForUnsupportedTargetFramework 1 calls
18 ms CheckForImplicitPackageReferenceOverrides 1 calls
45 ms FindReferenceAssembliesForReferences 1 calls
45 ms RunProduceContentAssets 1 calls
65 ms _ComputeLockFileReferences 1 calls
85 ms GenerateBuildRuntimeConfigurationFiles 1 calls
128 ms ResolveLockFileReferences 1 calls
155 ms _ComputeLockFileCopyLocal 1 calls
203 ms _HandlePackageFileConflicts 1 calls
267 ms RunResolvePackageDependencies 1 calls
Ok, so great. What are the risks here? We need to sync with the SDK and NuGet teams about:
@rainersigwald great info. That number of assemblies is much higher than expected (by about 67%). Could you share the list?
Also, you say:
We can't eliminate RAR altogether, because when targeting a full framework TFM you can use references that don't come from NuGet (like <Reference Include="System.Configuration" /> or a direct reference to a path on disk). But we could filter the list to elide packages that come from NuGet.
Could we eliminate it in the cases the project is not targeting a full framework TFM then?
Could we eliminate it in the cases the project is not targeting a full framework TFM then?
What I'm hoping is that we can run it only for references that don't come in via NuGet. In the common non-full-framework-targeting case, then, it won't run at all--that's what I see with my target from above.
You need to run it for full framework on the nuget assemblies as well. It'll find version conflicts and emit bindingRedirects.
@DamianEdwards
Could you share the list?
@rainersigwald ahh that makes sense. The ASP.NET Core assemblies account for ~180 of those; the rest are from Microsoft.NETCore.App and NETStandard.Library.
You need to run it for full framework on the nuget assemblies as well. It'll find version conflicts and emit bindingRedirects.
I thought version conflicts for NuGet-delivered assemblies were handled by ResolvePackageFileConflicts. @dsplaisted am I misunderstanding the intention there?
I'm not talking about conflicts of actual references. I'm talking about transitive references. Consider that library A (1.0.0.0) > B (1.0.0.0), and the app references A (1.0.0.0) and B (2.0.0.0). RAR will only see A (1.0.0.0) and B (2.0.0.0), but it needs to discover that a bindingRedirect is needed for B. It does that and emits a suggested redirect, which gets written to the app.config.
ResolvePackageFileConflicts does not examine transitive references, it's only about handling conflicts in direct references because RAR doesn't touch those.
On desktop the loader is strict and requires you to tell it to redirect a lower version to a higher version.
On .NET Core the loader will automatically load a higher version.
I guess by bypassing RAR we do lose out on the case where someone has explicitly chosen a lower version. Consider the case where A (1.0.0.0) > B (2.0.0.0), but the app referenced B (1.0.0.0). This would cause a ref-def mismatch at runtime even on .NET core. RAR catches this today and we'd lose that by bypassing RAR. Not sure we care because typically this means a package bug, or someone has explicitly downgraded a package (which NuGet should emit a warning for), but that's something to consider.
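For reference, the kind of redirect RAR suggests in the A/B scenario above ends up in the app.config roughly like this (the assembly name and public key token here are made up for the example):

```xml
<!-- Illustrative app.config fragment: unify all older versions of B up to
     2.0.0.0. The assemblyIdentity values are hypothetical. -->
<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="B" publicKeyToken="0123456789abcdef" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-2.0.0.0" newVersion="2.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>
```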
@ericstj Isn't that a malformed NuGet scenario? The A-1.0 package should have a reference to B-1.0, which is then resolved by NuGet's logic?
I don't understand what the design intention there is for binding redirects--that would be handled better by RAR. But is that what actually happens in a project today?
@ericstj Isn't that a malformed NuGet scenario? The A-1.0 package should have a reference to B-1.0, which is then resolved by NuGet's logic?
Sure, but then the app references B 2.0 because it wants to use that. Nuget does the right thing and gives the app B 2.0. It's up to RAR to write the redirect. It does this today. See https://github.com/Microsoft/msbuild/blob/6851538897f5d7b08024a6d8435bc44be5869e53/src/Tasks/AssemblyDependency/ResolveAssemblyReference.cs#L853 https://github.com/Microsoft/msbuild/blob/6851538897f5d7b08024a6d8435bc44be5869e53/src/Tasks/AssemblyDependency/GenerateBindingRedirects.cs
Feel free to take this part of the discussion offline. Net result is that RAR is required on desktop for all references. I think for other frameworks there might be some leeway, but we would lose some of the build-time checks and push those off to runtime. Might be acceptable.
Ah, I see now: NuGet coughs up the (package-conflict-resolved) A-1.0 and B-2.0 assemblies, which are then fed into RAR, which explores the transitive assembly dependencies of A, finds B-1.0, adds that to the closure of assemblies, and then applies its own assembly-conflict resolution logic to choose B-2.0 and recommend a binding redirect.
That's a great case, thanks.
I think that means we're back to "attempt to cache outputs for unchanged inputs". This hasn't been done historically, because the "inputs" to RAR include things like the GAC, registry keys, and arbitrary filesystem locations--so it's effectively impossible to build the cache key.
But for SDK projects, the really problematic inputs of GAC and registry are already disabled. I think I can live with requiring an explicit user-requested rebuild after installing a .NET SDK (which could change where a reference to System came from). If anyone thinks that would be unacceptable, or sees another machine-state change that would cause problems, please holler.
I'd expect new caching behavior to be put behind an opt-in parameter to RAR, so non-new-SDK projects see no difference, and the SDK can opt in (and if we discover a late-breaking bug, a user could opt back out again).
@rainersigwald do you have a plan on how/when this would make its way in?
I'm planning to prototype the caching strategy ASAP. Will try to have a more concrete idea of timeline by EOD Tuesday, which we can use to plan for release windows.
@rainersigwald many thanks! Do we have a rough idea of what would be cached and thus how much time we're looking at saving?
I think that means we're back to "attempt to cache outputs for unchanged inputs".
Skipping RAR for references that already have a path assigned seemed promising for .NET Core projects. Would it be worth doing that unless the project targets .NET Framework, while on .NET Framework everything would still need to go through RAR in order to generate binding redirects?
If it achieves a good gain then I think we should.
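A minimal sketch of that conditioning, building on the AdjustRAR prototype earlier in the thread (the TargetFrameworkIdentifier check is an assumption about where to draw the full-framework line, not a tested cut):

```xml
<!-- Sketch: divert pre-resolved NuGet references around RAR only when NOT
     targeting .NET Framework, so binding-redirect generation still works there. -->
<Target Name="AdjustRAR" BeforeTargets="ResolveAssemblyReferences"
        Condition="'$(TargetFrameworkIdentifier)' != '.NETFramework'">
  <ItemGroup>
    <ReferencePath Include="@(Reference)" Condition="'%(NuGetSourceType)' == 'Package'" />
    <Reference Remove="@(Reference)" Condition="'%(NuGetSourceType)' == 'Package'" />
  </ItemGroup>
</Target>
```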
Any update on this?
So my understanding is there are three interesting cases here:
Skipping RAR is extremely fast and appears to work in the first case, but it does produce some differences. For example, RAR produces @(ReferenceSatellitePaths), which is passed to GenerateDepsFile, which is presumably important.
@dsplaisted do you have a feel for whether that torpedoes the skip-it-entirely-for-Core plan? I suspect it does, but not with great confidence.
Just had a phone call with @DamianEdwards @AndyGerlicher @nguerrera @dsplaisted @davidfowl and @livarcocc. Please add anything I don't note below.
tl;dr: we should explore the general caching plan from https://github.com/Microsoft/msbuild/issues/2015#issuecomment-303875723
140 of the references in the template MVC project I've been playing with come from the microsoft.netcore.app package. Several of us were surprised by this, since we hoped that would reduce to just netstandard.dll (or at least much less).
We decided that just skipping RAR entirely wasn't a good idea because of the need to find associated files and concern about unknown consequences.
We thought about caching dependencies at a per-assembly granularity, but are concerned that it wouldn't actually be faster than reading them from a file.
There's existing code that attempts to stop walking dependencies when it encounters Framework assemblies, but it's not in use with NuGet-delivered references. We could potentially annotate (some?) references and trim the exploration from there. But that might not help, since NuGet gives RAR a flat list of dependencies, including all of the "framework" ones.
The primary scenario at the moment is an MVC project with .cs code churn (and static references). Invalidating the cache if a ProjectReference rebuilt seems acceptable, especially since reference assemblies should reduce churn on references in the not-too-distant future.
The caching would be (pseudocode):
if hash(inputList) == cache.inputHash:
    foreach previousOutput in cache.previousOutputs:
        if lastWriteTime(previousOutput) >= cache.previousTimestamp:
            # an output changed on disk since it was cached; re-run
            return fullRAR(inputList)
    return cache.previousOutputs
return fullRAR(inputList)
But that might not help, since NuGet gives RAR a flat list of dependencies, including all of the "framework" ones.
We could just as well deliver these via targets that nuget installs. That's similar to what we're doing elsewhere and the project-system is doing work to support that. Just to let you know that is on the table, and could be considered as additional work if it would help the first-build case.
For awareness: in #2215 I backed out the read-assemblies-only-once part of #2192, because on full framework it caused locks to be held on files after returning from RAR. We plan to try to get a fixed version back in within the 15.3 timeframe.
Hey,
So I'm hitting the exact same problem with our compilation pipeline at Unity. We are typically providing our own .NET framework facades, along with all the assemblies of the engine, etc., so that's roughly 90 assembly references in each csproj.
With around 20 projects in a solution, each project having exactly the same set of references, and nothing changed, the build will still take 6 seconds to complete (!). That's huge (for "nothing changed").
The main reason is ResolveAssemblyReference, which takes 200ms to 250ms per project to resolve the assemblies.
Worse, all these assembly references have <Private>False</Private>, so I'm not sure what the point of running ResolveAssemblyReference over them is...
Do you have any suggestion on how to work around this? (I tried the "AdjustRAR" trick above from @rainersigwald, but it doesn't compile after that; maybe something has changed in the target files since then...)
When you say you are providing your own assemblies, are you generating a bunch of reference items, acquiring them from nuget package assets, or something else?
If you are generating reference items, can you try adding ExternallyResolved=true metadata to them?
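For generated reference items, that metadata would look something like this (the path and Private setting are hypothetical, matching the Unity scenario described above; ExternallyResolved tells RAR the item is already resolved, so it can skip the expensive parts of its analysis for it):

```xml
<!-- Sketch: mark hand-generated references as already resolved. The file
     path here is made up for the example. -->
<ItemGroup>
  <Reference Include="C:\UnityFacades\UnityEngine.CoreModule.dll">
    <Private>False</Private>
    <ExternallyResolved>true</ExternallyResolved>
  </Reference>
</ItemGroup>
```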
When you say you are providing your own assemblies, are you generating a bunch of reference items, acquiring them from nuget package assets, or something else?
Yes, a bunch of <Reference> items.
If you are generating reference items, can you try adding ExternallyResolved=true metadata to them?
Ah, maybe exactly what I'm looking for, gonna try that, thanks 👍
Ok, so using ExternallyResolved actually helps reduce the compilation to 3.2 seconds... with 1.2 seconds spent in ResolveAssemblyReference, which is around 60ms per csproj. That's still quite a lot. I haven't checked, but what does ResolveAssemblyReference still have to do on these assembly references?
From the profiling, I'm getting this now:
Couldn't the StateFile be cached in memory? It would save maybe half of the ResolveAssemblyReference time; in my case, around 35 projects * 40ms would still save almost 1.5 seconds per "nothing-changed" build.
The state file for each project is around 600 to 700KB on disk, so likely quite laborious to load. As I'm running everything inside the same process, it might not reflect how things work with node reuse (not sure an in-process cache would be relevant actually...)
Relevant: #3914, #3868
The StateFile cache currently uses the .NET BinaryFormatter for serialization, which is slow and space-inefficient for large data sets. There's a PR in #3868 to change the backing serialization format that should dramatically improve the StateFile read times. There's a similar perf trace in there which shows it cutting the ResolveAssemblyReference time for a solution by about half.
@xoofx, curious why you are building everything single proc.
@xoofx, curious why you are building everything single proc.
It's for testing. Running things with a regular msbuild doesn't bring any differences. Projects have many dependencies between them, so lots of them can't run concurrently... likely msbuild with node reuse is actually fighting to build these projects. Also, I measured that launching the msbuild exe from the command line on an empty project takes 500-800ms (on a single project), while running things in-proc in a loop goes down to 100-150ms (that's likely mostly ResolveAssemblyReference).
The StateFile cache currently uses the .NET BinaryFormatter for serialization, which is slow and space-inefficient for large data sets. There's a PR in #3868 to change the backing serialization format that should dramatically improve the StateFile read times. There's a similar perf trace in there which shows it cutting the ResolveAssemblyReference time for a solution by about half.
Great! Though, checking a simple ConsoleApp with the new SDK, the cache file is only 80KB, while ResolveAssemblyReference has all the SDK ref assemblies to reference... wondering why there is such a difference; maybe my setup is doing something different (they are not SDK or regular TargetFrameworkDir files, but only <Reference> items)... 700KB for a dependency that doesn't bring much value for the assemblies listed is quite annoying (assemblies are not copied, the assembly list is not duplicated, there are no conflicts... in the end they are just passed to csc as-is, etc.)
Wondering if #3914 could also help make the cache data more fine-grained so it could be shared between projects. In my case, most of these 800KB per project are exactly the same and could actually be shared. Do you think that would be relevant?
ResolveAssemblyReferences runs unconditionally even on incremental builds (because the user may have installed a targeting pack or, for full-framework scenarios, GACed an assembly). This makes it performance-sensitive.
With the advent of .NET Core micro-assemblies, it has become common to pass many, many references to RAR. This exacerbates performance problems.
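For contrast, an ordinary MSBuild target can be skipped incrementally by declaring file-based Inputs and Outputs; RAR's true inputs (GAC contents, registry state, arbitrary search paths) aren't timestampable files, which is why it can't simply do the same. The target and file names below are made up for illustration:

```xml
<!-- Illustrative only: MSBuild skips a target when every declared Output is
     newer than every declared Input. ResolveAssemblyReferences can't declare
     its real inputs this way, so it runs on every build. -->
<Target Name="ExampleIncrementalTarget"
        Inputs="@(Compile)"
        Outputs="$(IntermediateOutputPath)example.stamp">
  <Touch Files="$(IntermediateOutputPath)example.stamp" AlwaysCreate="true" />
</Target>
```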
@rynowak was kind enough to run a profiler to see this: