Closed: marcin-krystianc closed this issue 1 year ago
Team Triage: Thanks for filing a great bug/repro! Hopefully we'll be able to dedicate some time to analyze this in the coming weeks!
Could you share the binlog?
> Could you share the binlog?
Sure, it is 40MB though, so it is split into two parts:
msbuild.zip.001
msbuild.zip.002
Is there a reason why maxcpucount is /m:1? Could you explain the /target list? Can you reduce it to a single target? What is the actual target you want to run? If this is not something you control, then check out my recent PR dotnet/msbuild#7512, which addresses parallelism with Targets and Sln.
There is a strange 3s gap at the 63s mark.
Btw, what are your computer specs? On my 2990 threadripper, it took 25s for /m:1 and 15s for /m.
Could Windows Defender or other AntiMalware be interfering?
> Is there a reason why maxcpucount is /m:1?
Adding more MSBuild processes doesn't improve performance that much (maybe by 50% at max) and it definitely makes the analysis much harder.
> Could you explain the /target list?
This list is what Rider uses for design-time builds (it is possible to collect binary logs with Rider by checking Help -> Diagnostic Tools -> Enable Design Time Logging). VisualStudio uses a different list of targets (see docs on how to get logs from VS), but to my understanding, these are targets defined internally in VS so it is not as easy to call them from the CLI.
> Can you reduce it to a single target?
The point is to mimic the design-time build behavior as closely as possible. We could focus on a single target for now, but the goal is to make the entire design-time build performant.
> If this is not something you control, then check out my recent PR https://github.com/dotnet/msbuild/pull/7512 to address parallelism with Targets and Sln.
Actually, IDEs don't run the design-time build for the entire sln at once - they make a new request for each individual project, so your PR will not help in this case.
> Btw, what are your computer specs?
I'm using a mobile i7-8550u, but I've just tested it on the much faster AMD EPYC 7571 and still, it took ~2 min. Did you build the solution before running the design-time build? When it is not built, it is much faster, but that is not the use case we care about.
> Did you build the solution, before running the design-time build? When it is not built it is much faster, but that is not the use case we care about.
What matters is running `dotnet restore`, as that populates all the required dependencies. Build is irrelevant, I think.
That's what makes things like Ionide.ProjInfo able to populate compiler options for F# code analysis using FSharp.Compiler.Service. Without a restore lots of things fail early and are thus much quicker.
I ran a slightly modified command - mainly different logging.
Results:
| Restore | /m | Time |
|---|---|---|
| Before restore | /m:1 | 18.36s |
| Before restore | /m | 11.43s |
| After restore | /m:1 | 54.92s |
| After restore | /m | 24.22s |
CPU: Ryzen 5700g (8-core / 16-thread)
> Btw, what are your computer specs? On my 2990 threadripper, it took 25s for /m:1 and 15s for /m.
Worth noting that it is well known at our company (from lots of experiments) that /m is rarely optimal for shorter builds like design-time builds, due to, I presume, increased IPC and lack of inter-process project cache. That's why when using Rider we suggest users set it to the equivalent of /m:8 for solution loading, even though the CPUs we use are 16-core/32-thread Xeons. This obviously depends on the size of the solution and the project structure.
The below results demonstrate that:

| Restore | /m | Time |
|---|---|---|
| After restore | /m:1 | 54.92s |
| After restore | /m:5 | 26.05s |
| After restore | /m:7 | 24.39s |
| After restore | /m:10 | 23.07s |
| After restore | /m:12 | 23.98s |
| After restore | /m (/m:16) | 24.22s |
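A sweep like the one in the table can be scripted. This is a minimal sketch, assuming a bash shell; the solution path is a placeholder, and the script only echoes the command lines (swap `echo` for `eval`, or wrap each command in `time`, to actually measure):

```shell
# Sketch of an /m sweep; the solution path argument is a placeholder.
sweep() {
  local solution="$1"
  for m in 1 5 7 10 12 16; do
    # -nr:false closes worker nodes after each run, so consecutive
    # timings don't benefit from already-warm node processes.
    echo "dotnet msbuild $solution -m:$m -nr:false"
  done
}

sweep "path/to/solution.sln"
```

The `-nr:false` flag mirrors the suggestion later in this thread to disable node reuse for more consistent measurements.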
> If this is not something you control, then check out my recent PR https://github.com/dotnet/msbuild/pull/7512 to address parallelism with Targets and Sln.
> Actually, IDEs don't run the design-time build for the entire sln at once - they make a new request for each individual project, so your PR will not help in this case.
To clarify: IDEs (at least Rider) already parallelise this completely by sending multiple requests to MSBuild in parallel, one per project.
I'll describe this Rider example as just another confirmation that this is a widespread issue:
Rider has a semi-recent setting that allows one to increase the number of MSBuild processes for solution loading. As seen below, it defaults to 1. That is because the Rider team observed that higher numbers don't necessarily yield better results, so they kept it at a conservative number that at least doesn't waste machine resources.
You can also see that the full build does automatically use lots of processes (14/16 in this case). While full build can also suffer from the same IPC/cache issues, the impact is much smaller relative to the amount of actual work required by the build target.
I did some profiling and realized that GC plays quite a role in slowing the main MSBuild process down.
I then ran the following test with and without server GC:
```powershell
Function Go { & "$env:LOCALAPPDATA\JetBrains\Toolbox\apps\Rider\ch-0\222.3345.88\bin\rider64.exe" "D:\projekty\TestSolutions\LargeAppWithPrivatePackagesCentralised\solution\LargeAppWithPrivatePackagesCentralised.sln" }

# Test 1
$env:DOTNET_gcServer = 1; $env:COMPlus_gcServer = 1; Go
# Test 2
$env:DOTNET_gcServer = 0; $env:COMPlus_gcServer = 0; Go
```
I used 7 parallel MSBuild processes, as shown in the above comment.
This yielded the following results/logs in Rider's backend (logs that Rider outputs about MSBuild design-time build requests upon solution load):
gcServer=0 (using workstation GC):

```
00:43:50.463 |I| ProjectModel | RequestBuilder thread:9 | Reloading has finished, statistic:
Total: 46.84 sec
Binding: 21.49 sec
Evaluation: 20.00 sec
00:43:50.466 |I| Build | .NET ThreadPool Worker:52 | Build succeeded.
```

gcServer=1 (using server GC):

```
00:44:45.377 |I| ProjectModel | RequestBuilder thread:9 | Reloading has finished, statistic:
Total: 31.81 sec
Binding: 9.50 sec
Evaluation: 11.53 sec
00:44:45.380 |I| Build | .NET ThreadPool Worker:58 | Build succeeded.
```
So forcing server GC for (all) MSBuild node processes yielded a 32% reduction in design-time build time 🎉
Below I'm attaching screenshots from DotTrace profiling the main MSBuild node. This is from a different test than the above ones, and I didn't perfectly align the start/stop, but the GC impact can be clearly observed (43% GC time -> 12% GC time).
For reference, here is a ticket I raised with JetBrains to consider enabling Server GC for design-time MSBuild: https://youtrack.jetbrains.com/issue/RIDER-80141/Consider-forcing-server-GC-for-MSBuild-processes-or-at-least-provide-a-button-to-opt-in
Do we know if Visual Studio already has that option enabled?
I'm aware that if too many processes are spawned, all using server GC, they might start throttling each other and cause too much context switching. However, the tests I've done show a huge improvement on very big solutions, so clearly there is lots of headroom.
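For reference, server GC can also be enabled per application through `runtimeconfig.json` (or `runtimeconfig.template.json`) instead of environment variables. This is the standard .NET runtime setting, shown here as a generic sketch rather than anything MSBuild-specific:

```json
{
  "runtimeOptions": {
    "configProperties": {
      "System.GC.Server": true
    }
  }
}
```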
I tried various /m values; /m:4 through /m:8 yielded the best results (25s down to 11s, FYI). I added /nr:false to your command line to close finished nodes afterwards and free up the resources. With that flag, it should yield more consistent results for testing purposes.
> I presume, increased IPC and lack of inter-process project cache.
I have noticed this too. dotnet/msbuild#7625
I haven't noticed an improvement with gcServer. Perhaps there are other factors involved or it is machine dependent.
I took perfview traces and binlogs of this repro (sln open in VS). By looking at the traces and binary logs I have not identified any MSBuild core functionality inefficiency.
But when looking at the project system, what struck me was the cumulative duration of the `ResolvePackageDependencies` task. Looking deeper at it, I noticed that almost 2/3 of its time is taken by loading and parsing the NuGet lock files (`project.assets.json`):
This sln has a `project.assets.json` size of almost 1MB. Although it is cached, during a DTB we still have to parse it at least once per project, i.e. ~270 times.
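To get a rough feel for how much JSON a DTB has to chew through, one can count and size the assets files under a repo. A small sketch (the root path is whatever solution directory you point it at):

```shell
# Count project.assets.json files under a directory and report their total size.
count_assets() {
  local root="$1"
  local n bytes
  # tr strips the whitespace padding some wc implementations emit
  n=$(find "$root" -name project.assets.json | wc -l | tr -d ' ')
  bytes=$(find "$root" -name project.assets.json -exec cat {} + | wc -c | tr -d ' ')
  echo "$n files, $bytes bytes"
}
```

Running `count_assets path/to/solution` after a restore would show how many ~1MB files get parsed.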
My knowledge of SDK targets, NuGet lock files, and DTB is very limited at best, so I can't tell if this is expected, inefficient, or a bug. To me it is just a little bit suspicious. @dsplaisted what do you reckon?
I thought that `ResolvePackageDependencies` was deprecated and shouldn't be called by default anymore. I think the project system in Visual Studio was updated so that it parses the assets file directly.
@tmeschter @drewnoakes Can you clarify whether the `ResolvePackageDependencies` task should still be called in a VS design-time build?
The change @dsplaisted is referring to was to stop returning the full list of all transitively-reachable package references and their contents to the project system, in order to display them in the dependencies tree. After the change, only the top level dependencies were returned, and transitive dependencies were populated lazily as the user expands the tree (or during search in Solution Explorer).
I'm not intimate with the details, but I would expect ResolvePackageDependencies to still be called as part of reference resolution. The DTB needs to resolve referenced assemblies (including those coming from packages) in order to pass that information to Roslyn's language service.
Skimming through a DTB binlog, it seems that the output of ResolvePackageDependencies ends up feeding through ResolveAssemblyReferences.
Looking at this more, it looks like `ResolvePackageDependencies` does normally run, but it does less work unless `EmitLegacyAssetsFileItems` is true.
However, it still has to read the assets file. It uses an in-memory cache for this so it won't read the same assets file multiple times in the same build, but it looks like this may still be a perf hit in this scenario with a lot of projects in the solution.
Most of the logic for reading the assets file is now in `ResolvePackageAssets`, which maintains its own optimized cache of the results and won't read the assets file at all unless the cache is out of date. So we may want to consider reading the `PackageDefinitions` and `PackageDependencies` items in `ResolvePackageAssets`, so that `ResolvePackageDependencies` doesn't need to be called at all unless `EmitLegacyAssetsFileItems` is true. That should improve incremental build perf and might also significantly improve the design-time build perf for this large-solution scenario.
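The shape of that gating might look something like the following. This is purely an illustration: the task and item names come from the discussion above, but the target name and wiring are made up and not checked against the actual SDK targets:

```xml
<!-- Hypothetical sketch only: invoke the legacy task solely when the
     opt-in property is set, so the assets file is normally read only
     by ResolvePackageAssets and its optimized cache. -->
<Target Name="_ResolvePackageDependenciesLegacy"
        Condition="'$(EmitLegacyAssetsFileItems)' == 'true'"
        DependsOnTargets="ResolvePackageAssets">
  <ResolvePackageDependencies ProjectAssetsFile="$(ProjectAssetsFile)">
    <Output TaskParameter="PackageDefinitions" ItemName="PackageDefinitions" />
    <Output TaskParameter="PackageDependencies" ItemName="PackageDependencies" />
  </ResolvePackageDependencies>
</Target>
```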
@dotnet/msbuild @marcpopMSFT
@rokonec did you want to look into making the SDK side change to have ResolvePackageAssets output the values that RPD currently gets or did you want the SDK to take a look?
The PR added two new properties to the NuGet assets cache file, so it will invalidate the old one.
Please clean the solution and run NuGet restore.
This should be fixed in 7.0.200 via #28405.
Thanks a bunch @marcin-krystianc for the detailed information which enabled us to investigate and address this!
## Issue Description
Hi,
productivity in our company is heavily hindered by the slowness of design-time builds, which translates into slow solution load (and reload) times in the IDE (Visual Studio and Rider).
For example, we are working with a solution containing about 270 C# and F# projects and use Central Package Management to manage the versions of our NuGet dependencies. The load time of such a solution is about 2-3 minutes, which is far too long to stay focused.
The fact that we use a mix of C# and F# projects is probably irrelevant, because according to my internal testing the solution load time when all projects are C# projects is the same. Also, the usage of central package version management doesn't seem to really affect how long an individual design-time build takes. But the concerning consequence of using central package management is that even the slightest change in the central file (`Directory.Packages.Props`) triggers a reload of all projects in the solution, so the slowness of design-time builds is exacerbated by central package management.

We've prepared a skeleton version of our solution, https://github.com/marcin-krystianc/TestSolutions/tree/master/LargeAppWithPrivatePackagesCentralisedNGBVRemoved, which can be used for testing.
I don't think that the problem is unique to our solution though; my suspicion (confirmed with some experiments) is that any solution of similar size has similar performance characteristics. So, given that the problem is relevant to all large solutions, we are wondering whether there are any plans to improve the performance of design-time builds for large solutions. Maybe the static graph feature can help? It turned out to be a huge win for NuGet restores (https://devblogs.microsoft.com/visualstudio/performance-improvements-in-nuget/#msbuild-static-graph-evaluation), so maybe it is possible to apply it to design-time builds as well? Are you able to provide any hints on where to look for any improvements? Looking at text logs or binary logs (`/bl`) didn't give any obvious answers so far.

## Steps to Reproduce
1. Go to the `LargeAppWithPrivatePackagesCentralisedNGBVRemoved\solution` directory
2. Run `dotnet build`
## Data

Sample logs from the command above. Please note that this is not exactly the same as loading the solution in the IDE, because the IDE runs a design-time build for each project individually, whereas we run it for the entire solution all at once:
## Analysis

Running with multiple processes (`/m`) doesn't really make it much faster: msbuild utilises more machine resources, but at the same time it also does much more work, as caching inside msbuild is less effective.

## Versions & Configurations
## Regression?
I don't think so.