dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License
5.22k stars 1.35k forks source link

NuGetSdkResolver adds significant overhead to evaluation and should not be used #4025

Open davkean opened 5 years ago

davkean commented 5 years ago

For evaluation time, The NuGetSdkResolver is extremely slow.

Based on investigation in CoreFX, I found:

I've looked at the original design, and it's built on a incorrect premise; that its performance impact will only affect the loading state of projects that use it. The image it shows where projects are being loaded in the background via Asynchronous Solution Load (ASL) was removed from the product in 15.5.

If a project opts into this resolver, it has the following effects:

When we designed SDK resolvers, it was explicitly called out that due to them being used during evaluation - that they must be extremely fast and must not hit the network. While this only hits the network on unrestored state, it still has a large negative impact on evaluation time when the package is already downloaded.

This is the entire reason that NuGet restore does not run during the build while inside Visual Studio.

Rogue resolvers can cause Visual Studio and other IDEs to be blamed for performance and UI delay issues, please remove or change the design of this resolver to play nicely with Visual Studio.

livarcocc commented 5 years ago

ACK on the negative performance impact. However, taking the feature out at this point is an impossibility.

Could we work together to figure out how to change the current design in a way to meet the perf requirements for VS?

nguerrera commented 5 years ago

@davkean Do you have the repro steps for how this measurement is taken?

It is very bizarre that the nuget sdk resolver would be any worse than the regular resolver in the restored state.

nguerrera commented 5 years ago

Putting this in 16.1 based on discussion just now. The resolver is actually now in the nuget repo, but first we should check that it is not something msbuild is doing in calling the resolver that is taking the time.

nguerrera commented 5 years ago

Loooking at the code, https://github.com/NuGet/NuGet.Client/blob/dev/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/NuGetSdkResolver.cs

It really doesn't do much in the restored state.

davkean commented 5 years ago

It was doing work, loading files off disk and parsing them if I remember correctly. I wish I could remember how I verified the overhead, I think I isolated in a single project and measured evaluation time. It was the overwhelming cause of the CoreFX evaluation time.

nguerrera commented 5 years ago

I can repro this:

Evaluation time for dotnet new console: 178 ms Evaluation time for dotnet new console + MSBuild.Sdk.Extras (unrestored): 3.8 seconds Evaluation time for dotnet new console + MSBuild.Sdk.Extras (restored): 381 ms

nguerrera commented 5 years ago

I will profile this simple repro.

nguerrera commented 5 years ago

image

Some quick thoughts:

It is spending the most time parsing global.json, which is sad because my global.json is 74 bytes. We should look at switching to the new corefx json reader. There's a sources version for netfx IIRC. This file is trivial, we don't need a fancy deserializer.

That said, it will avoid doing this more than once per build across projects:

https://github.com/NuGet/NuGet.Client/blob/b6cd1677ae2bd3b07f4cc23c2e8d408f784e8b05/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/NuGetSdkResolver.cs#L89-L90

But I suspect VS evaluations aren't retaining this context object between projects. Fixing that if possible will help non-NuGet sdk evaluation considerably too.

Another major contributor is nuget loading its configuration files. I presume it needs to do this to answer the question of where a downloaded package is because values in the config can impact the answer to that. The parsing there can probably be improved as well.

For both of these we can possibly cache things in static state. and check config timestamps. I did some trickery like that in the non-nuget resolver.

davidfowl commented 5 years ago

It is spending the most time parsing global.json, which is sad because my global.json is 74 bytes. We should look at switching to the new corefx json reader. There's a sources version for netfx IIRC. This file is trivial, we don't need a fancy deserializer.

You don't have to switch if that's a big short term cost. Just use the JSONReader directly (since the global.json schema is fixed). The corefx reader should be lower allocation regardless though...

nguerrera commented 5 years ago

So it's mostly Jit time being measured there. 140ms of jitting newtonsoft.json.

davidfowl commented 5 years ago

😮 wow

nguerrera commented 5 years ago

Have to check if build at least reuses that jitting. I think it might if the version of nuget in the sdk matches the version used by nuget msbuild resolver. The location of the dll is different for full msbuild.

davkean commented 5 years ago

This time matches almost identical to what I found in https://github.com/dotnet/sdk/issues/1483.

nguerrera commented 5 years ago

image

davkean commented 5 years ago

Nope, strike that, JSON.NET was way less than above.

davidfowl commented 5 years ago

We could just NGEN the one that comes with VS.

nguerrera commented 5 years ago

@davkean Different API used? This is a huge hammer to read global.json: https://github.com/NuGet/NuGet.Client/blob/b6cd1677ae2bd3b07f4cc23c2e8d408f784e8b05/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/GlobalJsonReader.cs#L66

davkean commented 5 years ago

Yep, just to grab a couple of strings from that file, hell I betcha regex would beat this hands down.

nguerrera commented 5 years ago

Yeah, there's a ReadAllText + IndexOf before it to avoid the deserialization, and that's not even showing up. 😆

https://github.com/NuGet/NuGet.Client/blob/b6cd1677ae2bd3b07f4cc23c2e8d408f784e8b05/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/GlobalJsonReader.cs#L39-L46

nguerrera commented 5 years ago

I'll study some more tomorrow. This is interesting, but doesn't fully explain seeing 200ms - 400ms per project. I'm seeing 200ms, of which 140ms is Jitting that would not have been happening N times for loading of full solution, right?

Mind you above you said: "I think I isolated in a single project and measured evaluation time". Possible your 200ms - 400ms included jitting?

davkean commented 5 years ago

When I was looking at traces for CoreFx, the resolver showing up as significant, enough to make me isolate this case and measure it by itself. Best way to measure this would be grab a real world use - such as the CoreFx solution I shared.

nguerrera commented 5 years ago

Yep. Will do that.

nguerrera commented 5 years ago

The remaining 60ms is still quite bad. Just wondering if the isolation process bloated it to 200ms. I will dig.

davkean commented 5 years ago

Bear in mind that the cost in that solution load will be ballooned by the number of evaluations (we evaluate every single project and configuration because the solution is incomplete) but it should be accurate to the cost of the total evaluation.

jeffkl commented 5 years ago

The parsed SDK versions from global.json are cached so it should only be read once.

https://github.com/NuGet/NuGet.Client/blob/b6cd1677ae2bd3b07f4cc23c2e8d408f784e8b05/src/NuGet.Core/Microsoft.Build.NuGetSdkResolver/NuGetSdkResolver.cs#L81

MSBuild caches the resolved path for a particular SDK so that should only happen once. But for each unique SDK, the NuGet settings are read. That could be improved.

jeffkl commented 5 years ago

@nguerrera can you double check that if you have two or more projects referencing the same SDK-from-a-NuGet-package that there is no overhead except for one? Also we still need to figure out if VS is using the EvaluationContext that enables caching of SDK resolution results. If that hasn't been implemented yet, it would add a ton of overhead in VS.

nguerrera commented 5 years ago

I don't think VS is using the context, (for non-build evaluation) which is probably the biggest issue here.

nguerrera commented 5 years ago

Adding you to a thread Dave started on that.

nguerrera commented 5 years ago

Confirming that nuget setting reading is showing up. I mentioned that too with the trace.

Even for happening once or once per sdk and even subtracting jit, it shouldn't need to take so long to read a few tiny files. Probably not what needs to be fixed for the large sln case, though, which I wil be studying next. Above was just thoughts as they came to me looking at a trace for the first time.

nguerrera commented 5 years ago

The question for evaluation retaining context.State is what's the correct lifetime? It's not as simple as per build is, right? We have to react to global.json / project files changing at some point. For that reason I think we should attack both the cost on cache miss and hit. And separately ensure a good yet correct hit rate.

Pilchie commented 5 years ago

@cdmihai and @lifengl have done some thinking about evaluation contexts in VS. We'd like to hook them up for the IO caching stuff too.

rainersigwald commented 5 years ago

There might be an easy win at "wrap a cache lifetime around solution load" where a ton of evaluations happen in a batch.

cdmihai commented 5 years ago

There might be an easy win at "wrap a cache lifetime around solution load" where a ton of evaluations happen in a batch.

Whenever VS needs to evaluate a batch of projects would be good places to wrap in evaluation contexts, because if the disk changes then the result is undefined either way. Solution load is the most natural, but all APIs that receive a collection of projects to be evaluated / re-evaluated should do this.

There's also additional improvements that can be made to enhance the evaluation context:

davkean commented 5 years ago

@rainersigwald Yeah I was thinking the same thing; how do I use said cache?

cdmihai commented 5 years ago

You use one of these new Project factory methods: https://github.com/Microsoft/msbuild/blob/master/src/Build/Definition/Project.cs#L526-L570 And pass the context in via ProjectOptions: https://github.com/Microsoft/msbuild/blob/master/src/Build/Definition/ProjectOptions.cs#L40

var context = EvaluationContext.Create(SharingPolicy.Shared);
var evaluatedProjects = projectsPaths.Select(p => Project.FromFile(p, new ProjectOptions{EvaluationContext = context}));
jeromelaban commented 5 years ago

On the same note, the SdkResolverManifest resolver:

https://github.com/Microsoft/msbuild/blob/b4f5792a8fadab87e5199ec54d355c3ba1397a6a/src/Build/BackEnd/Components/SdkResolution/SdkResolverManifest.cs#L27

is taking 236ms to initialize on my machine, which is more than the 153ms that the NuGetSdkResolver initialization is taking, mainly spent in the processing of the DataContractSerializer.

nguerrera commented 5 years ago

That can easily be converted to XmlReader, but does it happen more than once?

nguerrera commented 5 years ago

We should still change it, just curious if one time or many.

jeromelaban commented 5 years ago

The result seems cached, as it goes through CachingSdkResolverService, but it adds up to the rest. For instance, on the same profiling session, a single new ProjectCollection() takes 500ms (most of the time is spent searching for toolsets).

davkean commented 5 years ago

I was walking through some code today, and figured out that we do currently do use EvaluationContext during solution load if enabled via remote setting (let's us do an experiment). Figuring out if we actually turned this experiment on.

davkean commented 5 years ago

To update the thread, the team originally found no benefit to using evaluation context during load when we tested this - hence why it wasn't turned on (it could only break things). I suspect these resolvers might change this so we'll probably want to re-experiment on this, which I've filed https://github.com/dotnet/project-system/issues/4649 to investigate.

davkean commented 5 years ago

Here's a customer raising similar concerns on the symptoms I raised above: https://github.com/NuGet/Home/issues/8157.

jeffkl commented 5 years ago

Please review https://github.com/NuGet/NuGet.Client/pull/2879 to help me make this faster when reading global.json

davkean commented 5 years ago

Another internal partner reached out to me where they are experencing hangs and delays when reloading projects or adding a new file, and its caused by the same underlying resolver.

ViktorHofer commented 4 years ago

Wanted to bump this. Just had a conversation with @davkean and he pointed to this as being one of the reasons why loading an sln with 500+ projects is terribly slow. (dotnet/runtime - libraries part)

@rainersigwald I think the assignee needs to be updated?

CyrusNajmabadi commented 4 years ago

This appears to be an issue causing hangs while VS is loading. Can we get this off the UI codepath while loading? Thanks! :)

cdmihai commented 4 years ago

Can we get this off the UI codepath while loading?

True, there's two/three potential lines of work:

CyrusNajmabadi commented 4 years ago

@cdmihai can this be prioritized? This can literally block the UI thread for minutes while VS is loading. Please see the linked dump. This breaks VS entirely for the user. It also violates a ton of best practices around using the UI thread.

Can you explain what the nuget resolver does? if it's anything related to IO, it sounds like the only option here is to get this off the UI thread.

davkean commented 4 years ago

@cdmihai This is not about doing builds on the UI thread, this is evaluation. We need to block on UI thread on evaluation because the user experience is really odd when you don't (you just added a file and it doesn't appear in the tree), or you just loaded a project and your files don't appear.

This isn't about making this feature faster, we were promised this extension point would never be used to hit the network which you can't avoid. We need to come up with an new extension point that is designed with IDEs in mind. Given I've seen no progress on this, and these bugs are constantly pushed our way, I'm going to introduce new analysis that prompts a yellow bar when this feature causes delays.

jeffkl commented 4 years ago

I'm trying to get some time to fix the perf issues in the case when the packages are already on disk:

The goal would be to make resolving an SDK from a NuGet package on par with the .NET Core SDK resolver. Does that seem like the right direction?

To address the performance of when the resolver needs to pull a package, I want to add to the NuGet API so we can pull a single package faster. Most SDK packages are very small so it should be quick. https://github.com/NuGet/Home/issues/5919

Alternatively, I could add a new API to MSBuild that the project system could call to pre-resolve SDKs so they don't happen during evaluation. That would be a bit more work but is entirely possible.