NuGet / Home

Repo for NuGet Client issues
Other
1.49k stars 250 forks source link

Race condition in nuget msbuild sdk resolver can cause intermittent failures #8652

Closed nguerrera closed 1 year ago

nguerrera commented 4 years ago

See repro steps in https://github.com/dotnet/arcade/issues/2377

With a clear global packages folder + pristine working directory + a large solution importing msbuild sdks via nuget, there can be failures to resolve the NuGEt msbuild SDKs when opening the solution. Attaching a debugger shows this exception as a possible root cause:

System.IO.IOException
  HResult=0x80070020
  Message=The process cannot access the file 'C:\Users\nicholg\AppData\Local\Temp\NuGetScratch\lock\19d6b8e17ce108081ea2107ff2d8329b6568f80d' because it is being used by another process
rrelyea commented 4 years ago

@jeffkl - can you please direct as appropriate.

jeffkl commented 4 years ago

It looks like the API is attempting to use the locking mechanism which is failing. I wouldn't say that's specific to the SDK resolver but more of a problem with the locking mechanism right?

nguerrera commented 4 years ago

Maybe that is expected? Is it relying on "file in use" errors as a locking mechanism? Is that first chance exception a red herring?

The fact remains that the resolver randomly fails under the conditions described.

jeffkl commented 4 years ago

Yes actually I think that's the mechanism, if the file's in use its like a system-wide mutex. Are there other exceptions? I've seen where the resolver bubbles up an AggregateException but have never gotten my hands on a DMP to tell me what the inner exception is.

rrelyea commented 4 years ago

@zivkan is examining potential fixes to remove exceptions from that locking scenario. Testing and debating fix.

nguerrera commented 4 years ago

I can try to produce a .dmp when things settle down on 3.1 preview 1 and 3.0.1 servicing. The repro steps in https://github.com/dotnet/arcade/issues/2377 combined with dotnet nuget locals all --clear and git clean made the repro relatively reliable for me. I should have saved a dmp then...

zivkan commented 4 years ago

It does look like how NuGet does system wide mutex, as @jeffkl said. And yes, it's a first chance exception, caught in a loop.

I have a PR out NuGet/NuGet.Client#3076 which will stop using file-locking for tasks/threads within the same process, but it will still use file locking in case there are multiple processes doing restores at the same time.

jeffkl commented 4 years ago

@nguerrera just to be sure we're talking about the same thing:

I have hit an issue in Azure DevOps hosted builds where the very first restore/build fails because the NuGetSdkResolver throws an AggregateException so a project SDK like Microsoft.Build.CentralPackageVersions can't be found. If I run two builds on the same machine, the second one would succeed, so it looks like something to do with having a fresh machine.

Is that the same thing you're seeing? Or am I thinking of a different issue?

nguerrera commented 4 years ago

Sounds similar, don’t recall AggregateException but this was in IDE which may have buried it. I can repro “clean machine” by cleaning git repo and global packages folder. This came up a long time ago and only got around to filing now based on discussion on linked arcade issue

jeffkl commented 4 years ago

Here's an example of what I keep seeing:

https://jeffkl.visualstudio.com/Public/_build/results?buildId=477

I ended up adding a second build to my pipeline as a retry and the second build always succeeds. But the first one fails because the NuGetSDKResolver hits an exception and doesn't resolve the MSBuild project SDK:

##[command]"D:\a\_tasks\VSBuild_71a9a2d3-a98a-4caa-96ab-affca411ecda\1.151.2\ps_modules\MSBuildHelpers\vswhere.exe" -version [16.0,17.0) -latest -format json
##[command]"C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\MSBuild\Current\Bin\msbuild.exe" "d:\a\1\s\MSBuildProjectCreator.sln" /nologo /nr:false /dl:CentralLogger,"D:\a\_tasks\VSBuild_71a9a2d3-a98a-4caa-96ab-affca411ecda\1.151.2\ps_modules\MSBuildHelpers\Microsoft.TeamFoundation.DistributedTask.MSBuild.Logger.dll";"RootDetailId=186f5898-3707-42a9-89d8-8143fb5140fc|SolutionDir=d:\a\1\s"*ForwardingLogger,"D:\a\_tasks\VSBuild_71a9a2d3-a98a-4caa-96ab-affca411ecda\1.151.2\ps_modules\MSBuildHelpers\Microsoft.TeamFoundation.DistributedTask.MSBuild.Logger.dll" /clp:v=n /p:platform="Any CPU" /p:configuration="Debug" /p:VisualStudioVersion="16.0" /p:_MSDeployUserAgent="VSTS_91ec5507-2284-4d39-828e-316a8a1f1c0a_build_18_0"
Build started 10/3/2019 2:06:16 PM.
     1>Project "d:\a\1\s\MSBuildProjectCreator.sln" on node 1 (Restore target(s)).
     1>ValidateSolutionConfiguration:
         Building solution configuration "Debug|Any CPU".
       Restoring packages for C:\Users\VssAdministrator\AppData\Local\Temp\aacf50f0-bdd3-4301-9a45-82e8de8b2e01...
##[error]Directory.Build.targets(0,0): Error : One or more errors occurred.
     0>d:\a\1\s\Directory.Build.targets : error : One or more errors occurred.
##[error]Directory.Build.targets(0,0): Error : C:\Program Files\dotnet\sdk\3.0.100\Sdks\Microsoft.Build.CentralPackageVersions\Sdk not found. Check that a recent enough .NET Core SDK is installed and/or increase the version specified in global.json.
     0>d:\a\1\s\Directory.Build.targets : error : C:\Program Files\dotnet\sdk\3.0.100\Sdks\Microsoft.Build.CentralPackageVersions\Sdk not found. Check that a recent enough .NET Core SDK is installed and/or increase the version specified in global.json.
         GET https://api.nuget.org/v3-flatcontainer/microsoft.build.centralpackageversions/index.json
         OK https://api.nuget.org/v3-flatcontainer/microsoft.build.centralpackageversions/index.json 90ms
         GET https://api.nuget.org/v3-flatcontainer/microsoft.build.centralpackageversions/2.0.36/microsoft.build.centralpackageversions.2.0.36.nupkg
         OK https://api.nuget.org/v3-flatcontainer/microsoft.build.centralpackageversions/2.0.36/microsoft.build.centralpackageversions.2.0.36.nupkg 30ms
       Installing Microsoft.Build.CentralPackageVersions 2.0.36.
     1>Done Building Project "d:\a\1\s\MSBuildProjectCreator.sln" (Restore target(s)) -- FAILED.

Build FAILED.

         d:\a\1\s\Directory.Build.targets : error : One or more errors occurred.
         d:\a\1\s\Directory.Build.targets : error : C:\Program Files\dotnet\sdk\3.0.100\Sdks\Microsoft.Build.CentralPackageVersions\Sdk not found. Check that a recent enough .NET Core SDK is installed and/or increase the version specified in global.json.

    0 Warning(s)
    2 Error(s)

Time Elapsed 00:00:09.75
##[error]Process 'msbuild.exe' exited with code '1'.

If I use dotnet build on the same repo, it works just fine. I haven't been able to repro locally so I don't haven't captured a DMP. Is there a way to get a stack from AzDevOps?

I'm not 100% sure the issue I see is the same as you're seeing but it seems like it could be.

nguerrera commented 4 years ago

Friends don't let friends catch (Exception ex) { Log(ex.Message); }

jeffkl commented 1 year ago

Closing this issue as stale, let us know if anyone is still encountering the problem and we'll re-open.