dotnet / aspnetcore

ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.
https://asp.net
MIT License
35.38k stars 10k forks source link

Investigate PR build failure #25293

Closed ajaybhargavb closed 4 years ago

ajaybhargavb commented 4 years ago

This PR had failures due to build ordering issues. Need to investigate.

More context, https://github.com/dotnet/aspnetcore/pull/25283#issuecomment-681132854

cc @dotnet/aspnet-build

JunTaoLuo commented 4 years ago

Another example https://dev.azure.com/dnceng/public/_build/results?buildId=792496&view=results

dougbu commented 4 years ago

Is the problem usually or always showing up in Microsoft.AspNetCore.App.Runtime❔

sebastienros commented 4 years ago

Update:

From the binlog viewer, this node

Project Name=Project "Microsoft.AspNetCore.App.Runtime.csproj" (GetNativeManifest target(s)): 

Has a property _BuildOutputInPackage that has the pdbs duplicated:

F:\workspace\_work\1\s\artifacts\bin\Microsoft.AspNetCore.Antiforgery\Release\net5.0\Microsoft.AspNetCore.Antiforgery.pdb
F:\workspace\_work\1\s\artifacts\bin\Microsoft.AspNetCore.App.Runtime\Release\net5.0\win-x64\Microsoft.AspNetCore.Antiforgery.pdb

@dougbu says:

_GetBuildOutputFilesWithTfm is a target in the Runtime project. Sounds like it's either (a) wasting time adding duplicates to an iterm group or (b) not removing duplicates when it should. Either way, the root cause is likely a change somewhere in the .NET SDK that rejiggered the inputs to that target.

They might be copied to the same directory. The bin/Microsoft.AspNetCore.App.Runtime folder may also be intended as a staging area for the real layout. You'll need to look more at the project and perhaps a dotnet msbuild /pp:pp.out file for the project to figure out the project's expectations and whether the SDK is doing something different now. I often find myself fliipping back 'n forth betwen the log viewer and VS Code to try and figure these problems out.

dougbu commented 4 years ago

The PDB already in a package problem is impacting a lot of builds:

> runfo search-buildlog --definition=aspnet --pr --value="pdb' is not added because the package already contains file" --count 50
https://dev.azure.com/dnceng/public/_build/results?buildId=799262
https://dev.azure.com/dnceng/public/_build/results?buildId=799186
https://dev.azure.com/dnceng/public/_build/results?buildId=799129
https://dev.azure.com/dnceng/public/_build/results?buildId=799129
https://dev.azure.com/dnceng/public/_build/results?buildId=799079
https://dev.azure.com/dnceng/public/_build/results?buildId=797240
https://dev.azure.com/dnceng/public/_build/results?buildId=798807
https://dev.azure.com/dnceng/public/_build/results?buildId=798686
https://dev.azure.com/dnceng/public/_build/results?buildId=797948
https://dev.azure.com/dnceng/public/_build/results?buildId=797948
https://dev.azure.com/dnceng/public/_build/results?buildId=798502
https://dev.azure.com/dnceng/public/_build/results?buildId=798307
https://dev.azure.com/dnceng/public/_build/results?buildId=798151
https://dev.azure.com/dnceng/public/_build/results?buildId=798151
https://dev.azure.com/dnceng/public/_build/results?buildId=798151
https://dev.azure.com/dnceng/public/_build/results?buildId=797993
https://dev.azure.com/dnceng/public/_build/results?buildId=797993
https://dev.azure.com/dnceng/public/_build/results?buildId=798020
https://dev.azure.com/dnceng/public/_build/results?buildId=798020
https://dev.azure.com/dnceng/public/_build/results?buildId=798020
https://dev.azure.com/dnceng/public/_build/results?buildId=798018
https://dev.azure.com/dnceng/public/_build/results?buildId=798018
https://dev.azure.com/dnceng/public/_build/results?buildId=798018
https://dev.azure.com/dnceng/public/_build/results?buildId=797973
https://dev.azure.com/dnceng/public/_build/results?buildId=797928
https://dev.azure.com/dnceng/public/_build/results?buildId=797928
https://dev.azure.com/dnceng/public/_build/results?buildId=797896
https://dev.azure.com/dnceng/public/_build/results?buildId=797896
https://dev.azure.com/dnceng/public/_build/results?buildId=797616
https://dev.azure.com/dnceng/public/_build/results?buildId=797887
https://dev.azure.com/dnceng/public/_build/results?buildId=797887
https://dev.azure.com/dnceng/public/_build/results?buildId=797772
https://dev.azure.com/dnceng/public/_build/results?buildId=797772
https://dev.azure.com/dnceng/public/_build/results?buildId=797813
https://dev.azure.com/dnceng/public/_build/results?buildId=797644
https://dev.azure.com/dnceng/public/_build/results?buildId=797724
https://dev.azure.com/dnceng/public/_build/results?buildId=797724

Evaluated 50 builds
Impacted 23 builds
Impacted 37 jobs
Searched 14002 records

Strangely, though we have warn-as-error enabled, this isn't failing our builds or at least not all of our builds. I picked https://dev.azure.com/dnceng/public/_build/results?buildId=797973&view=logs&j=1b89928a-2219-5ef9-602f-f95beb3da4dc&t=bcce746a-741e-59b7-adec-a4a82d8c5e52 somewhat randomly and see lots of error NU5118 but also Build succeeded.. @rainersigwald any ideas why dotnet msbuild is fine w/ these particular errors❔ We aren't suppressing them in the repo.

TanayParikh commented 4 years ago
/datadisks/disk1/workspace/_work/1/s/src/Components/WebAssembly/Sdk/src/targets/Microsoft.NET.Sdk.BlazorWebAssembly.Current.targets(256,5): error MSB3954: Failed to compute hash for file '/datadisks/disk1/workspace/_work/1/s/src/Components/Web.JS/dist/Release/blazor.webassembly.js' because it does not exist or is inaccessible. [/datadisks/disk1/workspace/_work/1/s/src/Components/benchmarkapps/Wasm.Performance/TestApp/Wasm.Performance.TestApp.csproj]
##[error]src/Components/WebAssembly/Sdk/src/targets/Microsoft.NET.Sdk.BlazorWebAssembly.Current.targets(256,5): error MSB3954: (NETCORE_ENGINEERING_TELEMETRY=Build) Failed to compute hash for file '/datadisks/disk1/workspace/_work/1/s/src/Components/Web.JS/dist/Release/blazor.webassembly.js' because it does not exist or is inaccessible.
/datadisks/disk1/workspace/_work/1/s/src/Components/WebAssembly/Sdk/src/targets/Microsoft.NET.Sdk.BlazorWebAssembly.Current.targets(256,5): error MSB3954: Failed to compute hash for file '/datadisks/disk1/workspace/_work/1/s/src/Components/Web.JS/dist/Release/blazor.webassembly.js' because it does not exist or is inaccessible. [/datadisks/disk1/workspace/_work/1/s/src/Components/WebAssembly/testassets/HostedInAspNet.Client/HostedInAspNet.Client.csproj]
##[error]src/Components/WebAssembly/Sdk/src/targets/Microsoft.NET.Sdk.BlazorWebAssembly.Current.targets(256,5): error MSB3954: (NETCORE_ENGINEERING_TELEMETRY=Build) Failed to compute hash for file '/datadisks/disk1/workspace/_work/1/s/src/Components/Web.JS/dist/Release/blazor.webassembly.js' because it does not exist or is inaccessible.
/datadisks/disk1/workspace/_work/1/s/src/Components/WebAssembly/Sdk/src/targets/Microsoft.NET.Sdk.BlazorWebAssembly.Current.targets(256,5): error MSB3954: Failed to compute hash for file '/datadisks/disk1/workspace/_work/1/s/src/Components/Web.JS/dist/Release/blazor.webassembly.js' because it does not exist or is inaccessible. [/datadisks/disk1/workspace/_work/1/s/src/Components/WebAssembly/testassets/StandaloneApp/StandaloneApp.csproj]
##[error]src/Components/WebAssembly/Sdk/src/targets/Microsoft.NET.Sdk.BlazorWebAssembly.Current.targets(256,5): error MSB3954: (NETCORE_ENGINEERING_TELEMETRY=Build) Failed to compute hash for file '/datadisks/disk1/workspace/_work/1/s/src/Components/Web.JS/dist/Release/blazor.webassembly.js' because it does not exist or is inaccessible.

https://github.com/dotnet/aspnetcore/pull/25577#issuecomment-686726630

dougbu commented 4 years ago

More on the PDB duplicates: May be able to tell from the following (which ignores PRs) when the problem started…

> runfo search-buildlog --definition=aspnet --value="pdb' is not added because the package already contains file" --count 50
https://dev.azure.com/dnceng/public/_build/results?buildId=799262
https://dev.azure.com/dnceng/public/_build/results?buildId=799262
https://dev.azure.com/dnceng/public/_build/results?buildId=798151
https://dev.azure.com/dnceng/public/_build/results?buildId=798151
https://dev.azure.com/dnceng/public/_build/results?buildId=798151
https://dev.azure.com/dnceng/public/_build/results?buildId=798020
https://dev.azure.com/dnceng/public/_build/results?buildId=798020
https://dev.azure.com/dnceng/public/_build/results?buildId=797887
https://dev.azure.com/dnceng/public/_build/results?buildId=797644
https://dev.azure.com/dnceng/public/_build/results?buildId=796964
https://dev.azure.com/dnceng/public/_build/results?buildId=796399
https://dev.azure.com/dnceng/public/_build/results?buildId=795369
https://dev.azure.com/dnceng/public/_build/results?buildId=795151
https://dev.azure.com/dnceng/public/_build/results?buildId=795131
https://dev.azure.com/dnceng/public/_build/results?buildId=794466
https://dev.azure.com/dnceng/public/_build/results?buildId=794020
https://dev.azure.com/dnceng/public/_build/results?buildId=794020
https://dev.azure.com/dnceng/public/_build/results?buildId=793901

Evaluated 50 builds
Impacted 13 builds
Impacted 18 jobs
Searched 14426 records

No, I haven't checked whether that's every build since some date.

dougbu commented 4 years ago

The missing JavaScript file problem seems to have affected just one non-PR build:

> runfo search-buildlog --definition=aspnet --value="blazor.webassembly.js' because it does not exist or is inaccessible" --count 50
https://dev.azure.com/dnceng/public/_build/results?buildId=799302
https://dev.azure.com/dnceng/public/_build/results?buildId=799302

Evaluated 50 builds
Impacted 1 builds
Impacted 2 jobs
Searched 14426 records
dougbu commented 4 years ago

that issue also affected one PR:

> runfo search-buildlog --definition=aspnet --value="because it does not exist or is inaccessible" --count 50 --pr
https://dev.azure.com/dnceng/public/_build/results?buildId=802001
https://dev.azure.com/dnceng/public/_build/results?buildId=802001

Evaluated 50 builds
Impacted 1 builds
Impacted 2 jobs
Searched 13940 records
dougbu commented 4 years ago

@Pilchie just got really unlucky w/ four PRs opened at about the same time:

dougbu commented 4 years ago

I'm able to reproduce the PDB duplication problem locally and seeing the build fail when it occurs. Still can't see why we sometimes hit those NU5118 errors without failing the build. Any ideas on that strange part of this @rainersigwald❔

dougbu commented 4 years ago

Assigned @BrennanConroy because he's got #25707 out and @wtgodbe because he's delving into the duplicate PDB problem

dougbu commented 4 years ago

@SteveSandersonMS is anyone looking at the Components ordering issue that leads to occasional errors like the following❔ This isn't as urgent as the other problems but we should try to get to the bottom of it.

@halter73 if nobody gets to this part of the issue today, please include it in your hand-off notes for the next person in the rotation.

/datadisks/disk1/workspace/_work/1/s/src/Components/WebAssembly/Sdk/src/targets/Microsoft.NET.Sdk.BlazorWebAssembly.Current.targets(256,5): error MSB3954: Failed to compute hash for file '/datadisks/disk1/workspace/_work/1/s/src/Components/Web.JS/dist/Release/blazor.webassembly.js' because it does not exist or is inaccessible. [/datadisks/disk1/workspace/_work/1/s/src/Components/benchmarkapps/Wasm.Performance/TestApp/Wasm.Performance.TestApp.csproj]
SteveSandersonMS commented 4 years ago

It's the first I've heard of it. I don't recall seeing this happen on local builds. Is there any known context, such as when it started happening? Might it be a side-effect of whatever ordering issue causes the duplicate PDB problem?

dougbu commented 4 years ago

Is there any known context, such as when it started happening?

The exact timing isn't clear because the problem happens sporadically. I reported a couple of runfo results for builds failing that way in https://github.com/dotnet/aspnetcore/issues/25293#issuecomment-687357201 and https://github.com/dotnet/aspnetcore/issues/25293#issuecomment-687433111


Might it be a side-effect of whatever ordering issue causes the duplicate PDB problem?

No, that only occurs when building the Runtime project.

dougbu commented 4 years ago

And,

It's the first I've heard of it.

Ooopsie

wtgodbe commented 4 years ago

Where I'm at so far -

https://github.com/NuGet/NuGet.Client/blob/3f573425ac74d90b96a61e5e85ed39a39aad8a7f/src/NuGet.Core/NuGet.Build.Tasks.Pack/NuGet.Build.Tasks.Pack.targets#L207-L223

It looks like BuildOutputInPackage is getting duplicate .pdb entries - one from the output folder of Microsoft.AspNetCore.App.Runtime, and one from the output of the project that produced the .pdb. That item group gets populated here:

https://github.com/NuGet/NuGet.Client/blob/3f573425ac74d90b96a61e5e85ed39a39aad8a7f/src/NuGet.Core/NuGet.Build.Tasks.Pack/NuGet.Build.Tasks.Pack.targets#L420-L429

The size of the binlog has made this hard to trace back locally, and building locally hasn't repro'd it yet for me. But, I did notice that the duplicates appear to be present even in green CI builds. Still poking around.

wtgodbe commented 4 years ago

Looks like in 3.1, _GetBuildOutputFilesWithTfm created output that was populated with just .dll's and .pdb's from the Microsoft.AspNetCore.App.Runtime output folder, whereas today it also has .pdb's from the output folders of the individual projects

wtgodbe commented 4 years ago

Not sure when this started, but what's happening:

https://github.com/dotnet/aspnetcore/blob/3d042ac9aaf6df4cfd1d2bd4a984a86a45629285/src/Framework/App.Runtime/src/Microsoft.AspNetCore.App.Runtime.csproj#L282-L293

ResolveReferences adds the .pdb's from the individual project output dirs to ReferenceCopyLocalPaths, then it appears the condition on https://github.com/dotnet/aspnetcore/blob/3d042ac9aaf6df4cfd1d2bd4a984a86a45629285/src/Framework/App.Runtime/src/Microsoft.AspNetCore.App.Runtime.csproj#L287 isn't working, since the .pdb's in the output directory of the SharedFx project don't have the same %(Identity) as the ones in the in the output dirs of the individual projects. Tweaking that condition should fix the issue

wtgodbe commented 4 years ago

Resolved by https://github.com/dotnet/aspnetcore/pull/25733

wtgodbe commented 4 years ago

Looks like this is still recurring. https://github.com/dotnet/aspnetcore/pull/25733 did make NuGetPackInput free of duplicates, but we're still seeing this failure, e.g. https://dev.azure.com/dnceng/public/_build/results?buildId=809582&view=results. Looking into it.

wtgodbe commented 4 years ago

Fixed by https://github.com/dotnet/aspnetcore/pull/25785