dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.55k stars 4.54k forks source link

Dotnet build hangs while building libs.tests natively using cross built dotnet sdk #101121

Open alhad-deshpande opened 2 months ago

alhad-deshpande commented 2 months ago

Description

Dotnet build hangs while building libs.tests natively using cross built dotnet sdk.

Brief Analysis and probable root cause:

  1. The hang occurs while building project using msbuild runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate.csproj
  2. Mono trace log for "Microsoft.DotNet.HotReload.Utils.Generator" shows there is FileNotFoundException.
  3. Traced the filename using strace command and found that the missing file is "/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.Build.dll".
  4. If we check the directory contents for /root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/ then there is a file with name "Microsoft.Build.Locator.dll" but not "Microsoft.Build.dll".

Complete mono trace log for Microsoft.DotNet.HotReload.Utils.Generator namespace: [root@devfirst-dotnet-rhel8-8-1 logs]# export MONO_ENV_OPTIONS="--trace=N:Microsoft.DotNet.HotReload.Utils.Generator" [root@devfirst-dotnet-rhel8-8-1 logs]# /root/alhad/runtime/dotnet-sdk-ppc64le/.dotnet/dotnet /root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/build/../tools/net9.0/Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll -msbuild:/root/alhad/runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate.csproj -script:/root/alhad/runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate/deltascript.json -p:Configuration=Debug [0x7fff8a9a5160: 0.00000 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Config:Builder ()() [0x7fff8a9a5160: 0.00019 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Config:Builder ()([.ConfigBuilder:0x7fff7c402bc8] [0x7fff8a9a5160: 0.04096 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig:.ctor (Microsoft.DotNet.HotReload.Utils.Generator.Config/ConfigBuilder)(this:0x7fff7c405820[Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll], [.ConfigBuilder:0x7fff7c402bc8]) [0x7fff8a9a5160: 0.04116 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Config:.ctor (Microsoft.DotNet.HotReload.Utils.Generator.Config/ConfigBuilder)(this:0x7fff7c405820[Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll], [.ConfigBuilder:0x7fff7c402bc8]) [0x7fff8a9a5160: 0.04145 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Config:.ctor (Microsoft.DotNet.HotReload.Utils.Generator.Config/ConfigBuilder)( [0x7fff8a9a5160: 0.04146 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig:.ctor (Microsoft.DotNet.HotReload.Utils.Generator.Config/ConfigBuilder)( [0x7fff8a9a5160: 0.04437 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:Make (Microsoft.DotNet.HotReload.Utils.Generator.Config)([Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig:0x7fff7c405820]) [0x7fff8a9a5160: 0.04444 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_Live ()(this:0x7fff7c405820[Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll]) [0x7fff8a9a5160: 0.04445 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_Live ()(result=0 [0x7fff8a9a5160: 0.04500 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:Make (Microsoft.DotNet.HotReload.Utils.Generator.Config)([Microsoft.DotNet.HotReload.Utils.Generator.Runners.ScriptRunner:0x7fff7c405e60] [0x7fff8a9a5160: 0.04516 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:Run (System.Threading.CancellationToken)(this:0x7fff7c405e60[Microsoft.DotNet.HotReload.Utils.Generator.Runners.ScriptRunner Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll], [00,00,00,00,00,00,00,00,]) [0x7fff8a9a5160: 0.23871 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:Run (System.Threading.CancellationToken)([.AsyncStateMachineBox`1:0x7fff7c424560] [0x7fff738cf140: 0.26416 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:PrepareCapabilities ()(this:0x7fff7c405e60[Microsoft.DotNet.HotReload.Utils.Generator.Runners.ScriptRunner Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll]) [0x7fff738cf140: 0.26422 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_EditAndContinueCapabilities ()(this:0x7fff7c405820[Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll])

[0x7fff738cf140: 0.26446 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:.cctor ()() [0x7fff738cf140: 0.28896 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:.cctor ()( [0x7fff738cf140: 0.28897 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Parse (System.Collections.Generic.IEnumerable1<string>)(XX) [0x7fff738cf140: 0.28912 2] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Tokenize (System.Collections.Generic.IEnumerable1)(XX) [0x7fff738cf140: 0.28918 2] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Tokenize (System.Collections.Generic.IEnumerable1<string>)((unknown return type 15) [0x7fff738cf140: 0.28940 2] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Parse (System.Collections.Generic.IEnumerable1<Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser/Token>)(XX) [0x7fff738cf140: 0.29011 2] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Parse (System.Collections.Generic.IEnumerable1<Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser/Token>)((unknown return type 15) [0x7fff738cf140: 0.29012 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.EditAndContinueCapabilitiesParser:Parse (System.Collections.Generic.IEnumerable1)((unknown return type 15) [0x7fff738cf140: 0.29157 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:DefaultCapabilities ()(this:0x7fff7c405e60[Microsoft.DotNet.HotReload.Utils.Generator.Runners.ScriptRunner Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll]) [0x7fff738cf140: 0.29158 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:DefaultCapabilities ()(result=1023 [0x7fff738cf140: 0.29162 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_NoWarnUnknownCapabilities ()(this:0x7fff7c405820[Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll]) [0x7fff738cf140: 0.29163 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Config:get_NoWarnUnknownCapabilities ()(result=0 [0x7fff738cf140: 0.29180 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:PrepareCapabilities ()(result=1023 [0x7fff738cf140: 0.29196 0] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:SetupBaseline (Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)(this:0x7fff7c405e60[Microsoft.DotNet.HotReload.Utils.Generator.Runners.ScriptRunner Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll], 1023, [00,00,00,00,00,00,00,00,]) [0x7fff738cf140: 0.29293 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:InitMSBuild ()() [0x7fff738cf140: 1.12321 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:InitMSBuild ()( [0x7fff738cf140: 1.12342 1] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.BaselineProject:Make (Microsoft.DotNet.HotReload.Utils.Generator.Config,Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)([Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig:0x7fff7c405820], 1023, [00,00,00,00,00,00,00,00,]) [0x7fff738cf140: 1.12455 2] ENTER:c Microsoft.DotNet.HotReload.Utils.Generator.BaselineProject:PrepareMSBuildProject (Microsoft.DotNet.HotReload.Utils.Generator.Config,Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)([Microsoft.DotNet.HotReload.Utils.Generator.MsbuildConfig:0x7fff7c405820], 1023, [00,00,00,00,00,00,00,00,]) [0x7fff738cf140:] EXCEPTION handling: System.IO.FileNotFoundException: [0x7fff738cf140: 2.18797 2] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.BaselineProject:PrepareMSBuildProject (Microsoft.DotNet.HotReload.Utils.Generator.Config,Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)((unknown return type 15) [0x7fff738cf140: 2.18885 1] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.BaselineProject:Make (Microsoft.DotNet.HotReload.Utils.Generator.Config,Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)((unknown return type 15) [0x7fff738cf140: 2.18978 0] LEAVE:c Microsoft.DotNet.HotReload.Utils.Generator.Runner:SetupBaseline (Microsoft.DotNet.HotReload.Utils.Generator.EnC.EditAndContinueCapabilities,System.Threading.CancellationToken)((unknown return type 15) [0x7fff738cf140:] EXCEPTION handling: Newtonsoft.Json.JsonReaderException: Unexpected character encountered while parsing value: C. Path '', line 0, position 0. [0x7fff738cf140:] EXCEPTION handling: Newtonsoft.Json.JsonReaderException: Unexpected character encountered while parsing value: C. Path '', line 0, position 0.

Reproduction Steps

  1. Cross build dotnet sdk on x86_64
  2. Native build runtime and libs.tests on any machine ppc64le/s390x or x86_64

Expected behavior

The dotnet native build should not hang

Actual behavior

The dotnet native build hangs.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

uweigand commented 2 months ago

To add some additional context, just before the hang we see this error:

Could not load signature of Microsoft.CodeAnalysis.CSharp.CSharpProjectFileLoader:CreateProjectFile due to: Could not load file or assembly 'Microsoft.Build, Version=15.1.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies.
tmds commented 2 months ago

The issue isn't specific to ppc64le / s390 / cross-compilation. It reproduces also with building runtime tests with an x64 mono runtime on x64.

cc @lambdageek

uweigand commented 2 months ago

We've done a bit more investigation on where this error is hit. We're running

dotnet
/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/build/../tools/net9.0/Microsoft.DotNet.HotReload.Utils.Generator.BuildTool.dll
-msbuild:/root/alhad/runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate.csproj
-script:/root/alhad/runtime/src/libraries/System.Runtime.Loader/tests/ApplyUpdate/System.Reflection.Metadata.ApplyUpdate.Test.FirstCallAfterUpdate/deltascript.json
-p:Configuration=Debug

which ends up calling the PrepareMSBuildProject routine here https://github.com/dotnet/hotreload-utils/blob/c667b519c4bcdf121106fac021a53d98120e825a/src/Microsoft.DotNet.HotReload.Utils.Generator/BaselineProject.cs#L23 which calls msw.OpenProjectAsync.

This creates a new "BuildHost" process running

dotnet --roll-forward LatestMajor
/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.CodeAnalysis.Workspaces.MSBuild.BuildHost.dll
--property Configuration=Debug

and uses the Microsoft.CodeAnalysis.MSBuild.Rpc.RpcServer:ProcessRequestAsync method to invoke the Microsoft.CodeAnalysis.Workspaces.MSBuild.BuildHost.BuildHost:LoadProjectFileAsync routine there, which tries to create an instance of Microsoft.CodeAnalysis.CSharp.CSharpProjectFileLoader:CreateProjectFile here: https://github.com/dotnet/roslyn/blob/fa7ece1ca25ef529db616ed6c839af9f027fa216/src/Workspaces/Core/MSBuild.BuildHost/BuildHost.cs#L180

This ends up in the Mono JIT attempting to load and initialize that type, which fails with:

Could not load signature of Microsoft.CodeAnalysis.CSharp.CSharpProjectFileLoader:CreateProjectFile due to:
Could not load file or assembly 'Microsoft.Build, Version=15.1.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies.

causing

An exception of type System.TypeLoadException was thrown:
VTable setup of type Microsoft.CodeAnalysis.CSharp.CSharpProjectFileLoader failed

The Mono JIT previously had attempted to search for Microsoft.Build unsuccessfully in these paths:

/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.Build.dll
/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.Build.exe
/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.Build/Microsoft.Build.dll
/root/.nuget/packages/microsoft.dotnet.hotreload.utils.generator.buildtool/9.0.0-alpha.0.24168.2/tools/net9.0/BuildHost-netcore/Microsoft.Build/Microsoft.Build.exe

Now, there's still a number of questions:

First of all, why is Microsoft.Build.dll not packaged with the microsoft.dotnet.hotreload.utils.generator.buildtool nupkg, even though a number of other Microsoft.Build assemblies are? If I'm understanding this correctly, that could be an effect of the ExcludeAssets="runtime" attribute here: https://github.com/dotnet/hotreload-utils/blob/c667b519c4bcdf121106fac021a53d98120e825a/src/Microsoft.DotNet.HotReload.Utils.Generator/Microsoft.DotNet.HotReload.Utils.Generator.csproj#L12 I'm not sure why this was added?

Second, if Microsoft.Build.dll is not found there, should it be taken from the host .NET SDK? The file is available there, but the Mono JIT doesn't even appear to look there. Not sure if this is as intended, or if there are still some Mono loader search order issues.

Finally, why does this error only show up in Mono? Either it is because of the above-mentioned search order, or else it might be another instance of the problem where Mono is more eager in requiring to load types during JIT that may not actually be referenced later (where CoreCLR would defer the type loading to a later stage, which might not ever happen in this case).

Any suggestion on how to proceed here would be appreciated!

CC @giritrivedi @directhex @akoeplinger @lambdageek

janani66 commented 3 weeks ago

@YuliiaKovalova -- would you be able to take a look at this issue and advise?

YuliiaKovalova commented 3 weeks ago

cc: @rainersigwald

rainersigwald commented 3 weeks ago

First of all, why is Microsoft.Build.dll not packaged with the microsoft.dotnet.hotreload.utils.generator.buildtool nupkg

MSBuild assemblies should basically always come from the .NET SDK, not be repackaged by tools--MSBuild logic tends to make a bunch of assumptions about where .props/targets/etc files are relative to the loaded MSBuild DLLs, and it's also best to load the exact SDK that would be involved in dotnet build.

Second, if Microsoft.Build.dll is not found there, should it be taken from the host .NET SDK?

So yes. That's what MSBuildLocator does: it adds some AssemblyLoad event hooks to help find the right files in the SDK.

Finally, why does this error only show up in Mono? Either it is because of the above-mentioned search order, or else it might be another instance of the problem where Mono is more eager in requiring to load types during JIT that may not actually be referenced later (where CoreCLR would defer the type loading to a later stage, which might not ever happen in this case).

The latter behavior sounds very interesting. On .NET Framework and coreclr, the load is triggered by JIT compilation of the first method containing a type from Microsoft.Build.dll. Locator thus requires that you call Register() from a method that is JITed before calling the first such method.

If Mono is more aggressive about trying to load the type (say when the class containing such a method is first seen, rather than the method itself), that might explain the problem

uweigand commented 1 week ago

Second, if Microsoft.Build.dll is not found there, should it be taken from the host .NET SDK?

So yes. That's what MSBuildLocator does: it adds some AssemblyLoad event hooks to help find the right files in the SDK.

Finally, why does this error only show up in Mono? Either it is because of the above-mentioned search order, or else it might be another instance of the problem where Mono is more eager in requiring to load types during JIT that may not actually be referenced later (where CoreCLR would defer the type loading to a later stage, which might not ever happen in this case).

The latter behavior sounds very interesting. On .NET Framework and coreclr, the load is triggered by JIT compilation of the first method containing a type from Microsoft.Build.dll. Locator thus requires that you call Register() from a method that is JITed before calling the first such method.

If Mono is more aggressive about trying to load the type (say when the class containing such a method is first seen, rather than the method itself), that might explain the problem

Thanks for the details! This does indeed look very similar to problems we've had in the past.

Here's a PR from a while back describing one of these problems, where some assembly was also supposed to the found via an event hook, but when using Mono the JIT already was trying (and failing) to load a type defined in that assembly before that hook was even installed: https://github.com/dotnet/runtime/issues/60550#issuecomment-946709185

In this case, the JIT also aborted when building a routine that merely refers to the class that contains an element of that type.

uweigand commented 1 week ago

Do you think there is a way to rewrite the msbuild code so it avoids accessing even the containing class type before the handler is set up? Or can you suggest any other way to resolve this issue?

rainersigwald commented 1 week ago

No, I don't think there's a way to rewrite MSBuild or MSBuildLocator here, unfortunately.

What we need is:

It might be possible to change microsoft.dotnet.hotreload.utils.generator.buildtool to call Locator "even earlier" (or rather call MSBuild through more class indirection) so that the call to Register() happens before the runtime tries to load MSBuild types? If there's a good pattern for that we could certainly document it for Locator users.

uweigand commented 1 week ago

Hmm, looks like there is another twist - we're dealing with two distinct dotnet processes here. The main buildtool process runs:

    public async Task<BaselineArtifacts> SetupBaseline (EnC.EditAndContinueCapabilities capabilities, CancellationToken ct = default) {
        InitMSBuild();
        BaselineProject baselineProject = await Microsoft.DotNet.HotReload.Utils.Generator.BaselineProject.Make (config, capabilities, ct);

where the first line calls

    private static void InitMSBuild ()
    {
        Microsoft.Build.Locator.MSBuildLocator.RegisterDefaults();
    }

and the second line calls

    static async Task<(EnC.ChangeMakerService, Solution, ProjectId)> PrepareMSBuildProject (Config config, EnC.EditAndContinueCapabilities capabilities, CancellationToken ct = default)
    {
                Microsoft.CodeAnalysis.MSBuild.MSBuildWorkspace msw;
[...]
                msw = Microsoft.CodeAnalysis.MSBuild.MSBuildWorkspace.Create(props);
[...]
                var project = await msw.OpenProjectAsync (config.ProjectPath, logger, null, ct);

and that last line starts a new process (as described in my comment above).

So we're calling the MSBuildLocator in the old process, but it is actually the new process where the JIT aborts due to the missing assembly. How is this supposed to work? A load hook set in the old process wouldn't be available in the new process, right? I see the MSBuildLocator also sets some environment variables, but I'm not sure I understand properly how these are intended to be used by child processes - should the child install its own load hook somewhere?

rainersigwald commented 1 week ago

ah, that is very interesting-- @jasonmalinowski does this ring any bells for you?