dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License

MSBuild task hangs occasionally in Linux when invoking 'dotnet run' #9671

Open jorgensigvardsson opened 5 years ago

jorgensigvardsson commented 5 years ago

Steps to reproduce

Create a project and add a target that runs before the BeforeBuild target. That target should then invoke <Exec Command="dotnet run -c $(Configuration) -p ../OtherProject" />.

Expected behavior

I expect the command to run and finish, so that msbuild can continue executing targets.

Actual behavior

MSBuild seemingly hangs, as if it cannot determine that OtherProject has exited. This occurs only on Linux, and only sometimes. It always hangs when I run the same task on the Ubuntu 16.04 hosted agent in Azure DevOps. It sometimes hangs when I run the same task in Docker on my Windows desktop machine.

Environment data

I am using the docker image microsoft/dotnet:2.2-sdk as a base for my own image. I have stripped it down to a bare minimum with /bin/bash as ENTRYPOINT, so that I have been able to run the commands manually.

dotnet --info output:

.NET Core SDK (reflecting any global.json):
 Version:   2.2.104
 Commit:    73f036d4ac

Runtime Environment:
 OS Name:     debian
 OS Version:  9
 OS Platform: Linux
 RID:         debian.9-x64
 Base Path:   /usr/share/dotnet/sdk/2.2.104/

Host (useful for support):
  Version: 2.2.2
  Commit:  a4fd7b2c84

.NET Core SDKs installed:
  2.2.104 [/usr/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.2.2 [/usr/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.2.2 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.2.2 [/usr/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

The "offending" target looks like this in my .csproj:

<Target Name="GenerateBuilders" BeforeTargets="BeforeBuild">
  <Message Text="Generating model builder classes..." Importance="High" />
  <Exec Command="dotnet run -c $(Configuration) -p ../Tracy.Core.Dal.ModelBuilderGenerator Builders.cs" />
  <Message Text="Finished." Importance="High" />
</Target>

The project Tracy.Core.Dal.ModelBuilderGenerator is a custom project that generates code at runtime for other projects to consume. In the logs I can see all the output from the generator project. The very last output is right before return 0;.

The workaround I have now is to tag the target with Condition="'$(BuildingInsideVisualStudio)' == 'true'" so that it works as expected at development time. For the build itself, I publish the tool to an executable in my Dockerfile and run it before the initial dotnet invocation.

Source code access

Access to source code etc can be arranged privately if needed.

livarcocc commented 5 years ago

Can you capture a dump when this happens? Do you see the dotnet run process still running when this happens?

jorgensigvardsson commented 5 years ago

I only had bash and three dotnet processes. I don't have the ps -ef output right now, but can of course provide it on Monday.

The dotnet processes were all sleeping, as if they were waiting for something. Also, IIRC, the processes were grand parent, parent and child. I believe dotnet run was the child, but I am not 100% sure about that.

The project I run is just a simple console app that doesn't read from stdin; it only grabs some CLR metadata that it writes to a file.

I did try to do a dotnet publish to generate a binary to execute instead, but it hung as well.

jorgensigvardsson commented 5 years ago

I just tried to reproduce the error on my own Docker host, but I cannot reproduce the hanging dotnet run process. I can't get the access I need to the Docker host in Azure DevOps, so I'm a bit clueless/powerless now.

MartinKarlgrenIMI commented 9 months ago

We have the same problem. We noticed that the task actually completes successfully after 15 minutes.

The DefaultNodeConnectionTimeout is 900 seconds -- possibly related?

baronfel commented 9 months ago

@ladipro / @rainersigwald is there any debugging information in MSBuild that could help diagnose if this is a node connection issue?

MartinKarlgrenIMI commented 9 months ago

Setting MSBUILDNODECONNECTIONTIMEOUT="30000" in the environment does indeed reduce the waiting time, and the task finishes successfully after 30 seconds instead of 15 minutes.
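A minimal shell sketch of applying that timeout to a single invocation rather than to the whole environment. The wrapper name build_with_node_timeout is hypothetical; the value is taken to be in milliseconds, as the 30000 → 30 seconds observation above suggests.

```shell
# Hypothetical wrapper: run a dotnet command with a shortened MSBuild
# node connection timeout. The value is in milliseconds (30000 = 30 s).
build_with_node_timeout() {
    timeout_ms="$1"
    shift
    # Prefix assignment: the variable is passed to this one command only.
    MSBUILDNODECONNECTIONTIMEOUT="$timeout_ms" dotnet "$@"
}

# Example (not executed here):
#   build_with_node_timeout 30000 build -c Release
```

This only shortens the wait; it does not address whatever causes the node connection to time out in the first place.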

ladipro commented 9 months ago

@MartinKarlgrenIMI, with MSBUILDDEBUGCOMM set to 1, MSBuild writes node communication logs to files named MSBuild_CommTrace_PID_*.txt in the temp directory. Would it be possible to share these logs from a problematic build?
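A rough sketch of collecting those trace files after a build. The helper name list_comm_traces is hypothetical, and the default location ($TMPDIR, falling back to /tmp) is an assumption about where "the temp directory" resolves on Linux; pass the directory explicitly if your setup differs.

```shell
# Hypothetical helper: list MSBuild node communication traces in a directory.
# Assumption: the temp directory is $TMPDIR, falling back to /tmp.
list_comm_traces() {
    trace_dir="${1:-${TMPDIR:-/tmp}}"
    for f in "$trace_dir"/MSBuild_CommTrace_PID_*.txt; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

# Example (not executed here):
#   MSBUILDDEBUGCOMM=1 dotnet build
#   list_comm_traces
```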

MartinKarlgrenIMI commented 9 months ago

@ladipro, sure, files below. (This was a build with a 30000 ms timeout, I noticed that in the *_1794.txt file the timeout is hit for one thread.)

MSBuild_CommTrace_PID_1794.txt MSBuild_CommTrace_PID_1767.txt MSBuild_CommTrace_PID_1709.txt MSBuild_CommTrace_PID_1670.txt

ladipro commented 9 months ago

It looks like ToolTask doesn't receive the Process.Exited notification if the tool process is dotnet build / dotnet run, which creates a new out-of-process (OOP) node process. Or rather, it receives it only after the node process has exited.
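One general way a parent can appear to miss a child's exit is when it waits for the child's redirected output to reach end-of-file while a longer-lived grandchild still holds the inherited pipe open. Whether this is what happens inside ToolTask is an assumption, not something confirmed in this thread; the shell sketch below only illustrates the effect, with sleep standing in for the node process.

```shell
# Stand-in for a tool that starts a long-lived helper and then exits.
# "sleep 2 &" plays the role of the lingering node; it inherits stdout.
tool_with_lingering_helper() {
    sh -c 'sleep 2 & echo "tool finished"'
}

start=$(date +%s)
# Command substitution reads the pipe until EOF, so it returns only once
# every process holding the write end (including the helper) has exited.
output=$(tool_with_lingering_helper)
end=$(date +%s)
elapsed=$((end - start))
echo "$output (waited ${elapsed}s)"
```

The tool itself exits almost immediately, yet the caller is held up for as long as the helper lives.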

ladipro commented 9 months ago

Likely the same root cause as https://github.com/dotnet/sdk/issues/9452. Could be specific to AzDO environment.

ladipro commented 9 months ago

@MartinKarlgrenIMI can you please try passing the --init flag per the last couple of comments in https://github.com/dotnet/runtime/issues/27115 ?

SeijiSuenaga commented 8 months ago

@ladipro unfortunately dotnet run --init --project abc didn't fix the 15-minute hang in my case.

ladipro commented 8 months ago

> @ladipro unfortunately dotnet run --init --project abc didn't fix the 15-minute hang in my case.

@SeijiSuenaga my understanding is that --init should be passed to docker, not dotnet. See https://docs.docker.com/engine/reference/commandline/container_run/#init
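A small sketch of where the flag goes, assuming the build is started with docker run. The wrapper name run_build_with_init is hypothetical, and the image and command are placeholders supplied by the caller.

```shell
# Hypothetical wrapper: start the container with an init process as PID 1,
# which forwards signals and reaps orphaned child processes.
run_build_with_init() {
    image="$1"
    shift
    # --init is an option of "docker run", so it goes before the image name.
    docker run --init "$image" "$@"
}

# Example (not executed here):
#   run_build_with_init microsoft/dotnet:2.2-sdk dotnet build
```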

SeijiSuenaga commented 8 months ago

@ladipro Ah, sorry. Just tried that as well, but it still hung for 15 minutes. (In my case, the hangs are happening in GitLab CI, so I tested it by enabling their FF_USE_INIT_WITH_DOCKER_EXECUTOR feature flag.)

That said, I did find a workaround for my particular scenario. In case it helps anyone else, I found that my MSBuild target was only hanging when executing as part of dotnet test, not dotnet build for the same project. So I adjusted my CI script to run dotnet build first, then dotnet test, and now it runs completely normally. 🤔
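As a sketch, that two-step ordering in a CI script might look like the following. The function name ci_build_then_test and the project argument are hypothetical, and no extra dotnet flags are assumed.

```shell
# Hypothetical CI step: build first, then test the same project.
ci_build_then_test() {
    project="$1"
    # Stop here if the build fails, so the test step is not attempted.
    dotnet build "$project" || return 1
    dotnet test "$project"
}

# Example (not executed here):
#   ci_build_then_test path/to/My.Tests.csproj
```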