dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License
5.24k stars 1.35k forks source link

"GenerateResource" task failed unexpectedly due to System.Runtime.Remoting.RemotingExcpetion: Object 'xxx.rem' has been disconnected or does not exist at the server #11003

Open adc-cjewett opened 1 week ago

adc-cjewett commented 1 week ago

Issue Description

We're running into an issue with a large .NET Framework project that runs into the following error intermittently:

MSBuild version 17.11.2+c078802d4 for .NET Framework
17.11.2.32701
"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\amd64\MSBuild.exe"  /nologo Solution.sln /m /p:Configuration=Debug /p:Platform=Any Cpu /v:minimal /clp:ErrorsOnly /nodeReuse:false /graph /p:RestoreUseStaticGraphEvaluation=true /p:CreateHardLinksForCopyFilesToOutputDirectoryIfPossible=true /p:CreateHardLinksForCopyAdditionalFilesIfPossible=true /p:CreateHardLinksForCopyLocalIfPossible=true /p:CreateHardLinksForPublishFilesIfPossible=true /p:PublicRelease=true /t:build /restore
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\amd64\Microsoft.Common.CurrentVersion.targets(3413,5): error MSB4018: The "GenerateResource" task failed unexpectedly.
12-Nov-2024 12:32:44    System.Runtime.Remoting.RemotingException: Object '/aabc637d_11e9_4e41_b7db_ab514efbed9f/xlxmgbmsdebok3qn1uvb0snm_4.rem' has been disconnected or does not exist at the server.
12-Nov-2024 12:32:44       at Microsoft.Build.Tasks.ProcessResourceFiles.get_StronglyTypedClassName()
12-Nov-2024 12:32:44       at Microsoft.Build.Tasks.GenerateResource.Execute()
12-Nov-2024 12:32:44       at Microsoft.Build.BackEnd.TaskExecutionHost.Execute()
12-Nov-2024 12:32:44       at Microsoft.Build.BackEnd.TaskBuilder.<ExecuteInstantiatedTask>d__26.MoveNext() [C:\Project.csproj]

This project is using the older csproj format and is on the larger side with a bunch of resource files. We've been struggling to determine how to figure out the root cause of the problem though. Anyone have ideas of what the issue is or what logs we can turn on to investigate further? We've turned on Roslyn compiler logs and didn't see anything too crazy to the untrained eye. This seems like a relevant issue but not a lot of content there: https://github.com/dotnet/msbuild/issues/6770

Steps to Reproduce

We can't provide reproduce steps unfortunately since we see it intermittently in a very large internal project.

Expected Behavior

We expect the compilation and the GenerateResource task to succeed.

Actual Behavior

The GenerateResource task will fail intermittently with an error that is a little difficult for those unfamiliar with MSbuild internals to understand.

Analysis

No response

Versions & Configurations

No response

KalleOlaviNiemitalo commented 1 week ago

ProcessResourceFiles seems to derive from MarshalByRefObject on .NET Framework:

https://github.com/dotnet/msbuild/blob/8f6b8ad0ace90c777c66711c907227fcfb6f2efe/src/Tasks/GenerateResource.cs#L2212-L2215

The GenerateResource task doesn't seem to be doing anything about leases for cross-appdomain remoting. If the operation lasts longer than the timeouts configured in LifetimeServices, then perhaps .NET Framework disconnects the ProcessResourceFiles instance from remoting. This might be fixable by making ProcessResourceFiles override the MarshalByRefObject.InitializeLifetimeService method to return null. The caller will unload the whole AppDomain anyway, so the lease mechanism is not useful for ProcessResourceFiles.

adc-cjewett commented 1 week ago

We did turn on the /graph option recently when running builds which improved our compilation performance significantly. Any chance that option has a direct impact on this?

We're assuming that the optimized resource usage as a result of /graph being enabled is indirectly impacting it because the machine is being utilized more heavily as a result of optimized building, but figured I'd double check there is no direct tie.

KalleOlaviNiemitalo commented 1 week ago

dotnet msbuild would not have this problem, because .NET Core does not support remoting and MSBuild does not attempt to use it there. It might be incompatible with your projects for other reasons, though.

KalleOlaviNiemitalo commented 1 week ago

Setting ResGenExecuteAsTool=true might work around the problem, by making the GenerateResource task run ResGen.exe as a child process instead of using cross-AppDomain remoting. It could be slower, though.

adc-cjewett commented 1 week ago

dotnet msbuild would not have this problem, because .NET Core does not support remoting and MSBuild does not attempt to use it there. It might be incompatible with your projects for other reasons, though.

Thanks! Understood.

Setting ResGenExecuteAsTool=true might work around the problem, by making the GenerateResource task run ResGen.exe as a child process instead of using cross-AppDomain remoting. It could be slower, though.

Will definitely take a look at this one. I've asked the folks to get me a bin log and we'll turn that option on and monitor.

adc-cjewett commented 2 days ago

I was able to open the binlogs on a 300 GB RAM machine and in the failure case it would timeout after about 8 minutes 20 seconds. In the successful case it would complete processing the resx files in 2 minutes 30 seconds. Our hardware is known to be extremely inconsistent, so at the moment we're leaning towards our build hardware or something about how the VMs were setup is causing severe instability that causes resx files to take longer to process and eventually timeout.

After taking a look at that we set ResGenExecuteAsTool=true. It looks like it is solving our problem but we're only one day in, so I'll provide an update after the weekend to see if we still experience the issue.