dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License

`UsingTask` fails randomly on Ubuntu x64, only reboot helps #8031

Open premun opened 2 years ago

premun commented 2 years ago

Issue Description

Running on .NET 7 RC 1, Ubuntu x64 VM, I have intermittent errors - no changes to code and sometimes it works, sometimes it doesn't. If I get into this state, nuking artifacts and .dotnet does not help, only reboot seems to help.

I have a very basic setup with custom MSBuild task:

I get the following error:

/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4062: The "Microsoft.DotNet.VirtualMonoRepo.Tasks.VirtualMonoRepo_Initialize" task could not be loaded from the assembly /home/prvysoky/installer/artifacts/bin/VirtualMonoRepo.Tasks/Debug/net7.0/VirtualMonoRepo.Tasks.dll. Culture is not supported. (Parameter 'name')
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4062: vider>b__30_1 is an invalid culture identifier. Confirm that the <UsingTask> declaration is correct, that the assembly and all its dependencies are available, and that the task contains a public class that implements Microsoft.Build.Framework.ITask.

Steps to Reproduce

It's not a 100% repro, but these are the commands needed:

git clone https://github.com/premun/installer
cd installer
git checkout prvysoky/submodules
./build.sh /p:InitializeVMR=true /p:TmpDir=/data/tmp /p:VmrDir=/data/vmr /bl

Either it fails very fast or it says "Initializing empty VMR..." and then it means the task was loaded properly.
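Because the failure is intermittent, it can help to run the build several times and count how often it fails. The following is only a sketch: `count_failures` is a hypothetical helper, not part of the repository, and it treats any non-zero exit code as a failure.

```shell
#!/bin/sh
# Hypothetical helper: run a command N times and print how many runs failed.
# Usage: count_failures <attempts> <command> [args...]
count_failures() {
    attempts=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # Any non-zero exit code counts as a failed run; output is discarded.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

For example, `count_failures 10 ./build.sh /p:InitializeVMR=true /p:TmpDir=/data/tmp /p:VmrDir=/data/vmr /bl` would report how many of ten builds failed.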

Expected Behavior

Task should be loaded from the assembly without problems.

Actual Behavior

MSBuild fails to load the custom task with the following error:

/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4062: The "Microsoft.DotNet.VirtualMonoRepo.Tasks.VirtualMonoRepo_Initialize" task could not be loaded from the assembly /home/prvysoky/installer/artifacts/bin/VirtualMonoRepo.Tasks/Debug/net7.0/VirtualMonoRepo.Tasks.dll. Culture is not supported. (Parameter 'name')
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4062: vider>b__30_1 is an invalid culture identifier. Confirm that the <UsingTask> declaration is correct, that the assembly and all its dependencies are available, and that the task contains a public class that implements Microsoft.Build.Framework.ITask.

Analysis

Sometimes I get this:

/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018: System.InvalidOperationException: A suitable constructor for type 't.han' could not be located. Ensure the type is concrete and services are registered for all parameters of a public constructor.

sometimes this:

/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4062: vider>b__30_1 is an invalid culture identifier. 

Seems like it's getting pieces of the source file and using it as the culture?

Reboot seems to help always.

Versions & Configurations

premun commented 2 years ago

It seems like the culture it's trying to find is a random excerpt from the task's .cs file. I tried today with a slightly different file and am getting this:

/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018: The "VirtualMonoRepo_Initialize" task failed unexpectedly.
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018: System.InvalidOperationException: A suitable constructor for type 't.han' could not be located. Ensure the type is concrete and services are registered for all parameters of a public constructor.
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteFactory.CreateConstructorCallSite(ResultCache lifetime, Type serviceType, Type implementationType, CallSiteChain callSiteChain)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteFactory.TryCreateExact(ServiceDescriptor descriptor, Type serviceType, CallSiteChain callSiteChain, Int32 slot)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteFactory.TryCreateExact(Type serviceType, CallSiteChain callSiteChain)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteFactory.CreateCallSite(Type serviceType, CallSiteChain callSiteChain)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Extensions.DependencyInjection.ServiceLookup.CallSiteFactory.GetCallSite(Type serviceType, CallSiteChain callSiteChain)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Extensions.DependencyInjection.ServiceProvider.CreateServiceAccessor(Type serviceType)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(Type serviceType, ServiceProviderEngineScope serviceProviderEngineScope)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Extensions.DependencyInjection.ServiceProvider.GetService(Type serviceType)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService(IServiceProvider provider, Type serviceType)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Extensions.DependencyInjection.ServiceProviderServiceExtensions.GetRequiredService[T](IServiceProvider provider)
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.DotNet.VirtualMonoRepo.Tasks.VirtualMonoRepo_Initialize.ExecuteAsync() in /home/prvysoky/installer/src/VirtualMonoRepo/Tasks/VirtualMonoRepo_Initialize.cs:line 51
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.DotNet.VirtualMonoRepo.Tasks.VirtualMonoRepo_Initialize.Execute() in /home/prvysoky/installer/src/VirtualMonoRepo/Tasks/VirtualMonoRepo_Initialize.cs:line 47
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskExecutionHost.Microsoft.Build.BackEnd.ITaskExecutionHost.Execute()
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018:    at Microsoft.Build.BackEnd.TaskBuilder.ExecuteInstantiatedTask(ITaskExecutionHost taskExecutionHost, TaskLoggingContext taskLoggingContext, TaskHost taskHost, ItemBucket bucket, TaskExecutionMode howToExecuteTask)

Again, a reboot helped.

premun commented 2 years ago

This time I got:

/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4018: System.InvalidOperationException: A suitable constructor for type '��.�◄u♥>q☺.↕{♥>♥�↕!☺.↕�♥.↕�♥>-☻0¶�' could not be located. Ensure the type is concrete and services are registered for all parameters of a public constructor.

rainersigwald commented 2 years ago

Well, that is a terrifying error! Do the VMR tasks call out to any native code, or are they all managed? I don't recognize the symptoms.

rainersigwald commented 2 years ago

If I get into this state, nuking artifacts and .dotnet does not help, only reboot seems to help.

What about killall dotnet (or similar)?

rokonec commented 2 years ago

Killing all dotnet processes helped. I will try to repro it locally with @premun's help, as it might be caused by the MSBuild persistent process.
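As a sketch of that remediation, the helper below first asks the SDK to stop its build servers and then terminates any remaining processes named `dotnet`. The function names and the `DRY_RUN` switch are hypothetical, and whether `dotnet build-server shutdown` alone clears the bad state is an assumption that was not verified in this thread.

```shell
#!/bin/sh
# Sketch only: assumes lingering dotnet processes are what holds the bad state.
# Set DRY_RUN=1 to print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        # Ignore failures, e.g. when no matching process exists.
        "$@" || true
    fi
}

reset_dotnet_processes() {
    # Ask the SDK to shut down its build servers first.
    run dotnet build-server shutdown
    # Then terminate any remaining processes named exactly "dotnet".
    run pkill -x dotnet
}
```

Running `DRY_RUN=1 reset_dotnet_processes` shows what would be executed without touching any process.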

premun commented 2 years ago

@rainersigwald the library is pretty simple, C#-only code (a couple of hundred lines so far). The library eventually calls into Libgit2sharp, which is a wrapper around the git C code, but it doesn't look like it gets anywhere near there. Outside of that, it's almost a no-dependency net7.0 library.

Killing helps, so at least I have a faster remediation. This sometimes breaks between runs with no code changes, too. Seems like something around how UsingTask loads the DLL.

rokonec commented 1 year ago

@premun Can you please collect a memory dump of the corrupted MSBuild server process and send it to me, preferably in some secure way? Steps:

1) Corrupt the MSBuild server process by following your repro steps.
2) Find the process ID of the MSBuild server with ps -u | grep nodemode:8
3) Collect the dump with ~/.dotnet/shared/Microsoft.NETCore.App/7.0.0-rc.1.22411.12/createdump -p {serverPID}

Note that createdump is best invoked from the same runtime that is used by the process being dumped.
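The steps above can be sketched as a small script. This is a hedged sketch: the function names and the `RUNTIME_DIR` variable are hypothetical, the default path is the one quoted in the comment, and the `createdump -p` invocation is copied from the comment as written rather than checked against the tool's help output.

```shell
#!/bin/sh
# Directory of the runtime whose createdump should be used (assumed to match
# the runtime of the process being dumped, as the comment recommends).
RUNTIME_DIR="${RUNTIME_DIR:-$HOME/.dotnet/shared/Microsoft.NETCore.App/7.0.0-rc.1.22411.12}"

# Reads "PID ARGS" lines on stdin and prints the PIDs whose command line
# contains "nodemode:8". The [:] keeps awk from matching its own ps entry.
find_msbuild_server_pids() {
    awk '/nodemode[:]8/ { print $1 }'
}

# Hypothetical helper: runs createdump for every matching process.
collect_server_dumps() {
    ps -e -o pid=,args= | find_msbuild_server_pids | while read -r pid; do
        "$RUNTIME_DIR/createdump" -p "$pid"
    done
}
```

Keeping the PID filter separate from the `ps` call makes it possible to check the matching logic against canned input before running it on a live machine.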

premun commented 1 year ago

I checked out https://github.com/dotnet/installer/commit/e2560aa7788a810914e0c2ace94ac85e8df9ad9b, ran the steps described above, and the first attempt ended with a crash:

/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4062: The "Microsoft.DotNet.VirtualMonoRepo.Tasks.VirtualMonoRepo_Initialize" task could not be loaded from the assembly /home/prvysoky/installer/artifacts/bin/VirtualMonoRepo.Tasks/Debug/net7.0/VirtualMonoRepo.Tasks.dll. Culture is not supported. (Parameter 'name')
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(75,5): error MSB4062: tializer>5__1 is an invalid culture identifier. Confirm that the <UsingTask> declaration is correct, that the assembly and all its dependencies are available, and that the task contains a public class that implements Microsoft.Build.Framework.ITask.

The dump was shared over Teams.

premun commented 1 year ago

Not sure if this is related but just got this on 8.0.100-alpha.1.22423.9 and it looks very similar:

/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(67,5): error MSB4061: The "VirtualMonoRepo_Initialize" task could not be instantiated from "/home/prvysoky/installer/artifacts/bin/VirtualMonoRepo.Tasks/Debug/net7.0/VirtualMonoRepo.Tasks.dll". No parameterless constructor defined for type '�☺�.lVersionAttribute'.
/home/prvysoky/installer/src/VirtualMonoRepo/InitializeVMR.proj(67,5): error MSB4060: The "VirtualMonoRepo_Initialize" task has been declared or used incorrectly, or failed during construction. Check the spelling of the task name and the assembly name.
    0 Warning(s)
    2 Error(s)

Time Elapsed 00:00:03.06
Build failed with exit code 1. Check errors above.

$ ./.dotnet/dotnet --info
.NET SDK:
 Version:   8.0.100-alpha.1.22423.9
 Commit:    b9635390c8

Runtime Environment:
 OS Name:     ubuntu
 OS Version:  20.04
 OS Platform: Linux
 RID:         ubuntu.20.04-x64
 Base Path:   /home/prvysoky/installer/.dotnet/sdk/8.0.100-alpha.1.22423.9/

Host:
  Version:      7.0.0
  Architecture: x64
  Commit:       d099f075e4

.NET SDKs installed:
  8.0.100-alpha.1.22423.9 [/home/prvysoky/installer/.dotnet/sdk]

rokonec commented 1 year ago

We thought that something had corrupted memory. However, when we took a memory dump, the heap seemed to be valid. This issue will be hard to impossible to solve without a local repro and debugging. I have been able to reproduce it locally on my WSL2 linux ubuntu.

JanKrivanek commented 1 year ago

Not sure if this might be related (other than both complaining about UsingTask) and of any help, but:

src\Tests\xunit-runner\XUnitRunner.targets(78,5): error MSB4062: (NETCORE_ENGINEERING_TELEMETRY=Build) The "SDKCustomCreateXUnitWorkItemsWithTestExclusion" task could not be loaded from the assembly D:\a\1\s\artifacts\\bin\HelixTasks\Debug\netcoreapp3.1\HelixTasks.dll. Could not load file or assembly 'D:\a\1\s\artifacts\bin\HelixTasks\Debug\netcoreapp3.1\HelixTasks.dll'. The system cannot find the path specified. Confirm that the <UsingTask> declaration is correct, that the assembly and all its dependencies are available, and that the task contains a public class that implements Microsoft.Build.Framework.ITask.

Experienced by this build: https://github.com/dotnet/sdk/runs/10531706678

It is actually hit on the Windows FullFW build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=130708&view=logs&j=adc369b2-ee17-52c3-72b2-7129c9e8cda1&t=8a7f3a61-b981-59c3-9f63-0bb654fee695&l=404 and on the Windows build: https://dev.azure.com/dnceng-public/public/_build/results?buildId=130708&view=logs&j=fa59fe4e-195c-56fa-189b-58fd241f10dd&t=71146b80-38e1-5fea-9b74-ba1045aac3e1

But it doesn't show up in the Ubuntu or Darwin runs of the same build, so I'm wondering whether some sort of platform-dependent code mismatch might be the culprit here?

premun commented 1 year ago

@JanKrivanek this looks a bit different, I'd say. The one above usually says something about "constructor not found on type __" and then some random characters.

premun commented 1 year ago

I have been able to reproduce it locally on my WSL2 linux ubuntu.

@rokonec did you mean "not able"? For us, the occurrence was frequent enough that we had to disable the server in all Arcade and other builds as it was happening like every third Linux build. MattGal knows more about this disabling possibly.

rokonec commented 1 year ago

Unassigning as not actionable for now. If this issue reappears, please contact me and assign it back to me.

premun commented 1 year ago

@rokonec I am only afraid that we have disabled the MSBuild server feature in many places so it would make sense if you could search for those and re-enable in case you think this is not happening anymore?

If it's not fixed, we will see it immediately all over the place as it was very frequent and it's better if we catch that ourselves rather than outside customers.

rokonec commented 1 year ago

@premun Can you please try 8.0.100-alpha.1.23107.3? I built it just recently and it has MSBuild server on by default. It is based on 8.0.100-alpha.1.23061.8.

premun commented 1 year ago

@rokonec how do we set the MSBuild version? I thought it sort of comes with the SDK?

I vaguely remember someone (I think it was @MattGal?) disabling the server behaviour with some env variable so even if it's on by default, we might have it turned off in our infra. You'll probably know which and can check arcade / runtime for occurrences. Just to make sure we have coverage for that.

MattGal commented 1 year ago

@premun I think the issue you were asking about was https://github.com/dotnet/msbuild/issues/7870 and the variable is DOTNET_CLI_DO_NOT_USE_MSBUILD_SERVER. That said I don't see that symptom in the logs above.
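A minimal sketch of how that variable might be used and audited. The value `1` is an assumption (the thread names the variable but not the value it expects), and `find_server_opt_outs` is a hypothetical helper for locating places in a checkout where the server has been turned off.

```shell
#!/bin/sh
# Opt out of MSBuild server for this shell session.
# Assumption: "1" is an accepted value; the thread only names the variable.
export DOTNET_CLI_DO_NOT_USE_MSBUILD_SERVER=1

# Hypothetical helper: list files under a directory (default: current one)
# that mention the opt-out variable, so they can be reviewed or re-enabled.
find_server_opt_outs() {
    grep -rn --exclude-dir=.git 'DOTNET_CLI_DO_NOT_USE_MSBUILD_SERVER' "${1:-.}"
}
```

Running `find_server_opt_outs path/to/arcade` would print each match as `file:line:text`, which is one way to check whether the infrastructure still has the server disabled.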

rokonec commented 1 year ago

@premun 8.0.100-alpha.1.23107.3 is the version of a specially built SDK that has MSBuild server on by default. If you can experimentally change your code to use this SDK version for tooling, we can check whether this problem reemerges.

premun commented 1 year ago

@rokonec I am unfortunately no longer manually running the code path that was giving me this error, so I wouldn't be able to verify.

I realize it might have been a different MSBuild server issue that made us turn it off. Regardless, my point was only that if Roman wants to close this issue and re-open it if it happens again, he should be aware that we might have the server behaviour turned off in our infra, so we're not dogfooding it much.

ViktorHofer commented 1 year ago

Note that this also just happened in my PR but offline on Ubuntu: https://github.com/dotnet/source-build-reference-packages/pull/547

Unsure if that's noteworthy, but I'm using a compiled regex via the regex source generator. The problem disappeared after a killall dotnet but then re-appeared after a few builds.

Example error:

/home/vihofer/git/source-build-reference-packages/src/referencePackageSourceGenerator/ReferencePackageSourceGenerator.proj(79,5): error MSB4062: The "GetPackageItems" task could not be loaded from the assembly /home/vihofer/git/source-build-reference-packages/artifacts/bin/ReferencePackageSourceTask/Debug/net8.0/ReferencePackageSourceTask.dll. Culture is not supported. (Parameter 'name')
/home/vihofer/git/source-build-reference-packages/src/referencePackageSourceGenerator/ReferencePackageSourceGenerator.proj(79,5): error MSB4062: items>b__32_0 is an invalid culture identifier. Confirm that the <UsingTask> declaration is correct, that the assembly and all its dependencies are available, and that the task contains a public class that implements Microsoft.Build.Framework.ITask.