dotnet / msbuild

The Microsoft Build Engine (MSBuild) is the build platform for .NET and Visual Studio.
https://docs.microsoft.com/visualstudio/msbuild/msbuild
MIT License

[Bug]: error MSB3303: Could not resolve COM reference "5477469e-83b1-11d2-8b49-00a0c9b7c9c4" version 2.4. #9613

Open joperator opened 9 months ago

joperator commented 9 months ago

Issue Description

I have a solution with more than 40 .NET projects built for different target frameworks such as netstandard2.0, net472, net48, net6.0 and net7.0. One of the .csproj files contains the following reference:

<ItemGroup Condition="'$(TargetFramework)'=='net48'">
  <COMReference Include="mscoree">
    <Guid>{5477469e-83b1-11d2-8b49-00a0c9b7c9c4}</Guid>
    <VersionMajor>2</VersionMajor>
    <VersionMinor>4</VersionMinor>
    <Lcid>0</Lcid>
    <WrapperTool>tlbimp</WrapperTool>
    <Isolated>False</Isolated>
    <EmbedInteropTypes>True</EmbedInteropTypes>
  </COMReference>
</ItemGroup>

The solution is built in an Azure DevOps Pipeline on self-hosted Windows agents with MSBuild. The Windows machine hosts multiple Azure Pipelines Agents. If only one or two of them are enabled, the build always succeeds. However, if all six of them are enabled and used concurrently, the build regularly fails with the following error:

C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Current\Bin\Microsoft.Common.CurrentVersion.targets(2992,5): error MSB3303: Could not resolve COM reference "5477469e-83b1-11d2-8b49-00a0c9b7c9c4" version 2.4. The specified image file did not contain a resource section. (Exception from HRESULT: 0x80070714)

The error message isn't useful because the specified mscoree image file hasn't changed between a successful and an unsuccessful build, so it should contain the required resource section.
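For reference, the HRESULT in the error can be split into its standard fields. The sketch below is only an illustration (the helper name `decode_hresult` is made up for this note); it shows that 0x80070714 is a wrapped Win32 error, code 1812 (ERROR_RESOURCE_DATA_NOT_FOUND), which is where the "did not contain a resource section" text comes from. It does not say which file the failing call was inspecting.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into severity, facility, and code fields."""
    hresult &= 0xFFFFFFFF  # normalise values that arrive as negative signed ints
    return {
        "failure": bool(hresult >> 31),        # bit 31: severity (1 = failure)
        "facility": (hresult >> 16) & 0x7FF,   # bits 16-26: facility (7 = FACILITY_WIN32)
        "code": hresult & 0xFFFF,              # bits 0-15: the underlying error code
    }


fields = decode_hresult(0x80070714)
print(fields)
```

A facility of 7 means the low 16 bits can be looked up directly in the Win32 system error code list.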

Steps to Reproduce

The solution is built with the following invocation from a Python script:

C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Current\Bin\MSBuild.exe [PATH TO SOLUTION] /maxCpuCount /property:Configuration=Release /property:Platform=Any CPU /restore /restoreProperty:RestoreConfigFile=[PATH TO NUGET.CONFIG];RestoreNoCache=true
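The actual Python script is not shown in this issue. As a rough sketch of how such an invocation might look, the arguments can be passed as a list so that the spaces in the MSBuild path and in `Any CPU` need no manual quoting. The function names and the idea of passing the paths as parameters are assumptions for illustration only.

```python
import subprocess

# Path taken from the invocation above.
MSBUILD = (
    r"C:\Program Files\Microsoft Visual Studio\2022\Enterprise"
    r"\MSBuild\Current\Bin\MSBuild.exe"
)


def build_command(solution: str, nuget_config: str) -> list[str]:
    """Assemble the MSBuild argument list (hypothetical helper)."""
    return [
        MSBUILD,
        solution,
        "/maxCpuCount",
        "/property:Configuration=Release",
        "/property:Platform=Any CPU",  # one list element, so the space is preserved
        "/restore",
        f"/restoreProperty:RestoreConfigFile={nuget_config};RestoreNoCache=true",
    ]


def run_build(solution: str, nuget_config: str) -> int:
    """Run the build and return MSBuild's exit code."""
    return subprocess.run(build_command(solution, nuget_config), check=False).returncode
```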

Expected Behavior

The builds should succeed regardless of how many agents are enabled on the Windows machine.

Actual Behavior

With too many agents, e.g. six, enabled and used concurrently on the Windows machine, the build regularly fails with error MSB3303.

Analysis

The error message that the specified mscoree image file did not contain a resource section indicates that MSBuild selects the wrong image file or is unable to determine whether it contains a resource section when multiple builds are running concurrently on the same Windows machine. Perhaps the image file is locked by an MSBuild process while another one is trying to find its resource section. The resulting IOException could then be caught to prevent an abort and replaced with an error message stating that the image file did not contain a resource section, although it was just not possible to determine if this was the case.

Versions & Configurations

C:\Program Files\Microsoft Visual Studio\2022\Enterprise>msbuild -version
MSBuild version 17.8.3+195e7f5a3 for .NET Framework
17.8.3.51904

Visual Studio Enterprise 2022 version 17.8.3 is installed on the Windows machine. The Windows edition is Windows Server 2022 Standard version 21H2.

dotnet is also installed:

C:\Users\Administrator>dotnet --info
.NET SDK:
 Version:           8.0.100
 Commit:            57efcf1350
 Workload version:  8.0.100-manifests.6a1e483a

Runtime Environment:
 OS Name:     Windows
 OS Version:  10.0.20348
 OS Platform: Windows
 RID:         win-x64
 Base Path:   C:\Program Files\dotnet\sdk\8.0.100\

.NET workloads installed:
 Workload version: 8.0.100-manifests.6a1e483a
There are no installed workloads to display.

Host:
  Version:      8.0.0
  Architecture: x64
  Commit:       5535e31a71

.NET SDKs installed:
  6.0.411 [C:\Program Files\dotnet\sdk]
  7.0.305 [C:\Program Files\dotnet\sdk]
  7.0.403 [C:\Program Files\dotnet\sdk]
  8.0.100 [C:\Program Files\dotnet\sdk]

.NET runtimes installed:
  Microsoft.AspNetCore.App 6.0.19 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 6.0.25 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.8 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.13 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 7.0.14 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 8.0.0 [C:\Program Files\dotnet\shared\Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 6.0.19 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 6.0.25 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.8 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.13 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 7.0.14 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.NETCore.App 8.0.0 [C:\Program Files\dotnet\shared\Microsoft.NETCore.App]
  Microsoft.WindowsDesktop.App 6.0.19 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 6.0.25 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 7.0.8 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 7.0.13 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 7.0.14 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]
  Microsoft.WindowsDesktop.App 8.0.0 [C:\Program Files\dotnet\shared\Microsoft.WindowsDesktop.App]

Other architectures found:
  x86   [C:\Program Files (x86)\dotnet]
    registered at [HKLM\SOFTWARE\dotnet\Setup\InstalledVersions\x86\InstallLocation]

Environment variables:
  Not set

global.json file:
  Not found

ladipro commented 9 months ago

@joperator, what does typelib ID {5477469e-83b1-11d2-8b49-00a0c9b7c9c4} resolve to on the problematic machine? Do any of the projects you're building write the file or the relevant registration?

> Perhaps the image file is locked by an MSBuild process while another one is trying to find its resource section.

I've tried locking the .tlb: 1) its location makes it hard to lock for writing, and 2) locking it for reading does not seem to cause the exception you're seeing.

joperator commented 9 months ago

According to the .ResolveComReference.cache file, the typelib ID resolves to C:\Windows\Microsoft.NET\Framework\v4.0.30319\mscoree.tlb, which is also the same path that is in the registry. The projects don't write the file (date modified: 08.05.2021) or the relevant registration. The COM reference is only used to create an instance of the CorRuntimeHost coclass from the mscoree namespace to call the GetDefaultDomain method.

Another detail that might be relevant: The projects that are built are located on a different drive (D:) than the type library mscoree.tlb (C:).
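For anyone who wants to repeat the registry check on their own machine, here is a minimal sketch. It assumes the usual layout `HKEY_CLASSES_ROOT\TypeLib\{GUID}\<major>.<minor>\<lcid>\win32`, with version and LCID written in hexadecimal; the helper names are made up for this note, and `winreg` is only available on Windows.

```python
def typelib_key_path(guid: str, major: int, minor: int,
                     lcid: int = 0, platform: str = "win32") -> str:
    """Build the registry subkey (under HKEY_CLASSES_ROOT) for a type library."""
    parts = [
        "TypeLib",
        "{" + guid + "}",
        f"{major:x}.{minor:x}",  # version components are stored as hex
        f"{lcid:x}",             # LCID is stored as hex as well
        platform,
    ]
    return "\\".join(parts)


def read_typelib_path(guid: str, major: int, minor: int) -> str:
    """Return the registered .tlb path (Windows only)."""
    import winreg  # imported here so the helper above also works on other systems

    key_path = typelib_key_path(guid, major, minor)
    with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "")  # "" reads the default value
        return value


print(typelib_key_path("5477469e-83b1-11d2-8b49-00a0c9b7c9c4", 2, 4))
```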

KalleOlaviNiemitalo commented 9 months ago

It looks like ResolveComReference.Execute passes only the Exception.Message to the logging function, so the MSBUILDDIAGNOSTICS environment variable won't make it show the stack trace from which the exception was thrown.

https://github.com/dotnet/msbuild/blob/195e7f5a3a8e51c37d83cd9e54cb99dc3fc69c22/src/Tasks/ResolveComReference.cs#L432-L437

Are multiple agents using the same TEMP directory in the same computer? Perhaps the interop assembly is generated there and parallel accesses cause a conflict.

joperator commented 9 months ago

All agents are running under the same user account, so I assume they are using the same TEMP directory. If they want to create an interop assembly in the same location at the same time, it's likely that they cause a conflict.
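One way to test the shared-TEMP theory would be to give every agent's build its own TEMP/TMP directory and see whether the failures stop. The sketch below is an experiment, not a confirmed fix; the function name, the agent names, and the directory layout are all assumptions.

```python
import os
import tempfile
from pathlib import Path


def agent_environment(agent_name: str, temp_root: str, base_env=None) -> dict:
    """Return a copy of the environment with TEMP/TMP pointing at a per-agent directory."""
    env = dict(os.environ if base_env is None else base_env)
    temp_dir = Path(temp_root) / agent_name
    temp_dir.mkdir(parents=True, exist_ok=True)  # the directory must exist before the build starts
    env["TEMP"] = env["TMP"] = str(temp_dir)
    return env


if __name__ == "__main__":
    # Demo only: a throwaway root directory and a made-up agent name.
    demo_env = agent_environment("agent-1", tempfile.mkdtemp())
    print(demo_env["TEMP"])
```

The resulting dictionary could be passed as `env=` to `subprocess.run` when the script launches MSBuild.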

ladipro commented 9 months ago

I believe that interop assemblies should be created in the per-project/config/TFM intermediate directory. Do you think you can configure the builds to produce binlogs (/bl) to analyze the builds next time this happens?
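If the build is driven from a script, the `/bl` switch can be appended with a log name that is unique per build, so that concurrent agents do not overwrite each other's binlogs. A minimal sketch; the helper name, log directory, and build identifier are placeholders.

```python
import ntpath  # Windows path rules, regardless of where this script is run


def with_binlog(command: list[str], log_dir: str, build_id: str) -> list[str]:
    """Return a copy of the MSBuild command with a per-build binary log switch."""
    log_path = ntpath.join(log_dir, f"msbuild-{build_id}.binlog")
    return [*command, f"/bl:{log_path}"]


print(with_binlog(["MSBuild.exe", "My.sln"], r"D:\logs", "1234"))
```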

joperator commented 9 months ago

Sure, if it helps to analyze the issue, I'll give binlogs a try...

joperator commented 8 months ago

@ladipro I now have a binlog from a failed build. For privacy reasons, I had to copy the subtree of the failed ResolveComReferences target and replace all private information. The failing project that has the COM reference to mscoree is now called MyProject in the MyProject.log.

ladipro commented 8 months ago

Thank you. The error is thrown after all the

Processing COM reference "mscoree" from path "C:\Windows\Microsoft.NET\Framework\v4.0.30319\mscoree.tlb". Type '<typename>' imported.

log output, which confirms that the task was able to read the .tlb and the error is really related to the interop assembly (the file written by the task). I guess this brings us back to Kalle's suspicion that multiple builds are racing to write the same file. Where is the interop assembly generated when the build succeeds?

KalleOlaviNiemitalo commented 8 months ago

Process Monitor could be helpful for logging any STATUS_SHARING_VIOLATION or STATUS_ACCESS_DENIED errors during the build.
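Once a Process Monitor trace has been saved as CSV, a few lines of Python can narrow it down to the interesting rows. This sketch assumes the default export columns ("Process Name", "PID", "Operation", "Path", "Result", ...) and that the results appear as "SHARING VIOLATION" and "ACCESS DENIED"; adjust the names if your export differs. The function name and the default path fragment are illustrative.

```python
import csv
import io

SUSPICIOUS_RESULTS = {"SHARING VIOLATION", "ACCESS DENIED"}


def find_contention(csv_file, path_fragment: str = "Interop.mscoree.dll") -> list:
    """Return the rows whose Result indicates contention on a matching path."""
    hits = []
    for row in csv.DictReader(csv_file):
        result = (row.get("Result") or "").upper()
        path = (row.get("Path") or "").lower()
        if result in SUSPICIOUS_RESULTS and path_fragment.lower() in path:
            hits.append(row)
    return hits


# Tiny in-memory sample in the assumed export format.
sample = io.StringIO(
    '"Time of Day","Process Name","PID","Operation","Path","Result","Detail"\n'
    '"10:00:00","MSBuild.exe","1","CreateFile","D:\\a\\obj\\Release\\net48\\Interop.mscoree.dll","SHARING VIOLATION",""\n'
    '"10:00:01","MSBuild.exe","2","CreateFile","D:\\a\\obj\\Release\\net48\\Interop.mscoree.dll","SUCCESS",""\n'
)
sample_hits = find_contention(sample)
print(len(sample_hits))
```

For a real export, open the file with `encoding="utf-8-sig"` and `newline=""` so that a leading byte-order mark does not end up in the first column name.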

joperator commented 8 months ago

> Where is the interop assembly generated when the build succeeds?

When the build succeeds, the binlog contains the following lines instead:

...
Processing COM reference "mscoree" from path "C:\Windows\Microsoft.NET\Framework\v4.0.30319\mscoree.tlb". Type 'TypeNameFactory' imported.
Resolved COM reference for item "mscoree": "obj\Release\net48\Interop.mscoree.dll".

ladipro commented 8 months ago

Assuming this is really a project-relative directory and there's no way multiple agents can access the same path, I'm afraid this will require some instrumentation to figure out what's holding the file locked. Anti-virus software tends to be problematic so maybe one random idea is to try disabling it if present.

joperator commented 8 months ago

> Assuming this is really a project-relative directory and there's no way multiple agents can access the same path, ...

Yes, I also assume that.

> Anti-virus software tends to be problematic so maybe one random idea is to try disabling it if present.

No anti-virus software is present on the affected system, or I don't have sufficient permissions to see it, but I really don't think there is any anti-virus software other than the default Windows security tools installed. So follow Kalle's advice and give Process Monitor a try?

ladipro commented 8 months ago

> So follow Kalle's advice and give Process Monitor a try?

Yes, that's probably the easiest thing to do now.