Open dfederm opened 2 years ago
FYI @dsplaisted
My guess based on inspecting the code is that this is the race:
1. Thread 1 and Thread 2 both call ResolveSdkUsingResolversWithPatternsFirst at the same time.
2. Both see that _generalResolversManifestsRegistry is null and enter RegisterResolversManifests.
3. Thread 1 initializes _generalResolversManifestsRegistry inside RegisterResolversManifests. Note that this is under a lock.
4. Thread 1 exits RegisterResolversManifests.
5. Thread 2 initializes _generalResolversManifestsRegistry again and starts to add to it.
6. Thread 1 reaches the GetResolvers call which passes _generalResolversManifestsRegistry. Note! This is not under a lock!
7. In Thread 1, GetResolvers starts to iterate the list passed in (_generalResolversManifestsRegistry).
8. Thread 2 adds to _generalResolversManifestsRegistry.
9. Thread 1 continues iterating _generalResolversManifestsRegistry and throws.
So ultimately, the bug is that _generalResolversManifestsRegistry is only locked when writing, but not when reading.
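For reference, the exception itself is easy to reproduce without any threads. The sketch below is not the MSBuild code: the class name, the method name and the string element type are made up for illustration. It simulates the interleaving above on a single thread by adding to a List<T> while a foreach over the same list is still running.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in, not part of MSBuild.
public static class EnumerationRaceDemo
{
    // Returns true if adding to the list during enumeration throws
    // InvalidOperationException ("Collection was modified").
    public static bool ThrowsWhenModifiedDuringEnumeration()
    {
        // Plays the role of _generalResolversManifestsRegistry; element type simplified to string.
        var registry = new List<string> { "manifestA", "manifestB" };
        try
        {
            foreach (string manifest in registry)
            {
                // Simulates the writer thread adding an entry while the reader iterates.
                registry.Add("manifestC");
            }
        }
        catch (InvalidOperationException)
        {
            // The enumerator detected that the list changed since enumeration started.
            return true;
        }
        return false;
    }
}
```

In the real race the Add happens on another thread, so the failure is intermittent rather than guaranteed, which would match how rarely this shows up in builds.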
Code with the bug was introduced in #7597 by @AR-May
@dfederm thank you for reporting the bug and for the analysis. The scenario above should not be possible. The _generalResolversManifestsRegistry is double-checked on creation: RegisterResolversManifests has a check after taking the lock to prevent a second creation. So, thread 2 should just return from the function without rewriting the collection. After creation this collection was not supposed to be modified, so I thought it would be OK not to lock it on reading, since reading happens only after creation. I will dig into this.
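As a rough illustration of the double-check described here, a minimal sketch could look like the following. The class and member names are hypothetical and the registry is left empty, so this only shows that the second check under the lock prevents a second creation; it says nothing about how the real RegisterResolversManifests populates the list.

```csharp
using System.Collections.Generic;

// Hypothetical simplification, not the MSBuild implementation.
public sealed class DoubleCheckedRegistry
{
    private readonly object _lockObject = new object();
    private List<string> _registry;

    // Counts how many times the registry was actually created.
    public int CreationCount { get; private set; }

    public void EnsureRegistered()
    {
        // First check, without the lock: the common fast path.
        if (_registry != null)
        {
            return;
        }

        lock (_lockObject)
        {
            // Second check, under the lock: another thread may have
            // created the registry while this one was waiting.
            if (_registry != null)
            {
                return;
            }

            CreationCount++;
            _registry = new List<string>();
        }
    }
}
```

With this shape, any number of concurrent callers should leave CreationCount at 1.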
I saw this on MSBuild version = "17.8.3+195e7f5a3" today
error MSB4014: The build stopped unexpectedly because of an internal failure.
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.ResolveSdkUsingResolversWithPatternsFirst(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.<>c__DisplayClass3_0.<ResolveSdk>b__2()
at System.Lazy`1.CreateValue()
at System.Lazy`1.LazyInitValue()
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.MainNodeSdkResolverService.PacketReceived(Int32 node, INodePacket packet)
and
error MSB4014: The build stopped unexpectedly because of an internal failure.
System.InvalidOperationException: Collection was modified; enumeration operation may not execute.
at System.ThrowHelper.ThrowInvalidOperationException(ExceptionResource resource)
at System.Collections.Generic.List`1.Enumerator.MoveNextRare()
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.ResolveSdkUsingResolversWithPatternsFirst(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.SdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.<>c__DisplayClass3_0.<ResolveSdk>b__2()
at System.Lazy`1.CreateValue()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
--- End of stack trace from previous location where exception was thrown ---
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at System.Lazy`1.get_Value()
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.MainNodeSdkResolverService.PacketReceived(Int32 node, INodePacket packet)
at System.Lazy`1.get_Value()
at Microsoft.Build.BackEnd.SdkResolution.CachingSdkResolverService.ResolveSdk(Int32 submissionId, SdkReference sdk, LoggingContext loggingContext, ElementLocation sdkReferenceLocation, String solutionPath, String projectPath, Boolean interactive, Boolean isRunningInVisualStudio, Boolean failOnUnresolvedSdk)
at Microsoft.Build.BackEnd.SdkResolution.MainNodeSdkResolverService.PacketReceived(Int32 node, INodePacket packet)
From the source code, it looks like this could happen:
In this scenario, the problem is that, although RegisterResolversManifests locks _lockObject to prevent other threads from initializing the manifest registries in parallel, it assigns the list to _generalResolversManifestsRegistry before it has finished adding the manifests to it. Other threads can then read the list reference from _generalResolversManifestsRegistry, assume that the list will no longer be modified, and attempt to enumerate it. To fix this, RegisterResolversManifests should store the lists to local variables first, populate them there, and assign to fields only just before unlocking _lockObject. I'm not sure whether these assignments would need to be volatile according to the memory model.
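A minimal sketch of that suggestion, under assumptions: ManifestRegistry, GetOrCreate and the string element type are invented for illustration and are not the MSBuild types. The list is built in a local variable and the field is assigned only once it is fully populated. Volatile.Read and Volatile.Write are used here to be conservative; whether they are strictly required is the open question above.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical simplification, not the MSBuild implementation.
public sealed class ManifestRegistry
{
    private readonly object _lockObject = new object();

    // Stays null until a fully populated list is published.
    private List<string> _generalRegistry;

    public IReadOnlyList<string> GetOrCreate(IEnumerable<string> manifests)
    {
        // Lock-free fast path: a non-null value is always fully populated.
        List<string> published = Volatile.Read(ref _generalRegistry);
        if (published != null)
        {
            return published;
        }

        lock (_lockObject)
        {
            // Re-check under the lock in case another thread published first.
            if (_generalRegistry != null)
            {
                return _generalRegistry;
            }

            // Populate a local list; no other thread can see it yet.
            var local = new List<string>();
            foreach (string manifest in manifests)
            {
                local.Add(manifest);
            }

            // Publish only after population is complete.
            Volatile.Write(ref _generalRegistry, local);
            return local;
        }
    }
}
```

Readers that obtain the list this way never observe it mid-population, so enumerating it without a lock is safe as long as nothing mutates it after publication.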
just a fyi, I ran into this again on a build today
we have seen this issue a few times as well in our builds
I'm occasionally seeing the following exception:
Seems like there is a race condition in SdkResolverService.
MSBuild version: 17.4.0-preview-22416-02 (pretty close to head of main as of this writing)