Background: I am member of team responsible for maintaining a closed source .net profiler
Issue: I am currently working on validating .NET9 with profiler and while running regression tests, I've observed process stall when using .NET9 runtime with the profiler attached.
Root Cause: I've managed to narrow down the cause of the issue to an invocation of ICorProfilerInfo4::EnumThreads within our profiler's overridden ICorProfilerCallback::RuntimeSuspendFinished method, which I'll share in the 'actual behavior' form
Reproduction Steps
In profiler dll project have class:
`class ProfilerCallback : public ICorProfilerCallback5
{
public:
virtual COM_METHOD(HRESULT) Initialize(IUnknown* pICorProfilerInfoUnk)
{
return pICorProfilerInfoUnk->QueryInterface(IID_ICorProfilerInfo8, (void**)&m_pProfilerInfo);
}
Attach as profiler this (using guide
Run and observe process stall/deadlock after RuntimeSuspendFinished is called
Expected behavior
Process should not deadlock
Actual behavior
Process deadlocks
This is the chain of calls that leads to a deadlock:
Inside RuntimeSuspendFinished, EnumThreads is called
Via StateHolder destructor in ProfilerThreadEnum::Init, ThreadStore::s_pThreadStore->m_HoldingThread is set to NULL
RuntimeSuspendFinished returns
WKS::GCHeap::GarbageCollectGeneration is called which then calls Thread::RareDisablePreemptiveGC
Where there is a call made to ThreadStore::HoldingThreadStore will return true as m_HoldingThread == NULL, which avoids early exit from Thread::RareDisablePreemptiveGC
Later on a call to GCHeapUtilities::GetGCHeap()->WaitUntilGCComplete() is made which will deadlock as GCHeap::SetWaitForGCEvent() hasn't been called yet by ThreadSuspend::RestartEE
Below are the relevant stack traces:
Resetting holding thread:
coreclr.dll!ThreadSuspend::UnlockThreadStore(int bThreadDestroyed, ThreadSuspend::SUSPEND_REASON) Line 1934 C++ [Inline Frame] coreclr.dll!ThreadStore::UnlockThreadStore() Line 5110 C++ [Inline Frame] coreclr.dll!StateHolder<&ThreadStore::LockThreadStore,&ThreadStore::UnlockThreadStore>::Release() Line 359 C++ [Inline Frame] coreclr.dll!StateHolder<&ThreadStore::LockThreadStore,&ThreadStore::UnlockThreadStore>::{dtor}() Line 340 C++ coreclr.dll!ProfilerThreadEnum::Init() Line 585 C++ coreclr.dll!ProfToEEInterfaceImpl::EnumThreads(ICorProfilerThreadEnum * * ppEnum) Line 10366 C++ CoreRewriter_x64.dll!ProfilerCallback::RuntimeSuspendFinished() Line 2417 C++ coreclr.dll!EEToProfInterfaceImpl::RuntimeSuspendFinished() Line 5056 C++ coreclr.dll!ProfControlBlock::DoProfilerCallbackHelper<int (__cdecl*)(ProfilerInfo *),long (__cdecl*)(EEToProfInterfaceImpl *)>(ProfilerInfo * pProfilerInfo, int(*)(ProfilerInfo *) condition, HRESULT(*)(EEToProfInterfaceImpl *) callback, HRESULT * pHR) Line 284 C++ coreclr.dll!ThreadSuspend::SuspendEE(ThreadSuspend::SUSPEND_REASON reason) Line 5647 C++ coreclr.dll!GCToEEInterface::SuspendEE(SUSPEND_REASON reason) Line 51 C++ coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51030 C++
Checking null holding thread:
coreclr.dll!ThreadStore::HoldingThreadStore(Thread * pThread) Line 7624 C++ coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2123 C++ [Inline Frame] coreclr.dll!WKS::gc_heap::disable_preemptive(bool) Line 1683 C++ coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51031 C++
Deadlock:
coreclr.dll!WKS::GCHeap::WaitUntilGCComplete(bool bConsiderGCStart) Line 238 C++ coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2212 C++ [Inline Frame] coreclr.dll!WKS::gc_heap::disable_preemptive(bool) Line 1683 C++ coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51031 C++
Regression?
We have not encountered this GC deadlock in .NET8 and presumably earlier
Known Workarounds
I haven't tried it yet but there's probably some way to force ThreadStore::s_pThreadStore->m_HoldingThread into it's previous correct state after calling EnumThreads
Configuration
.NET: 9.0.100
Processor 13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz Installed RAM 32.0 GB (31.7 GB usable) System type 64-bit operating system, x64-based processor
This has also been replicated on build server. I am not aware of the specs for it.
Other information
this line causes ThreadStore::s_pThreadStore->m_HoldingThread to be set to NULL which has knock-on effects later on which lead to a deadlock.
I have minidumps, but they're pretty large to upload, so let me know if there is any way I can send them
Description
Background: I am member of team responsible for maintaining a closed source .net profiler
Issue: I am currently working on validating .NET9 with profiler and while running regression tests, I've observed process stall when using .NET9 runtime with the profiler attached.
Root Cause: I've managed to narrow down the cause of the issue to an invocation of
ICorProfilerInfo4::EnumThreads
within our profiler's overriddenICorProfilerCallback::RuntimeSuspendFinished
method, which I'll share in the 'actual behavior' formReproduction Steps
In profiler dll project have class: `class ProfilerCallback : public ICorProfilerCallback5 { public: virtual COM_METHOD(HRESULT) Initialize(IUnknown* pICorProfilerInfoUnk) { return pICorProfilerInfoUnk->QueryInterface(IID_ICorProfilerInfo8, (void**)&m_pProfilerInfo); }
}; `
Attach as profiler this (using guide Run and observe process stall/deadlock after
RuntimeSuspendFinished
is calledExpected behavior
Process should not deadlock
Actual behavior
Process deadlocks
This is the chain of calls that leads to a deadlock:
RuntimeSuspendFinished
,EnumThreads
is calledStateHolder
destructor inProfilerThreadEnum::Init
,ThreadStore::s_pThreadStore->m_HoldingThread
is set to NULLRuntimeSuspendFinished
returnsWKS::GCHeap::GarbageCollectGeneration
is called which then callsThread::RareDisablePreemptiveGC
ThreadStore::HoldingThreadStore
will return true asm_HoldingThread
== NULL, which avoids early exit fromThread::RareDisablePreemptiveGC
GCHeapUtilities::GetGCHeap()->WaitUntilGCComplete()
is made which will deadlock asGCHeap::SetWaitForGCEvent()
hasn't been called yet byThreadSuspend::RestartEE
Below are the relevant stack traces:
Resetting holding thread:
coreclr.dll!ThreadSuspend::UnlockThreadStore(int bThreadDestroyed, ThreadSuspend::SUSPEND_REASON) Line 1934 C++ [Inline Frame] coreclr.dll!ThreadStore::UnlockThreadStore() Line 5110 C++ [Inline Frame] coreclr.dll!StateHolder<&ThreadStore::LockThreadStore,&ThreadStore::UnlockThreadStore>::Release() Line 359 C++ [Inline Frame] coreclr.dll!StateHolder<&ThreadStore::LockThreadStore,&ThreadStore::UnlockThreadStore>::{dtor}() Line 340 C++ coreclr.dll!ProfilerThreadEnum::Init() Line 585 C++ coreclr.dll!ProfToEEInterfaceImpl::EnumThreads(ICorProfilerThreadEnum * * ppEnum) Line 10366 C++ CoreRewriter_x64.dll!ProfilerCallback::RuntimeSuspendFinished() Line 2417 C++ coreclr.dll!EEToProfInterfaceImpl::RuntimeSuspendFinished() Line 5056 C++ coreclr.dll!ProfControlBlock::DoProfilerCallbackHelper<int (__cdecl*)(ProfilerInfo *),long (__cdecl*)(EEToProfInterfaceImpl *)>(ProfilerInfo * pProfilerInfo, int(*)(ProfilerInfo *) condition, HRESULT(*)(EEToProfInterfaceImpl *) callback, HRESULT * pHR) Line 284 C++ coreclr.dll!ThreadSuspend::SuspendEE(ThreadSuspend::SUSPEND_REASON reason) Line 5647 C++ coreclr.dll!GCToEEInterface::SuspendEE(SUSPEND_REASON reason) Line 51 C++ coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51030 C++
Checking null holding thread:
coreclr.dll!ThreadStore::HoldingThreadStore(Thread * pThread) Line 7624 C++ coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2123 C++ [Inline Frame] coreclr.dll!WKS::gc_heap::disable_preemptive(bool) Line 1683 C++ coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51031 C++
Deadlock:
coreclr.dll!WKS::GCHeap::WaitUntilGCComplete(bool bConsiderGCStart) Line 238 C++ coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2212 C++ [Inline Frame] coreclr.dll!WKS::gc_heap::disable_preemptive(bool) Line 1683 C++ coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51031 C++
Regression?
We have not encountered this GC deadlock in .NET8 and presumably earlier
Known Workarounds
I haven't tried it yet but there's probably some way to force
ThreadStore::s_pThreadStore->m_HoldingThread
into it's previous correct state after callingEnumThreads
Configuration
.NET: 9.0.100
Processor 13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz Installed RAM 32.0 GB (31.7 GB usable) System type 64-bit operating system, x64-based processor
This has also been replicated on build server. I am not aware of the specs for it.Other information
this line causes
ThreadStore::s_pThreadStore->m_HoldingThread
to be set to NULL which has knock-on effects later on which lead to a deadlock.I have minidumps, but they're pretty large to upload, so let me know if there is any way I can send them