dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Deadlock in GC when attached profiler calls ICorProfilerInfo4::EnumThreads using .NET9 runtime #110062

Open obeton opened 10 hours ago

obeton commented 10 hours ago

Description

Background: I am a member of the team responsible for maintaining a closed-source .NET profiler.

Issue: I am currently validating .NET 9 with our profiler. While running regression tests, I observed the process stalling when the .NET 9 runtime is used with the profiler attached.

Root Cause: I've narrowed the cause down to an invocation of ICorProfilerInfo4::EnumThreads within our profiler's overridden ICorProfilerCallback::RuntimeSuspendFinished method. The details are in the 'Actual behavior' section.

Reproduction Steps

In the profiler DLL project, define the following class:

```cpp
class ProfilerCallback : public ICorProfilerCallback5
{
public:
    virtual COM_METHOD(HRESULT) Initialize(IUnknown* pICorProfilerInfoUnk)
    {
        return pICorProfilerInfoUnk->QueryInterface(IID_ICorProfilerInfo8, (void**)&m_pProfilerInfo);
    }

    COM_METHOD(HRESULT) RuntimeSuspendFinished()
    {
        ICorProfilerThreadEnum* threadEnum = nullptr;
        HRESULT enumThreadsHR = m_pProfilerInfo->EnumThreads(&threadEnum);
        // Release only if the call actually produced an enumerator.
        if (threadEnum != nullptr)
        {
            threadEnum->Release();
        }
        return enumThreadsHR;
    }

    ICorProfilerInfo8* m_pProfilerInfo = nullptr;
};
```

Attach this DLL as a profiler (following the profiler setup guide), run the process, and observe that it stalls/deadlocks after RuntimeSuspendFinished is called.

Expected behavior

Process should not deadlock

Actual behavior

Process deadlocks

This is the chain of calls that leads to a deadlock:

  1. Inside RuntimeSuspendFinished, EnumThreads is called
  2. Via StateHolder destructor in ProfilerThreadEnum::Init, ThreadStore::s_pThreadStore->m_HoldingThread is set to NULL
  3. RuntimeSuspendFinished returns
  4. WKS::GCHeap::GarbageCollectGeneration is called which then calls Thread::RareDisablePreemptiveGC
  5. There, the ThreadStore::HoldingThreadStore check sees m_HoldingThread == NULL, so Thread::RareDisablePreemptiveGC does not take its early exit
  6. Later on a call to GCHeapUtilities::GetGCHeap()->WaitUntilGCComplete() is made which will deadlock as GCHeap::SetWaitForGCEvent() hasn't been called yet by ThreadSuspend::RestartEE

Below are the relevant stack traces:

Resetting holding thread:

    coreclr.dll!ThreadSuspend::UnlockThreadStore(int bThreadDestroyed, ThreadSuspend::SUSPEND_REASON) Line 1934 C++
    [Inline Frame] coreclr.dll!ThreadStore::UnlockThreadStore() Line 5110 C++
    [Inline Frame] coreclr.dll!StateHolder<&ThreadStore::LockThreadStore,&ThreadStore::UnlockThreadStore>::Release() Line 359 C++
    [Inline Frame] coreclr.dll!StateHolder<&ThreadStore::LockThreadStore,&ThreadStore::UnlockThreadStore>::{dtor}() Line 340 C++
    coreclr.dll!ProfilerThreadEnum::Init() Line 585 C++
    coreclr.dll!ProfToEEInterfaceImpl::EnumThreads(ICorProfilerThreadEnum * * ppEnum) Line 10366 C++
    CoreRewriter_x64.dll!ProfilerCallback::RuntimeSuspendFinished() Line 2417 C++
    coreclr.dll!EEToProfInterfaceImpl::RuntimeSuspendFinished() Line 5056 C++
    coreclr.dll!ProfControlBlock::DoProfilerCallbackHelper<int (__cdecl*)(ProfilerInfo *),long (__cdecl*)(EEToProfInterfaceImpl *)>(ProfilerInfo * pProfilerInfo, int(*)(ProfilerInfo *) condition, HRESULT(*)(EEToProfInterfaceImpl *) callback, HRESULT * pHR) Line 284 C++
    coreclr.dll!ThreadSuspend::SuspendEE(ThreadSuspend::SUSPEND_REASON reason) Line 5647 C++
    coreclr.dll!GCToEEInterface::SuspendEE(SUSPEND_REASON reason) Line 51 C++
    coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51030 C++

Checking null holding thread:

    coreclr.dll!ThreadStore::HoldingThreadStore(Thread * pThread) Line 7624 C++
    coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2123 C++
    [Inline Frame] coreclr.dll!WKS::gc_heap::disable_preemptive(bool) Line 1683 C++
    coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51031 C++

Deadlock:

    coreclr.dll!WKS::GCHeap::WaitUntilGCComplete(bool bConsiderGCStart) Line 238 C++
    coreclr.dll!Thread::RareDisablePreemptiveGC() Line 2212 C++
    [Inline Frame] coreclr.dll!WKS::gc_heap::disable_preemptive(bool) Line 1683 C++
    coreclr.dll!WKS::GCHeap::GarbageCollectGeneration(unsigned int gen, gc_reason reason) Line 51031 C++

Regression?

We have not encountered this GC deadlock in .NET 8, and presumably it does not occur in earlier versions either.

Known Workarounds

I haven't tried it yet, but there's probably some way to force ThreadStore::s_pThreadStore->m_HoldingThread back into its previous, correct state after calling EnumThreads.

Configuration

.NET: 9.0.100

Processor: 13th Gen Intel(R) Core(TM) i9-13900H 2.60 GHz
Installed RAM: 32.0 GB (31.7 GB usable)
System type: 64-bit operating system, x64-based processor

This has also been replicated on a build server; I am not aware of its specs.

Other information

This line causes ThreadStore::s_pThreadStore->m_HoldingThread to be set to NULL, which has knock-on effects that later lead to a deadlock.

I have minidumps, but they're too large to upload here, so let me know if there is a way I can send them.

dotnet-policy-service[bot] commented 10 hours ago

Tagging subscribers to this area: @tommcdon See info in area-owners.md if you want to be subscribed.