
Hang during shutdown with Native AOT on IReferenceTrackerHost::ReleaseDisconnectedReferenceSources waiting for finalizers #109538

Sergio0694 commented 2 weeks ago

Description

We're hitting a 100% consistent hang during application shutdown, but only on Native AOT. It seems that the finalizer thread and the UI thread (ASTA) are in a deadlock, leaving the application process alive after the main window is closed. After a few seconds, Windows kills the process, which then shows up in WER as a hang (as expected). This only repros with Native AOT; CoreCLR works fine.

Reproduction Steps

I don't have a minimal repro. Please ping me on Teams for instructions on how to deploy the Store locally to repro. Alternatively, I can also share an MSIX package for sideloading, with instructions on how to install it for testing (and how to restore the retail Store after that).

Here is a memory dump of the process during the hang (the process was paused from WinDbg at the presumed deadlock).

Expected behavior

The application should shut down correctly when the window is closed.

Actual behavior

Here are the two relevant stack traces I see in WinDbg.

Finalizer thread (!FinalizerStart):
```
[0x0] ntdll!ZwWaitForMultipleObjects+0x14
[0x1] KERNELBASE!WaitForMultipleObjectsEx+0xe9
[0x2] combase!MTAThreadWaitForCall+0xfb
[0x3] combase!MTAThreadDispatchCrossApartmentCall+0x2bc
[0x4] combase!CSyncClientCall::SwitchAptAndDispatchCall+0x707 (Inline Function)
[0x5] combase!CSyncClientCall::SendReceive2+0x825
[0x6] combase!SyncClientCallRetryContext::SendReceiveWithRetry+0x2f (Inline Function)
[0x7] combase!CSyncClientCall::SendReceiveInRetryContext+0x2f (Inline Function)
[0x8] combase!DefaultSendReceive+0x6e
[0x9] combase!CSyncClientCall::SendReceive+0x300
[0xa] combase!CClientChannel::SendReceive+0x98
[0xb] combase!NdrExtpProxySendReceive+0x58
[0xc] RPCRT4!Ndr64pSendReceive+0x39 (Inline Function)
[0xd] RPCRT4!NdrpClientCall3+0x3de
[0xe] combase!ObjectStublessClient+0x14c
[0xf] combase!ObjectStubless+0x42
[0x10] combase!CObjectContext::InternalContextCallback+0x2fd
[0x11] combase!CObjectContext::ContextCallback+0x902
[0x12] !WinRT_Runtime_ABI_WinRT_Interop_IContextCallbackVftbl__ContextCallback+0x102
[0x13] !WinRT_Runtime_WinRT_Context__CallInContext+0x87
[0x14] !WinRT_Runtime_WinRT_ObjectReferenceWithContext_1__Release+0x64
[0x15] !WinRT_Runtime_WinRT_IObjectReference__Dispose+0x5b
[0x16] !WinRT_Runtime_WinRT_IObjectReference__Finalize+0x17
[0x17] !S_P_CoreLib_System_Runtime___Finalizer__DrainQueue+0x7a
[0x18] !S_P_CoreLib_System_Runtime___Finalizer__ProcessFinalizers+0x47
[0x19] !FinalizerStart+0x56
[0x1a] KERNEL32!BaseThreadInitThunk+0x1d
[0x1b] ntdll!RtlUserThreadStart+0x28
```
UI thread (shcore!_WrapperThreadProc, ApplicationView ASTA):
```
[0x0] win32u!ZwUserMsgWaitForMultipleObjectsEx+0x14
[...]
[0x5] combase!CoWaitForMultipleHandles+0xc2
[0x6] !PalCompatibleWaitAny+0x63
[0x7] !CLREventStatic::Wait+0xc6
[0x8] !RhWaitForPendingFinalizers+0x90
[0x9] !S_P_CoreLib_System_Runtime_RuntimeImports__RhWaitForPendingFinalizers+0x32
[0xa] !S_P_CoreLib_System_Runtime_RuntimeImports__RhWaitForPendingFinalizers_0+0x21
[0xb] !S_P_CoreLib_System_GC__WaitForPendingFinalizers+0x1b
[0xc] !S_P_CoreLib_System_Runtime_InteropServices_ComWrappers__IReferenceTrackerHost_ReleaseDisconnectedReferenceSources+0x24
[0xd] Windows_UI_Xaml!DirectUI::ReferenceTrackerManager::TriggerFinalization+0x34
[...]
```

It seems that the two threads are waiting on each other: the finalizer thread is finalizing an ObjectReferenceWithContext and is blocked dispatching a cross-apartment Release into the UI thread's context, while the UI thread (ASTA) is itself blocked in RhWaitForPendingFinalizers, waiting for the finalizer thread to drain its queue.

Some potentially relevant differences we noticed in the finalizer logic between CoreCLR (which works fine) and Native AOT:

CoreCLR: https://github.com/dotnet/runtime/blob/302e0d4cf9d603fbc76e508b0b41e778c69f2186/src/coreclr/vm/finalizerthread.cpp#L493-L548

Native AOT: https://github.com/dotnet/runtime/blob/302e0d4cf9d603fbc76e508b0b41e778c69f2186/src/coreclr/nativeaot/Runtime/FinalizerHelpers.cpp#L115-L151

They seem similar; however, CoreCLR performs its wait with alertable set, while the Native AOT helper does not.

Not sure whether that's intentional (and if so, why) or whether it's related to the issue; it's just something we noticed.

Regression?

No.

Known Workarounds

None, this is a blocker šŸ˜…

Configuration

dotnet-policy-service[bot] commented 2 weeks ago

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas. See info in area-owners.md if you want to be subscribed.

Sergio0694 commented 2 weeks ago

cc. @AaronRobinsonMSFT @jkoritzinsky perhaps you might also have thoughts on this or some knowledge to share? šŸ˜„

agocke commented 2 weeks ago

Also + @VSadov in case this could be related to suspension

MichalStrehovsky commented 2 weeks ago

> Also + @VSadov in case this could be related to suspension

It's not related to suspension, but it's definitely an area where expertise from @VSadov would help. It looks like native AOT will just wait on an event using an OS API, and that's all there is to it. CoreCLR does a lot more to figure out exactly how to wait (does the thread have a SynchronizationContext? do we need to pump window messages with MsgWaitForMultipleObjects? etc.). As usual, this is a place that is a lot more complicated in the CoreCLR VM, and it's hard to say which part is relevant and how much of it we need to implement.

https://github.com/dotnet/runtime/blob/3116db9c19512d89b2091a9e5fcbec08581d0f9d/src/coreclr/vm/threads.cpp#L3201-L3484
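For illustration, here's a minimal C++ sketch of the difference (the helper names are hypothetical, not actual runtime code): a plain event wait blocks the thread outright, while a message-pumping wait in the MsgWaitForMultipleObjects style keeps the message loop alive, so an STA/ASTA thread can still service incoming calls while it waits.

```cpp
// Sketch only: contrasts the two wait strategies. An STA/ASTA thread blocked
// in PlainWait cannot service incoming cross-apartment calls; PumpingWait
// keeps dispatching window messages (and COM calls delivered through them).
#include <windows.h>

// Roughly what the Native AOT helper does: a plain blocking wait.
DWORD PlainWait(HANDLE hEvent, DWORD timeoutMs)
{
    return WaitForSingleObjectEx(hEvent, timeoutMs, /* bAlertable */ FALSE);
}

// Roughly what an STA-aware wait has to do instead (timeout handling
// simplified: each iteration restarts the full timeout).
DWORD PumpingWait(HANDLE hEvent, DWORD timeoutMs)
{
    for (;;)
    {
        DWORD result = MsgWaitForMultipleObjectsEx(
            1, &hEvent, timeoutMs, QS_ALLINPUT, MWMO_INPUTAVAILABLE);
        if (result != WAIT_OBJECT_0 + 1)
            return result; // event signaled, timed out, or failed

        // A message arrived while we were waiting: pump the queue, then re-wait.
        MSG msg;
        while (PeekMessage(&msg, nullptr, 0, 0, PM_REMOVE))
        {
            TranslateMessage(&msg);
            DispatchMessage(&msg);
        }
    }
}
```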

VSadov commented 2 weeks ago

I think waiting with "alertable" is to allow a sleeping/waiting thread to react to Thread.Interrupt(). However, Thread.Interrupt() is not supported on Native AOT:

https://github.com/dotnet/runtime/blob/6d23ef4d68bbcdb38fdc22218d1073c5083ac6a1/src/coreclr/nativeaot/System.Private.CoreLib/src/System/Threading/Thread.NativeAot.Windows.cs#L402

I suspect the culprit is something with COM and message pumping, but it looks like we call CoWaitForMultipleHandles, which should do that.
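As an aside, here is a minimal standalone sketch of the alertable-wait mechanism described above: a queued user-mode APC (the primitive Thread.Interrupt builds on) wakes an alertable wait early, while a non-alertable wait leaves the APC pending. This is an illustration of the Win32 semantics, not runtime code.

```cpp
// Sketch: alertable vs. non-alertable waits and user-mode APCs.
#include <windows.h>
#include <cstdio>

static void CALLBACK WakeApc(ULONG_PTR) { std::puts("APC delivered"); }

int main()
{
    // Manual-reset event that is never signaled, so only the APC can wake us.
    HANDLE hEvent = CreateEventW(nullptr, TRUE, FALSE, nullptr);

    QueueUserAPC(WakeApc, GetCurrentThread(), 0);
    DWORD r = WaitForSingleObjectEx(hEvent, 5000, /* bAlertable */ TRUE);
    std::printf("alertable wait: 0x%lx\n", r);     // 0xC0 (WAIT_IO_COMPLETION)

    QueueUserAPC(WakeApc, GetCurrentThread(), 0);
    r = WaitForSingleObjectEx(hEvent, 1000, /* bAlertable */ FALSE);
    std::printf("non-alertable wait: 0x%lx\n", r); // 0x102 (WAIT_TIMEOUT); APC stays queued

    CloseHandle(hEvent);
    return 0;
}
```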

jkotas commented 2 weeks ago

CoreCLR may be doing the wait via a user-installed SynchronizationContext. It would be useful to find out the call stack of the wait in CoreCLR.

manodasanW commented 2 weeks ago

I did have a chat with some COM folks. On ASTA threads such as this one, CoWaitForMultipleHandles doesn't pump COM messages unless COWAIT_DISPATCH_CALLS is passed as a flag. But that isn't passed in the CoreCLR case here either. So that might explain why it is waiting, but it doesn't explain why CoreCLR doesn't hit this issue.
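For reference, here's a hedged sketch of what such a dispatching wait would look like (the flags are the documented Win32 values; the function and handle names are made up for illustration):

```cpp
// Sketch only. Without COWAIT_DISPATCH_CALLS, an ASTA thread blocked here
// cannot service the cross-apartment Release the finalizer thread is trying
// to deliver to it.
#include <windows.h>
#include <combaseapi.h>

HRESULT WaitOnAstaThread(HANDLE hFinalizersDone) // illustrative names
{
    DWORD signaledIndex = 0;
    return CoWaitForMultipleHandles(
        COWAIT_DISPATCH_CALLS | COWAIT_DISPATCH_WINDOW_MESSAGES, // pump while waiting
        INFINITE,
        1, &hFinalizersDone,
        &signaledIndex);
}
```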

My initial guess was that the CleanupWrappersInCurrentCtxThread call was doing some cleanup for the ASTA scenario before the wait, but after debugging the CoreCLR scenario it seems it isn't: it's gated behind a conditional that only applies in certain scenarios, and that condition isn't set here. From my debugging, CoreCLR ends up doing the same wait as AOT, just with alertable set. Not sure if that somehow makes us lucky enough to not hit this issue, due to other APC calls happening, which is what that flag seems to control.

jkotas commented 2 weeks ago

> Not sure if that is somehow making us get lucky to not hitting this issue due to other APC calls happening

You can build a local native AOT package that changes this to alertable=true and try to repro it with that to prove or disprove this hypothesis.

MichalStrehovsky commented 2 weeks ago

> > Not sure if that is somehow making us get lucky to not hitting this issue due to other APC calls happening
>
> You can build a local native AOT package that changes this to alertable=true and try to repro it with that to prove or disprove this hypothesis.

If there's something you'd like to try, here are the accelerated steps:

Then publish your project with Native AOT as usual (you might want to delete all of bin/obj first, since this is not incremental), but set the IlcSdkPath property like this: `<IlcSdkPath>{REPO_PATH}\artifacts\bin\coreclr\windows.x64.Release\aotsdk\</IlcSdkPath>`, replacing {REPO_PATH} with where you cloned the runtime repo. This will pick up your build of the runtime/corelib.

The nice thing about this is that once you do it, you can set breakpoints and debug within the code. You should also be able to pass -c Debug to the build.cmd invocation to build the debug version of the runtime (it will be dropped to a similar path under artifacts) and use that instead; it's easier to debug.
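For illustration, the relevant project file change might look like the following (a sketch assuming a standard csproj that already publishes with Native AOT; only the IlcSdkPath value comes from the instructions above):

```xml
<!-- Sketch: wire publishing to a locally built Native AOT runtime. -->
<PropertyGroup>
  <PublishAot>true</PublishAot>
  <!-- Replace {REPO_PATH} with where you cloned the runtime repo. -->
  <IlcSdkPath>{REPO_PATH}\artifacts\bin\coreclr\windows.x64.Release\aotsdk\</IlcSdkPath>
</PropertyGroup>
```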

MichalStrehovsky commented 3 days ago

This is not native AOT specific.

I stepped through the CoreCLR VM version of this. We end up taking a path where we wait like this:

```
>   coreclr.dll!MsgWaitHelper(int numWaiters, void * * phEvent, int bWaitAll, unsigned long millis, int bAlertable) Line 3140   C++
    coreclr.dll!Thread::DoAppropriateAptStateWait(int numWaiters, void * * pHandles, int bWaitAll, unsigned long timeout, WaitMode mode) Line 3178  C++
    coreclr.dll!Thread::DoAppropriateWaitWorker(int countHandles, void * * handles, int waitAll, unsigned long millis, WaitMode mode, void * associatedObjectForMonitorWait) Line 3363  C++
    coreclr.dll!Thread::DoAppropriateWait(int countHandles, void * * handles, int waitAll, unsigned long millis, WaitMode mode, PendingSync * syncState) Line 3032  C++
    [Inline Frame] coreclr.dll!CLREventBase::WaitEx(unsigned long) Line 459 C++
    coreclr.dll!CLREventBase::Wait(unsigned long dwMilliseconds, int alertable, PendingSync * syncState) Line 413   C++
    coreclr.dll!FinalizerThread::FinalizerThreadWait() Line 599 C++
    coreclr.dll!InteropLibImports::WaitForRuntimeFinalizerForExternal() Line 1148   C++
    [Inline Frame] Windows.UI.Xaml.dll!DirectUI::ReferenceTrackerManager::TriggerFinalization() Line 350    C++
    Windows.UI.Xaml.dll!DirectUI::DXamlCore::OnAfterAppSuspend() Line 4155  C++
    Windows.UI.Xaml.dll!XAML::PLM::PLMHandler::InvokeAfterAppSuspendCallback() Line 431 C++
    Windows.UI.Xaml.dll!XAML::PLM::PLMHandler::DecrementAppSuspendActivityCount() Line 226  C++
```

There's a difference between native AOT and CoreCLR - on CoreCLR we pass COWAIT_ALERTABLE, on native AOT we don't.

In the end it doesn't make a difference because they both deadlock.