dotnet / runtimelab

This repo is for experimentation and exploring new ideas that may or may not make it into the main dotnet/runtime repo.
MIT License

`EventWaitHandle` based while loop in bg thread hang on GC in macOS [NativeAOT] #1560

Closed ryancheung closed 3 years ago

ryancheung commented 3 years ago

The repro is a sample game project with two threads: the main (UI) thread and a background thread that sends draw commands to the main thread. The two threads are synchronised with EventWaitHandle.
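The repro's source is not reproduced in this thread, so the following is only a C++ sketch of the handshake described above. It assumes auto-reset semantics like `EventWaitHandle` created with `EventResetMode.AutoReset`. `AutoResetEvent` here is a small hand-rolled class, not the .NET type, and `RunFrames` and the integer "frame buffer" are hypothetical. The point is that each thread blocks in `WaitOne` instead of spinning on shared state.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Stand-in for an auto-reset EventWaitHandle: Set() releases one waiter,
// and the event resets itself as soon as that waiter passes.
class AutoResetEvent {
public:
    void Set() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_one();
    }

    // Blocks (sleeps, no spinning) until signaled, then resets the event.
    void WaitOne() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return signaled_; });
        signaled_ = false;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Hypothetical frame loop: the background thread "records" one frame of draw
// commands, then waits until the main thread has consumed it.
// Returns the frame numbers the main thread saw, in order.
std::vector<int> RunFrames(int frameCount) {
    AutoResetEvent commandsReady;  // background -> main: frame is recorded
    AutoResetEvent frameConsumed;  // main -> background: record the next one
    int sharedFrame = -1;          // stands in for the draw-command buffer
    std::vector<int> presented;

    std::thread drawThread([&] {
        for (int frame = 0; frame < frameCount; ++frame) {
            sharedFrame = frame;       // "record draw commands"
            commandsReady.Set();
            frameConsumed.WaitOne();   // blocked, not in a tight loop
        }
    });

    for (int frame = 0; frame < frameCount; ++frame) {
        commandsReady.WaitOne();
        presented.push_back(sharedFrame);  // "execute draw commands"
        frameConsumed.Set();
    }

    drawThread.join();
    return presented;
}
```

The mutex inside each event also orders the writes to `sharedFrame` with the reads on the other thread, so no extra synchronization is needed for the buffer in this sketch.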

This works out of the box on Windows, but not on macOS.

Repro repo

lldb stacktrace:

* thread #9
  * frame #0: 0x00007fff204b73c2 libsystem_kernel.dylib`swtch_pri + 10
    frame #1: 0x00007fff204ea070 libsystem_pthread.dylib`cthread_yield + 11
    frame #2: 0x00000001000592d9 macOSAOTRepro`::PalSwitchToThread() at PalRedhawkUnix.cpp:557:12 [opt]
    frame #3: 0x000000010001629d macOSAOTRepro`ThreadStore::SuspendAllThreads(this=<unavailable>, waitForGCEvent=<unavailable>) at threadstore.cpp:252:17 [opt]
    frame #4: 0x0000000100026fa6 macOSAOTRepro`WKS::GCHeap::GarbageCollectGeneration(this=<unavailable>, gen=0, reason=reason_alloc_soh) at gc.cpp:44852:9 [opt]
    frame #5: 0x000000010002853f macOSAOTRepro`WKS::gc_heap::trigger_gc_for_alloc(gen_number=<unavailable>, gr=<unavailable>, msl=<unavailable>, loh_p=<unavailable>, take_state=<unavailable>) at gc.cpp:17010:14 [opt] [artificial]
    frame #6: 0x0000000100029133 macOSAOTRepro`WKS::gc_heap::try_allocate_more_space(acontext=0x0000000101f83310, size=96, flags=0, gen_number=0) at gc.cpp:17152:17 [opt]
    frame #7: 0x000000010004b6f0 macOSAOTRepro`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) [inlined] WKS::gc_heap::allocate_more_space(acontext=0x0000000101f83310, size=96, flags=0, alloc_generation_number=0) at gc.cpp:17622:18 [opt]
    frame #8: 0x000000010004b6d6 macOSAOTRepro`WKS::GCHeap::Alloc(gc_alloc_context*, unsigned long, unsigned int) at gc.cpp:17653 [opt]
    frame #9: 0x000000010004b6b8 macOSAOTRepro`WKS::GCHeap::Alloc(this=<unavailable>, context=0x0000000101f83310, size=96, flags=0) at gc.cpp:43852 [opt]
    frame #10: 0x000000010000e402 macOSAOTRepro`::RhpGcAlloc(MethodTable *, uint32_t, uintptr_t, void *) [inlined] GcAllocInternal(pEEType=0x0000000100a3dca0, uFlags=0, numElements=<unavailable>, pThread=<unavailable>) at gcrhenv.cpp:267:54 [opt]
    frame #11: 0x000000010000e374 macOSAOTRepro`::RhpGcAlloc(pEEType=0x0000000100a3dca0, uFlags=<unavailable>, numElements=0, pTransitionFrame=<unavailable>) at gcrhenv.cpp:303 [opt]
    frame #12: 0x00000001000682bc macOSAOTRepro`RhpNewObject at AllocFast.S:84
    frame #13: 0x00000001001c0757 macOSAOTRepro`macOSAOTRepro_macOSAOTRepro_Game1__DrawCommandLoop + 55
    frame #14: 0x00000001001372a8 macOSAOTRepro`S_P_CoreLib_System_Threading_Thread__StartThread + 248
    frame #15: 0x00000001001376e9 macOSAOTRepro`S_P_CoreLib_System_Threading_Thread__ThreadEntryPoint + 25
    frame #16: 0x00007fff204ec8fc libsystem_pthread.dylib`_pthread_start + 224
    frame #17: 0x00007fff204e8443 libsystem_pthread.dylib`thread_start + 15
(lldb)
jkotas commented 3 years ago

This is likely the same issue as https://github.com/dotnet/corert/issues/8308. Could you please check whether any other thread is running in a tight loop?
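Why a tight loop on another thread matters can be shown with a simplified model of cooperative stop-the-world suspension. It mirrors only the shape visible in the stack trace: the allocating thread triggers a GC, and `SuspendAllThreads` keeps yielding (`PalSwitchToThread`) until the other threads have stopped at safe points. This is a hedged C++ sketch, not the NativeAOT implementation; the real runtime has additional mechanisms for interrupting running threads that this model omits. `SafePointRegistry`, `PollSafePoint`, `TrySuspendWorker`, and the timeout are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Toy cooperative suspension: the suspender raises a flag, and worker threads
// park themselves only when they reach a safe-point poll.
struct SafePointRegistry {
    std::atomic<bool> suspendRequested{false};
    std::atomic<int> parkedThreads{0};

    // Called by workers at points where it is safe to stop them.
    void PollSafePoint() {
        if (!suspendRequested.load()) return;
        parkedThreads.fetch_add(1);
        while (suspendRequested.load()) std::this_thread::yield();
        parkedThreads.fetch_sub(1);
    }

    // Yields until `workerCount` threads are parked. The timeout exists only
    // so the demo terminates; a real suspender would keep spinning (a hang).
    bool SuspendAll(int workerCount, std::chrono::milliseconds timeout) {
        suspendRequested.store(true);
        const auto deadline = std::chrono::steady_clock::now() + timeout;
        while (parkedThreads.load() < workerCount) {
            if (std::chrono::steady_clock::now() > deadline) return false;
            std::this_thread::yield();  // plays the role of PalSwitchToThread
        }
        return true;
    }

    void ResumeAll() { suspendRequested.store(false); }
};

// Runs one busy worker and reports whether it could be suspended.
// `pollsForSafePoints` decides whether its loop body ever reaches a safe point.
bool TrySuspendWorker(bool pollsForSafePoints) {
    SafePointRegistry registry;
    std::atomic<bool> stop{false};
    std::thread worker([&] {
        while (!stop.load()) {
            if (pollsForSafePoints) registry.PollSafePoint();
            // Without the poll, this is a tight loop with no safe point.
        }
    });
    const bool suspended =
        registry.SuspendAll(1, std::chrono::milliseconds(500));
    registry.ResumeAll();
    stop.store(true);
    worker.join();
    return suspended;
}
```

In this model, `TrySuspendWorker(true)` succeeds while `TrySuspendWorker(false)` times out. In .NET, a thread blocked in a wait such as `WaitOne` is generally already treated as safe for the GC, which is why replacing a spin loop with a blocking wait helps; the sketch does not model that case.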

ryancheung commented 3 years ago

The previous issue showed that a tight loop without a break hangs on Windows as well. This new repro, however, does work on Windows, because it synchronizes with EventWaitHandle instead of checking state in a tight loop. That is how I fixed multithreaded rendering in my game on Windows. It still hangs on macOS, though, which is why I opened this issue.

ryancheung commented 3 years ago

You can try the repro on Windows and see that it doesn't hang. The background thread waits until the UI thread sends a signal, so there should be no tight loop. On macOS, however, it still hangs in the GC.

jkotas commented 3 years ago

It is a very similar problem, just in a different spot (the tight loop goes through the TryDequeue API).
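The thread doesn't show the offending loop or the exact change, so this is only a hedged C++ illustration of the general pattern: a consumer that calls a non-blocking `TryDequeue` in a loop with nothing else in the body keeps a core busy, whereas falling back to a blocking wait lets the thread sleep. `CommandQueue`, `ConsumeUntilStop`, `RunProducerConsumer`, and the negative-number stop sentinel are hypothetical; `TryDequeue` merely mirrors the shape of .NET's `ConcurrentQueue<T>.TryDequeue`.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical command queue with a non-blocking and a blocking dequeue.
template <typename T>
class CommandQueue {
public:
    void Enqueue(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(item));
        }
        cv_.notify_one();
    }

    // Non-blocking: `while (!TryDequeue(x)) {}` would be a busy spin.
    bool TryDequeue(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

    // Blocking: sleeps on the condition variable until an item arrives.
    T Dequeue() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty(); });
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<T> items_;
};

// Takes already-queued commands via TryDequeue, but blocks in Dequeue when
// the queue is empty instead of spinning. A negative value means "stop".
std::vector<int> ConsumeUntilStop(CommandQueue<int>& queue) {
    std::vector<int> executed;
    for (;;) {
        int command;
        if (!queue.TryDequeue(command)) command = queue.Dequeue();
        if (command < 0) break;
        executed.push_back(command);
    }
    return executed;
}

// Demo: one producer enqueues 1..n, then the stop sentinel.
std::vector<int> RunProducerConsumer(int n) {
    CommandQueue<int> queue;
    std::thread producer([&queue, n] {
        for (int i = 1; i <= n; ++i) queue.Enqueue(i);
        queue.Enqueue(-1);
    });
    std::vector<int> executed = ConsumeUntilStop(queue);
    producer.join();
    return executed;
}
```

With a single producer the queue is FIFO, so the consumer executes commands in the order they were enqueued.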

Here is what we can do quickly to make your game work:

This change, together with my fix, should resolve the hang.

jkotas commented 3 years ago

Please give this a try.

I have opened dotnet/runtime#67805 on the GC suspension problem.

ryancheung commented 3 years ago

Thanks! It works out of the box now.