HaxeFoundation / hxcpp

Runtime files for c++ backend for haxe

GC race condition between ReclaimAsync and Mark #807

Open waneck opened 5 years ago

waneck commented 5 years ago

I've been seeing occasional crashes in Unreal.hx when running multiple external threads, which I believe are caused by a race condition in the hxcpp GC. I was able to reduce the crash to this example. Tracing it further, I think I found the root cause. One GC thread is at the following stack trace:

#2  0x0000000007673c16 in GlobalAllocator::ReclaimAsync (this=0x7f08a5ef31c0, outStats=...) at C:/HaxeToolkit/haxe/lib/hxcpp/git/src/hx/gc/Immix.cpp:4101
#3  0x0000000007673794 in GlobalAllocator::ThreadLoop (this=0x7f08a5ef31c0, inId=1) at C:/HaxeToolkit/haxe/lib/hxcpp/git/src/hx/gc/Immix.cpp:4289
#4  0x0000000007673690 in GlobalAllocator::SThreadLoop (inInfo=0x1) at C:/HaxeToolkit/haxe/lib/hxcpp/git/src/hx/gc/Immix.cpp:4312
#5  0x00007f08b90ac6db in start_thread (arg=0x7f089b2ab700) at pthread_create.c:463
#6  0x00007f08b79a988f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

while another thread is running the following:

#1  0x0000000006aeb446 in hx::MarkObjectAlloc (inPtr=0x7f08a39dbb80, __inCtx=0x7f08a5ef3228) at C:/HaxeToolkit/haxe/lib/hxcpp/git/include\hx/GC.h:494
#2  0x000000000766333e in hx::MarkConservative (inBottom=0x7ffc4b37c8c4, inTop=0x7ffc4b3848b4, __inCtx=0x7f08a5ef3228) at C:/HaxeToolkit/haxe/lib/hxcpp/git/src/hx/gc/Immix.cpp:5380
#3  0x0000000007673564 in LocalAllocator::Mark (this=0x7f08a1bb0600, __inCtx=0x7f08a5ef3228) at C:/HaxeToolkit/haxe/lib/hxcpp/git/src/hx/gc/Immix.cpp:5915
#4  0x00000000076645ed in MarkLocalAlloc (inAlloc=0x7f08a1bb0600, __inCtx=0x7f08a5ef3228) at C:/HaxeToolkit/haxe/lib/hxcpp/git/src/hx/gc/Immix.cpp:5996
#5  0x0000000007672ab5 in GlobalAllocator::MarkAll (this=0x7f08a5ef31c0, inGenerational=false) at C:/HaxeToolkit/haxe/lib/hxcpp/git/src/hx/gc/Immix.cpp:4544
#6  0x000000000766712c in GlobalAllocator::Collect (this=0x7f08a5ef31c0, inMajor=true, inForceCompact=true, inLocked=false) at C:/HaxeToolkit/haxe/lib/hxcpp/git/src/hx/gc/Immix.cpp:4697
#7  0x0000000007663b3f in hx::InternalCollect (inMajor=true, inCompact=true) at C:/HaxeToolkit/haxe/lib/hxcpp/git/src/hx/gc/Immix.cpp:6166
#8  0x000000000765f366 in __hxcpp_collect (inMajor=true) at C:/HaxeToolkit/haxe/lib/hxcpp/git/src/hx/gc/GcCommon.cpp:87
#9  0x00000000074d5b56 in cases::TestMisc_obj::__construct()::_hx_Closure_12::_hx_run()::_hx_Closure_11::_hx_run()::_hx_Closure_9::_hx_run() (this=0x7f08a3a0432c) at ./src/cases/TestMisc.cpp:228
#10 0x00000000074d58d3 in cases::TestMisc_obj::__construct()::_hx_Closure_12::_hx_run()::_hx_Closure_11::_hx_run()::_hx_Closure_9::__run() (this=0x7f08a3a0432c) at ./src/cases/TestMisc.cpp:231
#11 0x0000000006ae8576 in Dynamic::operator() (this=0x7f08b95aec68) at C:/HaxeToolkit/haxe/lib/hxcpp/git/include/Dynamic.h:300
#12 0x0000000006e5560f in uhx::expose::HxcppRuntime_Haxe::callFunction0 (ptr=139675081655084) at ./src/uhx/expose/HxcppRuntime_Haxe.cpp:404
#13 0x000000000330dc1b in uhx::expose::HxcppRuntime::callFunction0 (ptr=139675081655084) at C:/dev/projs/UnrealHxTests/Intermediate/Haxe/HaxeUnitTests-Linux-Development-Game/Generated/Shared\uhx/expose/HxcppRuntime.h:172
#14 uhx::expose::HxcppRuntime::callFunction (ptr=139675081655084) at C:/dev/projs/UnrealHxTests/Intermediate/Haxe/HaxeUnitTests-Linux-Development-Game/Generated/Shared\uhx/expose/HxcppRuntime.h:21
#15 uhx::LambdaBinderVoid<>::operator()() const (this=<optimized out>) at C:/dev/projs/UnrealHxTests/Intermediate/Haxe/HaxeUnitTests-Linux-Development-Game/Template/Public\uhx/LambdaBinding.h:40
#16 UE4Tuple_Private::TTupleImpl<TIntegerSequence<unsigned int>>::ApplyAfter<uhx::LambdaBinderVoid<>&>(uhx::LambdaBinderVoid<>&) const (this=<optimized out>, Func=...)
    at C:/Program Files/Epic Games/UE_4.22/Engine/Source/Runtime/Core/Public/Templates/Tuple.h:415
#17 TBaseFunctorDelegateInstance<TTypeWrapper<void> (), uhx::LambdaBinderVoid<>>::Execute() const (this=<optimized out>) at C:/Program Files\Epic Games\UE_4.22\Engine\Source\Runtime\Core\Public\Delegates/DelegateInstancesImpl.h:893
#18 0x0000000000000000 in ?? ()

At this point, the marking thread has already passed the point where ClearRowMarks is called. The thread running reclaim therefore sees every block as unmarked and starts reclaiming them, which causes all sorts of different crashes at runtime.
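To make the suspected ordering concrete, here is a minimal single-threaded replay of it. This is only a sketch under assumptions: `Block`, `clearRowMarks`, `reclaimUnmarked`, `markLive` and the two replay functions are hypothetical stand-ins I made up for illustration, not the actual Immix.cpp code, and the real heap tracks marks per row rather than per block.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified model of a heap block (not hxcpp's types).
struct Block {
    bool holdsLiveObject = false;  // reachable from some thread's stack
    bool rowMarked = false;        // stands in for the per-row mark bytes
    bool reclaimed = false;        // set once the reclaimer has recycled it
};

// What the marking thread does at the start of a collection.
void clearRowMarks(std::vector<Block>& heap) {
    for (Block& b : heap) b.rowMarked = false;
}

// What a reclaimer does: recycle every block that carries no marks.
void reclaimUnmarked(std::vector<Block>& heap) {
    for (Block& b : heap) {
        if (!b.rowMarked) b.reclaimed = true;
    }
}

// What the marking thread does next: mark the blocks that hold live objects.
void markLive(std::vector<Block>& heap) {
    for (Block& b : heap) {
        if (b.holdsLiveObject) b.rowMarked = true;
    }
}

// Counts live objects whose block has already been recycled.
std::size_t countDangling(const std::vector<Block>& heap) {
    std::size_t n = 0;
    for (const Block& b : heap) {
        if (b.holdsLiveObject && b.reclaimed) ++n;
    }
    return n;
}

// Four blocks, two of which hold live objects marked by a previous collection.
std::vector<Block> makeHeap() {
    std::vector<Block> heap(4);
    heap[1].holdsLiveObject = heap[1].rowMarked = true;
    heap[3].holdsLiveObject = heap[3].rowMarked = true;
    return heap;
}

// Suspected interleaving: reclaim runs between the clear and the mark.
std::size_t replaySuspectedInterleaving() {
    std::vector<Block> heap = makeHeap();
    clearRowMarks(heap);    // marking thread
    reclaimUnmarked(heap);  // reclaim thread runs in the gap
    markLive(heap);         // marking thread resumes, too late
    return countDangling(heap);
}

// Expected ordering: reclaim only runs after marking has finished.
std::size_t replaySerializedOrder() {
    std::vector<Block> heap = makeHeap();
    clearRowMarks(heap);
    markLive(heap);
    reclaimUnmarked(heap);
    return countDangling(heap);
}
```

In this toy model the interleaved order recycles both blocks that still hold live objects, while the serialized order recycles only the two empty ones.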

At first I believed this was related to the way Unreal.hx registers an external thread's stack (with gc_set_top_of_stack(top_of_stack,false) / gc_set_top_of_stack(0,false)), but I couldn't work out from the Immix code what normally prevents ReclaimAsync from running concurrently with a mark phase; there don't seem to be any locks preventing that from happening.
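For reference, this is the kind of safeguard I was expecting to find. It is a minimal sketch using only the C++ standard library; `PhaseGate` and its methods are hypothetical names of my own, and I am not claiming that hxcpp does or should implement it this way.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical gate that keeps reclaim work and the mark phase apart.
class PhaseGate {
public:
    // Called by a reclaim worker before it touches any block.
    // Returns false if a mark phase is in progress (or waiting to start).
    bool tryBeginReclaim() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (marking_) return false;
        ++activeReclaimers_;
        return true;
    }

    // Called by a reclaim worker once it has finished its batch.
    void endReclaim() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(activeReclaimers_ > 0);
            --activeReclaimers_;
        }
        idle_.notify_all();
    }

    // Called by the collecting thread before it clears any row marks.
    // Stops new reclaimers first, then waits for in-flight ones to drain.
    void beginMark() {
        std::unique_lock<std::mutex> lock(mutex_);
        marking_ = true;
        idle_.wait(lock, [this] { return activeReclaimers_ == 0; });
    }

    // Called by the collecting thread once marking is complete.
    void endMark() {
        std::lock_guard<std::mutex> lock(mutex_);
        marking_ = false;
    }

private:
    std::mutex mutex_;
    std::condition_variable idle_;
    int activeReclaimers_ = 0;
    bool marking_ = false;
};
```

With something like this, a reclaim worker would call `tryBeginReclaim()` before each batch and skip the batch when it returns false, and the collector would wrap everything from the clearing of row marks to the end of marking in `beginMark()` / `endMark()`.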

Let me know if you'd like me to provide a build that shows this issue happening. I recorded an rr run of the failure, so I can replay this exact issue as many times as needed, and I'm happy to provide more information about it.

hughsando commented 5 years ago

Do you have this change? https://github.com/HaxeFoundation/hxcpp/commit/bfd7420cb2fd3f05750e0409f880ecf8e42ff299 (should be in 4.0.19)

That change was made shortly before your report, and it looks like a possible cause or fix. If your problem persists with it, I can take a closer look.