Revolutionary-Games / Thrive

The main repository for the development of the evolution game Thrive.
https://revolutionarygamesstudio.com/

Find and fix the root cause of why accessing a ThreadLocal variable causes the game to lock up #4989

Open hhyyrylainen opened 7 months ago

hhyyrylainen commented 7 months ago

The game locks up with a very unhelpful native backtrace like:

^C
Thread 1 "godot.linuxbsd." received signal SIGINT, Interrupt.
0x00007f64d1b83969 in ?? ()
(gdb) bt
#0  0x00007f64d1b83969 in ?? ()
#1  0x0000000000000000 in ?? ()

This looks like a .NET 6 runtime issue, possibly one that only shows up when the runtime is used within Godot. It only happens when multiple threads access the thread local variable; a single thread accessing it doesn't trigger the bug.
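
For illustration, the access pattern described above looks roughly like the following minimal sketch. This is not Thrive code: `ThreadLocalAccessSketch`, `Scratch` and the thread count are hypothetical names made up for the example. It is also not a confirmed reproduction of the lockup, which so far has only been observed in the game running inside Godot.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ThreadLocalAccessSketch
{
    // Hypothetical per-thread scratch list standing in for the game's thread local variable
    private static readonly ThreadLocal<List<int>> Scratch =
        new ThreadLocal<List<int>>(() => new List<int>());

    // Starts threadCount threads that all touch Scratch.Value at about the same time
    // and returns how many of them finished
    public static int Run(int threadCount)
    {
        int finished = 0;
        var startSignal = new ManualResetEventSlim(false);
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; ++i)
        {
            threads[i] = new Thread(() =>
            {
                // Hold every thread here so that the first accesses happen close together
                startSignal.Wait();

                // The first access on each thread has to initialize that thread's value
                Scratch.Value.Add(Environment.CurrentManagedThreadId);
                Interlocked.Increment(ref finished);
            });
            threads[i].Start();
        }

        startSignal.Set();

        foreach (var thread in threads)
            thread.Join();

        startSignal.Dispose();
        return finished;
    }
}
```

If the lockup did occur in a sketch like this, `Run` would never return because the `Join` calls would wait forever on the stuck threads.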

hhyyrylainen commented 6 months ago

Trying to get a C# stack trace gives a few interesting results:

1:  System.Diagnostics.Debugger.CustomNotification(System.Diagnostics.ICustomDebuggerNotification)
2:  System.Diagnostics.Debugger.NotifyOfCrossThreadDependencySlow()
3:  System.Threading.ThreadLocal`1[[System.__Canon, System.Private.CoreLib]].GetValueSlow()
4:  Systems.MicrobeReproductionSystem.HandleNormalMicrobeReproduction(DefaultEcs.Entity ByRef, Components.OrganelleContainer ByRef, Boolean) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/systems/MicrobeReproductionSystem.cs:368

ManagedThreadId: 20, Name: TaskThread_15, OSThreadId: 1800814, Thread: IsAlive: True, State: 660008
1:  System.Diagnostics.Debugger.CustomNotification(System.Diagnostics.ICustomDebuggerNotification)
2:  System.Diagnostics.Debugger.NotifyOfCrossThreadDependencySlow()
3:  System.Threading.ThreadLocal`1[[System.__Canon, System.Private.CoreLib]].GetValueSlow()
4:  Systems.MicrobeReproductionSystem.HandleNormalMicrobeReproduction(DefaultEcs.Entity ByRef, Components.OrganelleContainer ByRef, Boolean) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/systems/MicrobeReproductionSystem.cs:368
5:  Systems.MicrobeReproductionSystem.Update(Single, DefaultEcs.Entity ByRef) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/systems/MicrobeReproductionSystem.cs:232

ManagedThreadId: 16, Name: TaskThread_11, OSThreadId: 1800810, Thread: IsAlive: True, State: 660008
1:  System.Diagnostics.Debugger.get_IsAttached()
2:  System.Threading.ThreadLocal`1[[System.__Canon, System.Private.CoreLib]].GetValueSlow()
3:  Systems.MicrobeReproductionSystem.HandleNormalMicrobeReproduction(DefaultEcs.Entity ByRef, Components.OrganelleContainer ByRef, Boolean) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/systems/MicrobeReproductionSystem.cs:368

So maybe the problem only occurs when debugging? It might even be a bug in the debugger.

Other threads are not very useful:

ManagedThreadId: 1, Name: TMain, OSThreadId: 1800711, Thread: IsAlive: True, State: 1180200
1:  TaskExecutor.Run(DefaultEcs.Threading.IParallelRunnable) at /home/hhyyrylainen/Projects/Thrive/src/engine/TaskExecutor.cs:0
2:  DefaultEcs.System.AEntitySetSystem`1[[System.Single, System.Private.CoreLib]].Update(Single)
3:  MicrobeWorldSimulation.OnProcessFixedWithThreads(Single) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/MicrobeWorldSimulation.generated.cs:107
4:  MicrobeWorldSimulation.OnProcessFixedLogic(Single) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/MicrobeWorldSimulation.cs:336
5:  WorldSimulation.ProcessLogic(Single) at /home/hhyyrylainen/Projects/Thrive/src/general/base_stage/WorldSimulation.cs:206
6:  WorldSimulation.ProcessAll(Single) at /home/hhyyrylainen/Projects/Thrive/src/general/base_stage/WorldSimulation.cs:138
7:  MicrobeStage._Process(Double) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/MicrobeStage.cs:210
8:  Godot.Node.InvokeGodotClassMethod(Godot.NativeInterop.godot_string_name ByRef, Godot.NativeInterop.NativeVariantPtrArgs, Godot.NativeInterop.godot_variant ByRef)
9:  NodeWithInput.InvokeGodotClassMethod(Godot.NativeInterop.godot_string_name ByRef, Godot.NativeInterop.NativeVariantPtrArgs, Godot.NativeInterop.godot_variant ByRef) at /home/hhyyrylainen/Projects/Thrive/Godot.SourceGenerators/Godot.SourceGenerators.ScriptMethodsGenerator/NodeWithInput_ScriptMethods.generated.cs:48

hhyyrylainen commented 6 months ago

It looks like the bug is in the runtime's own ThreadLocal implementation, which gets stuck in a "debugger present" check when multiple threads reach that check at exactly the same time.
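
One way to test the debugger theory is to log what the runtime itself reports. `System.Diagnostics.Debugger.IsAttached` is the real property that shows up in the stack traces above. The `DebuggerCheck` class and its message format are only an illustrative sketch and not part of Thrive.

```csharp
using System;
using System.Diagnostics;

public static class DebuggerCheck
{
    // Builds a one-line description of whether the runtime believes a managed debugger is attached
    public static string Describe()
    {
        return $"Managed debugger attached: {Debugger.IsAttached}";
    }
}
```

Printing `DebuggerCheck.Describe()` at startup and comparing runs with and without a debugger attached might show whether the lockup only occurs when this reports `True`.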

hhyyrylainen commented 1 month ago

Still happens with the latest Thrive code and .NET 8:

Thread stacktraces:
ManagedThreadId: 1, Name: TMain, OSThreadId: 636949, Thread: IsAlive: True, State: 1180200
1:  TaskExecutor.Run(DefaultEcs.Threading.IParallelRunnable) at /home/hhyyrylainen/Projects/Thrive/src/engine/TaskExecutor.cs:226
2:  DefaultEcs.System.AEntitySetSystem`1[[System.Single, System.Private.CoreLib]].Update(Single)
3:  MicrobeWorldSimulation.OnProcessFixedWithoutThreads(Single) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/MicrobeWorldSimulation.generated.cs:238
4:  MicrobeWorldSimulation.OnProcessFixedLogic(Single) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/MicrobeWorldSimulation.cs:356
5:  WorldSimulation.ProcessLogic(Single) at /home/hhyyrylainen/Projects/Thrive/src/general/base_stage/WorldSimulation.cs:206
6:  WorldSimulation.ProcessAll(Single) at /home/hhyyrylainen/Projects/Thrive/src/general/base_stage/WorldSimulation.cs:138
7:  MicrobeStage._Process(Double) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/MicrobeStage.cs:215
8:  Godot.Node.InvokeGodotClassMethod(Godot.NativeInterop.godot_string_name ByRef, Godot.NativeInterop.NativeVariantPtrArgs, Godot.NativeInterop.godot_variant ByRef)
9:  NodeWithInput.InvokeGodotClassMethod(Godot.NativeInterop.godot_string_name ByRef, Godot.NativeInterop.NativeVariantPtrArgs, Godot.NativeInterop.godot_variant ByRef) at /home/hhyyrylainen/Projects/Thrive/Godot.SourceGenerators/Godot.SourceGenerators.ScriptMethodsGenerator/NodeWithInput_ScriptMethods.generated.cs:48
10: StageBase.InvokeGodotClassMethod(Godot.NativeInterop.godot_string_name ByRef, Godot.NativeInterop.NativeVariantPtrArgs, Godot.NativeInterop.godot_variant ByRef) at /home/hhyyrylainen/Projects/Thrive/Godot.SourceGenerators/Godot.SourceGenerators.ScriptMethodsGenerator/StageBase_ScriptMethods.generated.cs:208
11: CreatureStageBase`2[[DefaultEcs.Entity, DefaultEcs],[System.__Canon, System.Private.CoreLib]].InvokeGodotClassMethod(Godot.NativeInterop.godot_string_name ByRef, Godot.NativeInterop.NativeVariantPtrArgs, Godot.NativeInterop.godot_variant ByRef) at /home/hhyyrylainen/Projects/Thrive/Godot.SourceGenerators/Godot.SourceGenerators.ScriptMethodsGenerator/CreatureStageBase(Of TPlayer, TSimulation)_ScriptMethods.generated.cs:238
12: MicrobeStage.InvokeGodotClassMethod(Godot.NativeInterop.godot_string_name ByRef, Godot.NativeInterop.NativeVariantPtrArgs, Godot.NativeInterop.godot_variant ByRef) at /home/hhyyrylainen/Projects/Thrive/Godot.SourceGenerators/Godot.SourceGenerators.ScriptMethodsGenerator/MicrobeStage_ScriptMethods.generated.cs:368
13: Godot.Bridge.CSharpInstanceBridge.Call(IntPtr, Godot.NativeInterop.godot_string_name*, Godot.NativeInterop.godot_variant**, Int32, Godot.NativeInterop.godot_variant_call_error*, Godot.NativeInterop.godot_variant*)

ManagedThreadId: 2, Name: Unknown, OSThreadId: 637038, Thread: IsAlive: True, State: 135720
1:  Godot.Bridge.CSharpInstanceBridge.Call(IntPtr, Godot.NativeInterop.godot_string_name*, Godot.NativeInterop.godot_variant**, Int32, Godot.NativeInterop.godot_variant_call_error*, Godot.NativeInterop.godot_variant*)

ManagedThreadId: 4, Name: Unknown, OSThreadId: 637040, Thread: IsAlive: True, State: 135720
1:  Godot.Bridge.CSharpInstanceBridge.Call(IntPtr, Godot.NativeInterop.godot_string_name*, Godot.NativeInterop.godot_variant**, Int32, Godot.NativeInterop.godot_variant_call_error*, Godot.NativeInterop.godot_variant*)

ManagedThreadId: 8, Name: TaskThread_2, OSThreadId: 637073, Thread: IsAlive: True, State: 660008
1:  System.Diagnostics.Debugger.CustomNotification(System.Diagnostics.ICustomDebuggerNotification)
2:  System.Diagnostics.Debugger.NotifyOfCrossThreadDependencySlow()
3:  System.Threading.ThreadLocal`1[[System.__Canon, System.Private.CoreLib]].GetValueSlow()
4:  Systems.MicrobeReproductionSystem.HandleNormalMicrobeReproduction(DefaultEcs.Entity ByRef, Components.OrganelleContainer ByRef, Boolean) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/systems/MicrobeReproductionSystem.cs:386
5:  Systems.MicrobeReproductionSystem.Update(Single, DefaultEcs.Entity ByRef) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/systems/MicrobeReproductionSystem.cs:245
6:  DefaultEcs.System.AEntitySetSystem`1[[System.Single, System.Private.CoreLib]].Update(Single, System.ReadOnlySpan`1<DefaultEcs.Entity>)
7:  DefaultEcs.System.AEntitySetSystem`1+Runnable[[System.Single, System.Private.CoreLib]].Run(Int32, Int32)
8:  TaskExecutor.ProcessNormalCommand(ThreadCommand) at /home/hhyyrylainen/Projects/Thrive/src/engine/TaskExecutor.cs:535
9:  TaskExecutor.RunExecutorThread() at /home/hhyyrylainen/Projects/Thrive/src/engine/TaskExecutor.cs:469
10: System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
11: [internal]

ManagedThreadId: 13, Name: TaskThread_7, OSThreadId: 637078, Thread: IsAlive: True, State: 660008
1:  System.Diagnostics.Debugger.CustomNotification(System.Diagnostics.ICustomDebuggerNotification)
2:  System.Diagnostics.Debugger.NotifyOfCrossThreadDependencySlow()
3:  System.Threading.ThreadLocal`1[[System.__Canon, System.Private.CoreLib]].GetValueSlow()
4:  Systems.MicrobeReproductionSystem.HandleNormalMicrobeReproduction(DefaultEcs.Entity ByRef, Components.OrganelleContainer ByRef, Boolean) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/systems/MicrobeReproductionSystem.cs:386
5:  Systems.MicrobeReproductionSystem.Update(Single, DefaultEcs.Entity ByRef) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/systems/MicrobeReproductionSystem.cs:245
6:  DefaultEcs.System.AEntitySetSystem`1[[System.Single, System.Private.CoreLib]].Update(Single, System.ReadOnlySpan`1<DefaultEcs.Entity>)
7:  DefaultEcs.System.AEntitySetSystem`1+Runnable[[System.Single, System.Private.CoreLib]].Run(Int32, Int32)
8:  TaskExecutor.ProcessNormalCommand(ThreadCommand) at /home/hhyyrylainen/Projects/Thrive/src/engine/TaskExecutor.cs:535
9:  TaskExecutor.RunExecutorThread() at /home/hhyyrylainen/Projects/Thrive/src/engine/TaskExecutor.cs:469
10: System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
11: [internal]

ManagedThreadId: 15, Name: TaskThread_9, OSThreadId: 637080, Thread: IsAlive: True, State: 660008
1:  System.Diagnostics.Debugger.CustomNotification(System.Diagnostics.ICustomDebuggerNotification)
2:  System.Diagnostics.Debugger.NotifyOfCrossThreadDependencySlow()
3:  System.Threading.ThreadLocal`1[[System.__Canon, System.Private.CoreLib]].GetValueSlow()
4:  Systems.MicrobeReproductionSystem.HandleNormalMicrobeReproduction(DefaultEcs.Entity ByRef, Components.OrganelleContainer ByRef, Boolean) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/systems/MicrobeReproductionSystem.cs:386
5:  Systems.MicrobeReproductionSystem.Update(Single, DefaultEcs.Entity ByRef) at /home/hhyyrylainen/Projects/Thrive/src/microbe_stage/systems/MicrobeReproductionSystem.cs:245
6:  DefaultEcs.System.AEntitySetSystem`1[[System.Single, System.Private.CoreLib]].Update(Single, System.ReadOnlySpan`1<DefaultEcs.Entity>)
7:  DefaultEcs.System.AEntitySetSystem`1+Runnable[[System.Single, System.Private.CoreLib]].Run(Int32, Int32)
8:  TaskExecutor.ProcessNormalCommand(ThreadCommand) at /home/hhyyrylainen/Projects/Thrive/src/engine/TaskExecutor.cs:535
9:  TaskExecutor.RunExecutorThread() at /home/hhyyrylainen/Projects/Thrive/src/engine/TaskExecutor.cs:469
10: System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
11: [internal]

My analysis is still that this happens somewhere inside the runtime internals. We could probably work around it with a simple cache, but that would require two lock acquisitions per object taken from the cache, so the current approach of a single object that is locked while in use might not be far behind in terms of performance.
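
As a rough sketch of that cache idea (not the actual Thrive implementation; `LockedCache` and its members are hypothetical names), each use takes the lock twice, once to borrow an object and once to return it:

```csharp
using System;
using System.Collections.Generic;

public sealed class LockedCache<T>
    where T : class
{
    private readonly Stack<T> available = new Stack<T>();
    private readonly Func<T> factory;

    public LockedCache(Func<T> factory)
    {
        this.factory = factory;
    }

    // Lock acquisition 1 of 2: take a cached object if there is one
    public T Borrow()
    {
        lock (available)
        {
            if (available.Count > 0)
                return available.Pop();
        }

        // The cache was empty, so create a new object outside the lock
        return factory();
    }

    // Lock acquisition 2 of 2: hand the object back for reuse
    public void Return(T item)
    {
        lock (available)
        {
            available.Push(item);
        }
    }
}
```

A caller would wrap its work as `var list = cache.Borrow(); try { ... } finally { cache.Return(list); }`. Compared to a single shared object that stays locked for the whole operation, this lets threads work in parallel but pays for two short lock acquisitions per use.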

hhyyrylainen commented 4 days ago

I think this is basically the same issue as: https://github.com/Revolutionary-Games/Thrive/issues/4996#issuecomment-2437246534