Open nicom7 opened 1 week ago
> My guess is that this cache wasn't designed to be thread-safe at the beginning and so this causes problems when it is marked dirty and needs to be updated.
Indeed, the crash looks like a threading issue. I opened a PR to illustrate: if I use a mutex to protect the function, it does not crash.
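For readers who want to see the mutex idea in isolation, here is a minimal standalone sketch. It is not the engine code or the PR: `CachedChildren` and `stress_test` are hypothetical names, and the cache is reduced to a sorted copy of a plain vector. Every access to the dirty flag and the cache goes through the same lock, so a rebuild can never overlap with an addition or another rebuild.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a node with a lazily rebuilt children cache.
// None of these names come from the engine.
class CachedChildren {
public:
	void add_child(int id) {
		std::lock_guard<std::mutex> lock(mutex_);
		children_.push_back(id);
		dirty_ = true; // The sorted cache must be rebuilt before its next use.
	}

	// Returns a copy of the sorted cache, rebuilding it first if it is dirty.
	std::vector<int> sorted_children() {
		std::lock_guard<std::mutex> lock(mutex_);
		if (dirty_) {
			cache_ = children_;
			std::sort(cache_.begin(), cache_.end());
			dirty_ = false;
		}
		return cache_;
	}

private:
	std::mutex mutex_;
	std::vector<int> children_;
	std::vector<int> cache_;
	bool dirty_ = false;
};

// Adds children from the calling thread while reader threads keep
// requesting the cache. Returns the final number of cached children.
inline std::size_t stress_test(int child_count, int reader_count) {
	CachedChildren node;
	std::vector<std::thread> readers;
	for (int i = 0; i < reader_count; i++) {
		readers.emplace_back([&node, child_count] {
			for (int j = 0; j < child_count; j++) {
				node.sorted_children();
			}
		});
	}
	for (int i = child_count; i > 0; i--) {
		node.add_child(i); // Added in descending order so sorting matters.
	}
	for (std::thread &t : readers) {
		t.join();
	}
	std::vector<int> final_cache = node.sorted_children();
	assert(std::is_sorted(final_cache.begin(), final_cache.end()));
	return final_cache.size();
}
```

Calling `stress_test(1000, 4)` runs four readers against one writer. The cost is that every cache access now takes a lock, even when only one thread is involved.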
I'd say this is an edge case in node group processing. The engine should prevent or delay node additions to the processing groups while a thread group is being processed (or maybe changing how timers interact with the feature, whatever). Adding a mutex, while seemingly fixing the issue, would be against the philosophy of threaded node processing. The feature is designed to guard against cross-thread interactions instead of trying to make the scene tree thread-safe.
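To make the "delay node additions" idea concrete, here is a rough sketch of the deferral pattern on its own. It is an illustration under assumptions, not how the engine implements thread groups: `ProcessGroup`, `add_node` and `process` are hypothetical names, and the sketch is single-threaded so that only the queuing logic is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical processing group; not an engine class.
class ProcessGroup {
public:
	using Callback = std::function<void(ProcessGroup &)>;

	// Adds a node callback now, or queues it if the group is mid-processing.
	void add_node(Callback cb) {
		if (processing_) {
			pending_.push_back(std::move(cb));
		} else {
			nodes_.push_back(std::move(cb));
		}
	}

	// Runs every node once, then applies the additions queued meanwhile.
	void process() {
		assert(!processing_ && "process() must not be re-entered");
		processing_ = true;
		for (Callback &cb : nodes_) {
			cb(*this); // nodes_ is never modified while this loop runs.
		}
		processing_ = false;
		for (Callback &cb : pending_) {
			nodes_.push_back(std::move(cb));
		}
		pending_.clear();
	}

	std::size_t node_count() const { return nodes_.size(); }
	std::size_t pending_count() const { return pending_.size(); }

private:
	std::vector<Callback> nodes_;
	std::vector<Callback> pending_;
	bool processing_ = false;
};
```

A node that adds another node from inside its callback only sees the addition take effect after the current `process()` call has finished, so the list being iterated is never changed underneath the loop.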
Instead of a mutex, could it be an option to only update the node children cache on the main thread by adding a guard?
_FORCE_INLINE_ void _update_children_cache() const {
	if (unlikely(data.children_cache_dirty) && Thread::is_main_thread()) {
		_update_children_cache_impl();
	}
}
One problem I see here is that the cache could stay dirty for a while and impact performance. Another option might be to only defer the update so that it runs later on the main thread.
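The concern about the cache staying dirty can be shown with a small standalone sketch of the guard. This is an assumption-laden model, not the engine code: `GuardedCache`, `update_from_worker` and `demo` are hypothetical names, and the "main thread" is simply whichever thread constructs the object.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for the guarded cache; not engine code.
class GuardedCache {
public:
	// The thread that constructs the object is treated as the main thread.
	GuardedCache() : main_thread_(std::this_thread::get_id()) {}

	void add_child(int id) {
		children_.push_back(id);
		dirty_ = true;
	}

	// Rebuilds the cache only when called on the main thread.
	// Returns true if the cache is clean after the call.
	bool update_cache() {
		if (dirty_ && std::this_thread::get_id() == main_thread_) {
			cache_ = children_;
			std::sort(cache_.begin(), cache_.end());
			dirty_ = false;
		}
		return !dirty_;
	}

	const std::vector<int> &cache() const { return cache_; }

private:
	std::thread::id main_thread_;
	std::vector<int> children_;
	std::vector<int> cache_;
	bool dirty_ = false;
};

// Requests an update from a worker thread and reports whether it happened.
inline bool update_from_worker(GuardedCache &cache) {
	bool clean = false;
	std::thread worker([&cache, &clean] { clean = cache.update_cache(); });
	worker.join(); // The join makes the worker's write to 'clean' visible.
	return clean;
}

// Walks through the behaviour: skipped on the worker, done on the main thread.
inline void demo() {
	GuardedCache cache;
	cache.add_child(2);
	cache.add_child(1);
	bool clean_after_worker = update_from_worker(cache);
	assert(!clean_after_worker); // The guard skipped the rebuild.
	assert(cache.cache().empty()); // The worker would have seen stale data.
	bool clean_after_main = cache.update_cache();
	assert(clean_after_main); // The main thread performs the rebuild.
}
```

In this model the worker's request is silently skipped and it would read the stale cache, which is the trade-off raised above. The sketch also does nothing about a sub thread reading the cache while the main thread rebuilds it.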
> Could it be an option to only update the node children cache on the main thread by adding a guard?
Tested locally, it works fine. I edited my PR to show the change: https://github.com/godotengine/godot/pull/93963
Tested versions
System information
Godot v4.3.beta1.mono - Windows 10.0.19045 - Vulkan (Forward+) - dedicated NVIDIA GeForce GTX 1070 (NVIDIA; 31.0.15.3623) - Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (8 Threads)
Issue description
Setting a node's process_thread_group property to PROCESS_THREAD_GROUP_SUB_THREAD can lead to the game crashing and to errors related to array sorting:

While investigating this issue I found that the crash occurs in Node::_update_children_cache_impl(), which is actually called from a sub thread at this point. My guess is that this cache wasn't designed to be thread-safe at the beginning and so this causes problems when it is marked dirty and needs to be updated. Here's the full call stack:

In the MRP I set up a main scene that instantiates the node_1 scene once per second. The node_1 scene then instantiates 10 copies of node_2, which has its process thread group set to sub thread. Each node of the node_2 scene needs a script attached with an override of _process(), otherwise the bug doesn't occur.
Steps to reproduce
Minimal reproduction project (MRP)
bug_thread_groups.zip