unguarded_linear_insert error and game crashing when using PROCESS_THREAD_GROUP_SUB_THREAD

nicom7 commented 1 week ago

Tested versions

Reproducible in: v4.3.beta2.mono.official [b75f0485b], v4.3.beta1.mono.official [a4f2ea91a], v4.1.3.stable.mono.official [f06b6836a]

System information

Godot v4.3.beta1.mono - Windows 10.0.19045 - Vulkan (Forward+) - dedicated NVIDIA GeForce GTX 1070 (NVIDIA; 31.0.15.3623) - Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (8 Threads)

Issue description

Setting a node process_thread_group property to PROCESS_THREAD_GROUP_SUB_THREAD can lead to game crashing and errors related to array sorting:

E 0:00:01:0742   SortArray<class Node *,struct Node::ComparatorWithPriority,1>::unguarded_linear_insert: bad comparison function; sorting will be broken
  <C++ Source>   C:\Users\nicom\Documents\Project Repos\godot\core/templates/sort_array.h:255 @ SortArray<class Node *,struct Node::ComparatorWithPriority,1>::unguarded_linear_insert()

While investigating this issue I found that the crash occurs in Node::_update_children_cache_impl(), which is actually called from a sub thread at this point. My guess is that this cache wasn't designed to be thread-safe at the beginning and so this causes problems when it is marked dirty and needs to be updated. Here's the full call stack:

>   godot.windows.editor.dev.x86_64.exe!Node::_update_children_cache_impl() Line 1637   C++
    godot.windows.editor.dev.x86_64.exe!Node::_update_children_cache() Line 297 C++
    godot.windows.editor.dev.x86_64.exe!Node::get_index(bool p_include_internal) Line 493   C++
    godot.windows.editor.dev.x86_64.exe!Node::is_greater_than(const Node * p_node) Line 1976    C++
    godot.windows.editor.dev.x86_64.exe!Node::ComparatorWithPriority::operator()(const Node * p_a, const Node * p_b) Line 154   C++
    godot.windows.editor.dev.x86_64.exe!SortArray<Node *,Node::ComparatorWithPriority,1>::linear_insert(__int64 p_first, __int64 p_last, Node * * p_array) Line 266 C++
    godot.windows.editor.dev.x86_64.exe!SortArray<Node *,Node::ComparatorWithPriority,1>::insertion_sort(__int64 p_first, __int64 p_last, Node * * p_array) Line 283    C++
    godot.windows.editor.dev.x86_64.exe!SortArray<Node *,Node::ComparatorWithPriority,1>::final_insertion_sort(__int64 p_first, __int64 p_last, Node * * p_array) Line 299  C++
    godot.windows.editor.dev.x86_64.exe!SortArray<Node *,Node::ComparatorWithPriority,1>::sort_range(__int64 p_first, __int64 p_last, Node * * p_array) Line 306    C++
    godot.windows.editor.dev.x86_64.exe!SortArray<Node *,Node::ComparatorWithPriority,1>::sort(Node * * p_array, __int64 p_len) Line 310    C++
    godot.windows.editor.dev.x86_64.exe!Vector<Node *>::sort_custom<Node::ComparatorWithPriority,1>() Line 121  C++
    godot.windows.editor.dev.x86_64.exe!SceneTree::_process_group(SceneTree::ProcessGroup * p_group, bool p_physics) Line 928   C++
    godot.windows.editor.dev.x86_64.exe!SceneTree::_process_groups_thread(unsigned int p_index, bool p_physics) Line 973    C++
    godot.windows.editor.dev.x86_64.exe!WorkerThreadPool::GroupUserData<SceneTree,void (__cdecl SceneTree::*)(unsigned int,bool),bool>::callback_indexed(unsigned int p_index) Line 188 C++
    godot.windows.editor.dev.x86_64.exe!WorkerThreadPool::_process_task(WorkerThreadPool::Task * p_task) Line 98    C++
    godot.windows.editor.dev.x86_64.exe!WorkerThreadPool::_thread_function(void * p_user) Line 201  C++
    godot.windows.editor.dev.x86_64.exe!Thread::callback(unsigned __int64 p_caller_id, const Thread::Settings & p_settings, void(*)(void *) p_callback, void * p_userdata) Line 66  C++
    [External Code] 
    godot.windows.editor.dev.x86_64.exe!thread_start<unsigned int (__cdecl*)(void *),1>(void * const parameter) Line 97 C++
    [External Code]

In the MRP I setup a main scene that instantiates the node_1 scene once per second. The node_1 scene then instantiates 10 copies of node_2 which has its process thread group set to sub thread. Each node of the node_2 scene needs a script attached with an override of _process() otherwise the bug doesn't occur.

Steps to reproduce

Open the MRP
Run the main scene => the game will crash after several instantiated children nodes

Minimal reproduction project (MRP)

bug_thread_groups.zip

jsjtxietian commented 1 week ago

My guess is that this cache wasn't designed to be thread-safe at the beginning and so this causes problems when it is marked dirty and needs to be updated.

Indeed, The crash looks like a thread issue and I opended a pr to illustrate, if I use a mutex to protect the fucntion, it would not crash.

RandomShaper commented 1 week ago

I'd say this is an edge case in node group processing. The engine should prevent or delay node additions to the processing groups while a thread group is being processed (or maybe changing how timers interact with the feature, whatever). Adding a mutex, while seemingly fixing the issue, would be against the philosophy of threaded node processing. The feature is designed to guard against cross-thread interactions instead of trying to make the scene tree thread-safe.

nicom7 commented 1 week ago

Instead of a mutex, could it be an option to only update the node children cache on the main thread by adding a guard?

    _FORCE_INLINE_ void _update_children_cache() const {
        if (unlikely(data.children_cache_dirty) && Thread::is_main_thread()) {
            _update_children_cache_impl();
        }
    }

One problem I see here is that the cache could stay dirty for a while and impact performance or maybe only defer the update for later on the main thread.

jsjtxietian commented 4 days ago

could it be an option to only update the node children cache on the main thread by adding a guard?

Tested locally, it works fine. I edited my pr to show my change https://github.com/godotengine/godot/pull/93963

godotengine / godot