godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
91.22k stars 21.22k forks source link

Deferred call of GDScript lambda from thread leads to double free or corruption when hot reloading #84046

Closed j-zeppenfeld closed 1 year ago

j-zeppenfeld commented 1 year ago

Godot version

v4.2.beta3.official [e8d57afae]

System information

Godot v4.2.beta3 - Windows 10.0.19045 - Vulkan (Mobile) - dedicated NVIDIA RTX A2000 Laptop GPU (NVIDIA; 31.0.15.2737) - 11th Gen Intel(R) Core(TM) i7-11850H @ 2.50GHz (16 Threads)

Issue description

In Godot 4.1 I regularly use call_deferred to send data from a WorkerThreadPool thread back to the main thread. Usually this involves a direct function call, but occasionally I emit a deferred signal by wrapping the emit call into a lambda:

(func(): test_signal.emit()).call_deferred()

In Godot 4.2, this crashes the game.

Simply emitting the signal (without deferring it) does not immediately crash the game, but does not call the connected signals either.

Steps to reproduce

Simply create a node with the following script and attempt to run it.

extends Node2D

signal test_signal

func _ready():
    test_signal.connect(test_done)
    WorkerThreadPool.add_task(T_test_func)

func T_test_func():
    print("Test processing called.")
    #test_done.call_deferred() # This works.
    #test_signal.emit() # This doesn't crash but test_done is not called.
    (func(): test_signal.emit()).call_deferred() # This crashes the game.

func test_done():
    print("Test done.")

Minimal reproduction project

Reproduction is trivial without a dedicated project.

akien-mga commented 1 year ago

I can confirm the bug. It doesn't happen all the time, but when it does it reports double free or corruption (fasttop) on Linux.

For me, the crash only happen when running the project from the editor, i.e. with the debugging server enabled.

That makes it a bit difficult to debug with gdb, but I managed to get a stacktrace by enabling "Keep Debug Server Open" in the editor, and then running the project with gdb on the command line using --remote-debug tcp://127.0.0.1:6007.

double free or corruption (fasttop)

Thread 1 "godot-git" received signal SIGABRT, Aborted.
0x00007ffff7d683cc in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) bt
#0  0x00007ffff7d683cc in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff7d198b2 in raise () from /lib64/libc.so.6
#2  0x00007ffff7d05464 in abort () from /lib64/libc.so.6
#3  0x00007ffff7d5c818 in __libc_message () from /lib64/libc.so.6
#4  0x00007ffff7d7126a in malloc_printerr () from /lib64/libc.so.6
#5  0x00007ffff7d72d7a in _int_free () from /lib64/libc.so.6
#6  0x00007ffff7d7535b in free () from /lib64/libc.so.6
#7  0x000000000967581f in Memory::free_static (p_ptr=0x7fffa000adf0, p_pad_align=false) at ./core/os/memory.cpp:168
#8  0x000000000548d358 in DefaultAllocator::free (p_ptr=0x7fffa000adf0) at ./core/os/memory.h:66
#9  0x0000000005e67b41 in memdelete_allocator<List<GDScriptFunction**, DefaultAllocator>::Element, DefaultAllocator> (p_class=0x7fffa000adf0) at ./core/os/memory.h:124
#10 0x0000000005e46127 in List<GDScriptFunction**, DefaultAllocator>::_Data::erase (this=0x7fffa000adc0, p_I=0x7fffa000adf0) at ./core/templates/list.h:241
#11 0x0000000005deafb7 in List<GDScriptFunction**, DefaultAllocator>::Element::erase (this=0x7fffa000adf0) at ./core/templates/list.h:132
#12 0x0000000005d07b65 in GDScript::_remove_func_ptr_to_update (p_func_ptr_element=...) at ./modules/gdscript/gdscript.cpp:1416
#13 0x0000000005d78807 in GDScriptLambdaSelfCallable::~GDScriptLambdaSelfCallable (this=0x7fffa000ad50, __in_chrg=<optimized out>) at ./modules/gdscript/gdscript_lambda_callable.cpp:299
#14 0x0000000009a333b3 in memdelete<CallableCustom> (p_class=0x7fffa000ad50) at ./core/os/memory.h:109
#15 0x00000000098a1a66 in Callable::~Callable (this=0xd156478, __in_chrg=<optimized out>) at ./core/variant/callable.cpp:382
#16 0x0000000009bce650 in CallQueue::Message::~Message (this=0xd156478, __in_chrg=<optimized out>) at ./core/object/message_queue.h:84
#17 0x0000000009bab4e7 in CallQueue::flush (this=0xa8e31b0) at ./core/object/message_queue.cpp:347
#18 0x0000000007b652a9 in SceneTree::physics_process (this=0xd168590, p_time=0.016666666666666666) at ./scene/main/scene_tree.cpp:471
#19 0x00000000055198e1 in Main::iteration () at main/main.cpp:3575
#20 0x0000000005493e0e in OS_LinuxBSD::run (this=0x7fffffffcec0) at platform/linuxbsd/os_linuxbsd.cpp:933
#21 0x000000000548c880 in main (argc=10, argv=0x7fffffffd4a8) at platform/linuxbsd/godot_linuxbsd.cpp:74

CC @godotengine/gdscript

akien-mga commented 1 year ago

Given that's lambda related, I had a suspicion that #81628 may have caused this.

I tried a local revert and that solves the crash, so it's indeed a regression from #81628.

CC @rune-scape

akien-mga commented 1 year ago

This isn't related to emitting signals though, and only to using call_deferred on lambdas.

This crashes the same way:

extends Node

func _ready():
    WorkerThreadPool.add_task(T_test_func)

func T_test_func():
    (func(): print("hi")).call_deferred() # This crashes the game.

Same with:

extends Node

func _ready():
    WorkerThreadPool.add_task(T_test_func)

func T_test_func():
    var lambda = func(): print("hi")
    lambda.call_deferred() # This crashes the game.
j-zeppenfeld commented 1 year ago

This isn't related to emitting signals though, and only to using call_deferred on lambdas.

Does emiting a signal involve lamdas in any other way? Otherwise it's possible that there are two issues here. The first with calling a deferred lambda, the second with emiting a signal from a thread.

As far as I see with the given example, connected functions are not called when a signal is emitted from a thread. I would have assumed they are called in the context of the thread.

akien-mga commented 1 year ago

That's a separate issue from the crash, and not a regression in 4.2, as it behaves the same in 4.1.

Your line:

    test_signal.emit() # This doesn't crash but test_done is not called.

raises an error in 4.1 and 4.2 which explains that you can't emit signals from threads:

ERROR: Caller thread can't call this function in this node (/root/Node2D). Use call_deferred() or call_thread_group() instead.
   at: emit_signalp (scene/main/node.cpp:3570)

AFAIK it's not thread safe, and as part of the work done in 4.1 on scene tree multithreading, it had to be restricted. But it's indeed not related to the crash I described above which comes from deferring a lambda call in a thread.

j-zeppenfeld commented 1 year ago

Thanks for the clarification, I must have missed that error message.

I actually prefer the limitation of needing to defer emiting signals from a thread. That way the callee knows that it will always be running in the main thread.

Is there a type-safe way to defer emiting a signal from a thread without using a lambda? Lacking emit_deferred I thought the lambda method was the canonical way of doing so, but given that the unit tests didn't catch this indicates to me that it is not?

akien-mga commented 1 year ago

In theory the type safe way to do this should be some_signal.emit.call_deferred(), but there's currently a limitation which prevents this from working (see #82264 and linked issue).

So I don't think there's a type safe / string-less way to emit this currently, you'd have to do emit_signal.bind("test_signal").call_deferred(), or call_deferred("emit_signal", "test_signal").

Or indeed the lambda solution makes sense, if it didn't crash.

adamscott commented 1 year ago

Weird. I cannot reproduce this issue on the latest master build (v4.2.beta.custom_build [3e7f638d7]).

I'm I doing something wrong?

Capture vidéo du 2023-11-07 15-20-37.webm

adamscott commented 1 year ago

Just discussed with @allenwp, he says that the issue seems to be intermittent with beta 5. Sometimes, it works, other times, it crashes or it errors out (trying to write into invalid memory). We are investigating.

He is on Windows 10.

allenwp commented 1 year ago

Here's an example of the crash with v4.2.beta5.official [4c96e9676] Godot v4.2.beta5 - Windows 10.0.19045 - Vulkan (Mobile) - dedicated NVIDIA GeForce GTX 980 Ti (NVIDIA; 31.0.15.3699) - Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz (8 Threads)

image

And again: image

Sometimes I get no error dialog, but the app just closes: image

And other times, it works fine without any issue: image

To get these different results, I just hit F5 to start the game, then try again and again.

Edit:

Here's a call stack:

Exception thrown at 0x00007FF6214581E4 in godot.windows.editor.dev.x86_64.exe: 0xC0000005: Access violation reading location 0x0000000000000000.

>   godot.windows.editor.dev.x86_64.exe!mtx_do_lock(_Mtx_internal_imp_t * mtx, const xtime * target) Line 103   C++
    godot.windows.editor.dev.x86_64.exe!std::_Mutex_base::lock() Line 50    C++
    godot.windows.editor.dev.x86_64.exe!std::unique_lock<std::recursive_mutex>::unique_lock<std::recursive_mutex>(std::recursive_mutex & _Mtx) Line 134 C++
    godot.windows.editor.dev.x86_64.exe!MutexLock<MutexImpl<std::recursive_mutex>>::MutexLock<MutexImpl<std::recursive_mutex>>(const MutexImpl<std::recursive_mutex> & p_mutex) Line 122    C++
    godot.windows.editor.dev.x86_64.exe!GDScript::_remove_func_ptr_to_update(const GDScript::UpdatableFuncPtrElement p_func_ptr_element) Line 1416  C++
    godot.windows.editor.dev.x86_64.exe!GDScriptLambdaSelfCallable::~GDScriptLambdaSelfCallable() Line 300  C++
    [External Code] 
    godot.windows.editor.dev.x86_64.exe!memdelete<CallableCustom>(CallableCustom * p_class) Line 112    C++
    godot.windows.editor.dev.x86_64.exe!Callable::~Callable() Line 385  C++
    [External Code] 
    godot.windows.editor.dev.x86_64.exe!CallQueue::flush() Line 349 C++
    godot.windows.editor.dev.x86_64.exe!SceneTree::physics_process(double p_time) Line 473  C++
    godot.windows.editor.dev.x86_64.exe!Main::iteration() Line 3576 C++
    godot.windows.editor.dev.x86_64.exe!OS_Windows::run() Line 1474 C++
    godot.windows.editor.dev.x86_64.exe!widechar_main(int argc, wchar_t * * argv) Line 182  C++
    godot.windows.editor.dev.x86_64.exe!_main() Line 204    C++
    godot.windows.editor.dev.x86_64.exe!main(int argc, char * * argv) Line 218  C++
    godot.windows.editor.dev.x86_64.exe!WinMain(HINSTANCE__ * hInstance, HINSTANCE__ * hPrevInstance, char * lpCmdLine, int nCmdShow) Line 232  C++
    [External Code] 

And some Autos/Locals from the VS debugger:

        GetCurrentThreadId  0x00007ff61c560557 {godot.windows.editor.dev.x86_64.exe!GetCurrentThreadId} void *
+       mtx 0x000001939dc85478 {type=0 cs={_Val=0.0000000000000000 _Pad=0x000001939dc85480 "" } thread_id=0 ...}    _Mtx_internal_imp_t *
        mtx->thread_id  0   long
+       target  0x0000000000000000 <NULL>   const xtime *

Edit 2: Where the VS debugger catches the crash varies as well. Here's another captured crash: Exception thrown: write access violation. p_I->**prev_ptr** was 0x2620000005A.

>   godot.windows.editor.dev.x86_64.exe!List<GDScriptFunction * *,DefaultAllocator>::_Data::erase(const List<GDScriptFunction * *,DefaultAllocator>::Element * p_I) Line 234    C++
    godot.windows.editor.dev.x86_64.exe!List<GDScriptFunction * *,DefaultAllocator>::Element::erase() Line 133  C++
    godot.windows.editor.dev.x86_64.exe!GDScript::_remove_func_ptr_to_update(const GDScript::UpdatableFuncPtrElement p_func_ptr_element) Line 1417  C++
    godot.windows.editor.dev.x86_64.exe!GDScriptLambdaSelfCallable::~GDScriptLambdaSelfCallable() Line 300  C++
    [External Code] 
    godot.windows.editor.dev.x86_64.exe!memdelete<CallableCustom>(CallableCustom * p_class) Line 112    C++
    godot.windows.editor.dev.x86_64.exe!Callable::~Callable() Line 385  C++
    [External Code] 
    godot.windows.editor.dev.x86_64.exe!CallQueue::flush() Line 349 C++
    godot.windows.editor.dev.x86_64.exe!SceneTree::physics_process(double p_time) Line 473  C++
    godot.windows.editor.dev.x86_64.exe!Main::iteration() Line 3576 C++
    godot.windows.editor.dev.x86_64.exe!OS_Windows::run() Line 1474 C++
    godot.windows.editor.dev.x86_64.exe!widechar_main(int argc, wchar_t * * argv) Line 182  C++
    godot.windows.editor.dev.x86_64.exe!_main() Line 204    C++
    godot.windows.editor.dev.x86_64.exe!main(int argc, char * * argv) Line 218  C++
    godot.windows.editor.dev.x86_64.exe!WinMain(HINSTANCE__ * hInstance, HINSTANCE__ * hPrevInstance, char * lpCmdLine, int nCmdShow) Line 232  C++
    [External Code] 
+       p_I 0x00000191f4a93370 {value=0x00000191f4a933b0 {0x00000191f4a93630 {name=0x00000191f4a93630 "05©ô‘\x1" ...}} ...} const List<GDScriptFunction * *,DefaultAllocator>::Element *
+       p_I->next_ptr   0x00000191f4a94070 {value=0x00000191f4a93370 {0x00000191f4a933b0 {name=0x00000191f4a933b0 "06©ô‘\x1" ...}} ...} List<GDScriptFunction * *,DefaultAllocator>::Element * const
+       p_I->prev_ptr   0x000002620000005a {value=??? next_ptr=??? prev_ptr=??? ...}    List<GDScriptFunction * *,DefaultAllocator>::Element * const
+       this    0x00000191f4c98390 {first=0x0000004900000052 {value=??? next_ptr=??? prev_ptr=??? ...} last=0x0000003a00000044 {...} ...}   List<GDScriptFunction * *,DefaultAllocator>::_Data *

Edit 3: And another. Three call stacks is enough to "triangulate" the issue, right? ;)

This one was captured by running the code from a Button _pressed() call instead. godot.windows.editor.dev.x86_64.exe has triggered a breakpoint.

    ntdll.dll!00007ff8c51af202()    Unknown
    ntdll.dll!00007ff8c51b7fc2()    Unknown
    ntdll.dll!00007ff8c51b82aa()    Unknown
    ntdll.dll!00007ff8c51bdf31()    Unknown
    ntdll.dll!00007ff8c50d5bf0()    Unknown
    ntdll.dll!00007ff8c50d47b1()    Unknown
    godot.windows.editor.dev.x86_64.exe!_free_base(void * block) Line 105   C++
>   godot.windows.editor.dev.x86_64.exe!Memory::free_static(void * p_ptr, bool p_pad_align) Line 169    C++
    godot.windows.editor.dev.x86_64.exe!DefaultAllocator::free(void * p_ptr) Line 66    C++
    godot.windows.editor.dev.x86_64.exe!memdelete_allocator<List<GDScriptFunction * *,DefaultAllocator>::Element,DefaultAllocator>(List<GDScriptFunction * *,DefaultAllocator>::Element * p_class) Line 125 C++
    godot.windows.editor.dev.x86_64.exe!List<GDScriptFunction * *,DefaultAllocator>::_Data::erase(const List<GDScriptFunction * *,DefaultAllocator>::Element * p_I) Line 242    C++
    godot.windows.editor.dev.x86_64.exe!List<GDScriptFunction * *,DefaultAllocator>::Element::erase() Line 133  C++
    godot.windows.editor.dev.x86_64.exe!GDScript::_remove_func_ptr_to_update(const GDScript::UpdatableFuncPtrElement p_func_ptr_element) Line 1417  C++
    godot.windows.editor.dev.x86_64.exe!GDScriptLambdaSelfCallable::~GDScriptLambdaSelfCallable() Line 300  C++
    [External Code] 
    godot.windows.editor.dev.x86_64.exe!memdelete<CallableCustom>(CallableCustom * p_class) Line 112    C++
    godot.windows.editor.dev.x86_64.exe!Callable::~Callable() Line 385  C++
    [External Code] 
    godot.windows.editor.dev.x86_64.exe!CallQueue::flush() Line 349 C++
    godot.windows.editor.dev.x86_64.exe!SceneTree::physics_process(double p_time) Line 473  C++
    godot.windows.editor.dev.x86_64.exe!Main::iteration() Line 3576 C++
    godot.windows.editor.dev.x86_64.exe!OS_Windows::run() Line 1474 C++
    godot.windows.editor.dev.x86_64.exe!widechar_main(int argc, wchar_t * * argv) Line 182  C++
    godot.windows.editor.dev.x86_64.exe!_main() Line 204    C++
    godot.windows.editor.dev.x86_64.exe!main(int argc, char * * argv) Line 218  C++
    godot.windows.editor.dev.x86_64.exe!WinMain(HINSTANCE__ * hInstance, HINSTANCE__ * hPrevInstance, char * lpCmdLine, int nCmdShow) Line 232  C++
    [External Code] 
+       mem 0x000001d054a95f30 " "  unsigned char *
allenwp commented 1 year ago

Because the error is intermittent, I've been using this code to test fixes for this issue, which allows for a few thousand tests to be quickly done:

extends Node

signal test_signal

static var state: int = 0

func _process(_delta: float) -> void:
    if state == 0:
        test_signal.connect(test_done)
        WorkerThreadPool.add_task(T_test_func)
    elif state == 1:
        test_signal.disconnect(test_done)
        state = 0

func T_test_func():
    print("Test processing called.")
    #test_done.call_deferred() # This works.
    #test_signal.emit() # This doesn't crash but test_done is not called.
    (func(): test_signal.emit()).call_deferred() # This crashes the game.

func test_done():
    print("Test done.")
    state = 1