godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
86.48k stars 19.27k forks source link

Usage of freed memory by GdNavigationServer when creating and removing NavigationAgent2D #46285

Closed qarmin closed 2 years ago

qarmin commented 3 years ago

Godot version: 4.0.dev.custom_build. 3bb628d8f

OS Ubuntu 20.04 - Ubuntu 3.36 X11

Issue description: Running project which contains

func _ready():
    for i in range(1000):
        var rr = NavigationAgent2D.new()
        add_child(rr)
        rr.queue_free()

shows in address sanitizer this invalid memory usage

==280654==ERROR: AddressSanitizer: heap-use-after-free on address 0x616000038a90 at pc 0x000005c7a714 bp 0x7ffeeafdffe0 sp 0x7ffeeafdffd0
READ of size 8 at 0x616000038a90 thread T0
    #0 0x5c7a713 in GdNavigationServer::_cmd_agent_set_callback(RID, Object*, StringName, Variant) modules/gdnavigation/gd_navigation_server.cpp:403
    #1 0x5c8c8ec in agent_set_callback_command::exec(GdNavigationServer*) modules/gdnavigation/gd_navigation_server.cpp:399
    #2 0x5c7c640 in GdNavigationServer::flush_queries() modules/gdnavigation/gd_navigation_server.cpp:477
    #3 0x5c7cccf in GdNavigationServer::process(float) modules/gdnavigation/gd_navigation_server.cpp:484
    #4 0x1fc3c05 in Main::iteration() main/main.cpp:2450
    #5 0x1e6c83e in OS_LinuxBSD::run() platform/linuxbsd/os_linuxbsd.cpp:261
    #6 0x1e5f040 in main platform/linuxbsd/godot_linuxbsd.cpp:58
    #7 0x7f3932a590b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
    #8 0x1e5eb7d in _start (/usr/bin/godot4s+0x1e5eb7d)

0x616000038a90 is located 16 bytes inside of 624-byte region [0x616000038a80,0x616000038cf0)
freed by thread T0 here:
    #0 0x7f393393e1b7 in __interceptor_free (/lib/x86_64-linux-gnu/libasan.so.6+0xb01b7)
    #1 0x1287024c in Memory::free_static(void*, bool) core/os/memory.cpp:169
    #2 0x1fcf987 in void memdelete<Object>(Object*) core/os/memory.h:118
    #3 0xc74f9ed in SceneTree::_flush_delete_queue() scene/main/scene_tree.cpp:982
    #4 0xc73c240 in SceneTree::physics_process(float) scene/main/scene_tree.cpp:417
    #5 0x1fc3a2d in Main::iteration() main/main.cpp:2445
    #6 0x1e6c83e in OS_LinuxBSD::run() platform/linuxbsd/os_linuxbsd.cpp:261
    #7 0x1e5f040 in main platform/linuxbsd/godot_linuxbsd.cpp:58
    #8 0x7f3932a590b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

previously allocated by thread T0 here:
    #0 0x7f393393e517 in malloc (/lib/x86_64-linux-gnu/libasan.so.6+0xb0517)
    #1 0x1286f20d in Memory::alloc_static(unsigned long, bool) core/os/memory.cpp:76
    #2 0x1286f11e in operator new(unsigned long, char const*) core/os/memory.cpp:41
    #3 0xc4c1f5e in Object* ClassDB::creator<NavigationAgent2D>() core/object/class_db.h:146
    #4 0x1393952e in ClassDB::instance(StringName const&) core/object/class_db.cpp:524
    #5 0x55f55f2 in GDScriptNativeClass::instance() modules/gdscript/gdscript.cpp:83
    #6 0x55f4fda in GDScriptNativeClass::_new() modules/gdscript/gdscript.cpp:71
    #7 0x56cd8cd in void call_with_variant_args_ret_helper<__UnexistingClass, Variant>(__UnexistingClass*, Variant (__UnexistingClass::*)(), Variant const**, Variant&, Callable::CallError&, IndexSequence<>) core/variant/binder_common.h:562
    #8 0x56cd329 in void call_with_variant_args_ret_dv<__UnexistingClass, Variant>(__UnexistingClass*, Variant (__UnexistingClass::*)(), Variant const**, int, Variant&, Callable::CallError&, Vector<Variant> const&) core/variant/binder_common.h:397
    #9 0x56cca21 in MethodBindTR<Variant>::call(Object*, Variant const**, int, Callable::CallError&) core/object/method_bind.h:451
    #10 0x139a424d in Object::call(StringName const&, Variant const**, int, Callable::CallError&) core/object/object.cpp:784
    #11 0x131fdfa5 in Variant::call(StringName const&, Variant const**, int, Variant&, Callable::CallError&) core/variant/variant_call.cpp:611
    #12 0x5b1621a in GDScriptFunction::call(GDScriptInstance*, Variant const**, int, Callable::CallError&, GDScriptFunction::CallState*) modules/gdscript/gdscript_vm.cpp:1372
    #13 0x562a0c3 in GDScriptInstance::call(StringName const&, Variant const**, int, Callable::CallError&) modules/gdscript/gdscript.cpp:1544
    #14 0x13a0fa23 in ScriptInstance::call(StringName const&, Variant const&, Variant const&, Variant const&, Variant const&, Variant const&) core/object/script_language.cpp:322
    #15 0xc64c267 in Node::_notification(int) scene/main/node.cpp:147
    #16 0x336c589 in Node::_notificationv(int, bool) scene/main/node.h:45
    #17 0x336ee58 in CanvasItem::_notificationv(int, bool) scene/main/canvas_item.h:164
    #18 0xe3425ba in Node2D::_notificationv(int, bool) scene/2d/node_2d.h:37
    #19 0x139a46e2 in Object::notification(int, bool) core/object/object.cpp:793
    #20 0xc64e4c9 in Node::_propagate_ready() scene/main/node.cpp:188
    #21 0xc64dd2f in Node::_propagate_ready() scene/main/node.cpp:180
    #22 0xc69f27e in Node::_set_tree(SceneTree*) scene/main/node.cpp:2613
    #23 0xc73a136 in SceneTree::initialize() scene/main/scene_tree.cpp:392
    #24 0x1e6c3b4 in OS_LinuxBSD::run() platform/linuxbsd/os_linuxbsd.cpp:249
    #25 0x1e5f040 in main platform/linuxbsd/godot_linuxbsd.cpp:58
    #26 0x7f3932a590b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)
smix8 commented 3 years ago

@qarmin Crash happens with both NavigationAgent2D and NavigationAgent3D as they share the same codebase on this.

https://github.com/godotengine/godot/blob/f817e7fb0ae711e0e92078fb0d09c90abd5a67c8/modules/gdnavigation/gd_navigation_server.cpp#L442

This line is registering the navigationagents placed in the NavigationServer queue for collision avoidance callbacks and causing the crash when p_receiver is deleted (which is the NavigationAgent Node) and get_instance_id() fails.

I tried all kind of p_receiver validation checks now but they still fail sometimes. If a splitsecond is waited before calling free() over 100000+ agents can be deleted without any issues (just the game freezing for a few seconds).

From my understanding the queue_free() queue and NavigationServer queue are worked through in parallel on different threads which makes it hard to react to unexpected deletion but maybe @AndreaCatania has more insight on the NavigationServer and how to prevent this crash.

At least outside of such tests I think normal users would never have a valid reason to add agents and delete them immediately on the same process frame to encounter this.

czhu59 commented 2 years ago

@smix8 Hi! My team member and I are interested in helping out! This is our first time contributing to an open-source project, and we were interested in figuring out what might be causing this issue. We saw that you already have a hypothesis of what might be the issue, and we were wondering how you reproduced this issue and what command you ran through terminal. Thanks!

smix8 commented 2 years ago

@czhu59 I used the small GDScript code snippet qarmin posted. For testing Instead of using queue_free() directly in same loop I waited an idle frame followed by a new loop with queue_free() for all added agents. The rest of information came from the editor error msgs and recent knowledge (a year ago) that navigationserver had multiple other issues with callbacks and threading.

smix8 commented 2 years ago

@akien-mga @Calinou It seems one of the recent fixes to agent callbacks also fixed this issue. I can no longer reproduce this crash.

akien-mga commented 2 years ago

So what should we do with #60311? It's apparently not needed anymore?

smix8 commented 2 years ago

I think #62025 was responsible for the silent fix here. Turned out the crash was not due to threading but because the callback dispatch failed because the callback receiver object were deleted immediately. So yeah I think the pr can be closed as something else turned out to be responsible for the bug.

I also think there should be no reason to lock inside the loop because at that point in the iteration everything from scripts is already finished that could intervene and only the NavigationServer runs followed by the physics frame finish.

akien-mga commented 2 years ago

Fixed by #62025.