Closed qarmin closed 2 years ago
@qarmin The crash happens with both NavigationAgent2D and NavigationAgent3D, as they share the same code for this.
This line registers the navigation agents placed in the NavigationServer queue for collision avoidance callbacks, and it causes the crash when p_receiver (the NavigationAgent node) is deleted and get_instance_id() fails.
I have tried all kinds of p_receiver validation checks, but they still fail sometimes. If a split second is waited before calling free(), more than 100,000 agents can be deleted without any issues (the game just freezes for a few seconds).
From my understanding, the queue_free() queue and the NavigationServer queue are processed in parallel on different threads, which makes it hard to react to unexpected deletion. Maybe @AndreaCatania has more insight into the NavigationServer and how to prevent this crash.
Outside of such tests, I think normal users would never have a valid reason to add agents and delete them immediately in the same process frame, so they are unlikely to encounter this.
@smix8 Hi! My team member and I are interested in helping out! This is our first time contributing to an open-source project, and we would like to figure out what is causing this issue. We saw that you already have a hypothesis, and we were wondering how you reproduced the issue and what command you ran in the terminal. Thanks!
@czhu59 I used the small GDScript code snippet qarmin posted. For testing, instead of calling queue_free() directly in the same loop, I waited an idle frame and then ran a new loop calling queue_free() on all added agents. The rest of the information came from the editor error messages and from recent knowledge (a year ago) that the NavigationServer had multiple other issues with callbacks and threading.
@akien-mga @Calinou It seems one of the recent fixes to agent callbacks also fixed this issue. I can no longer reproduce this crash.
So what should we do with #60311? It's apparently not needed anymore?
I think #62025 was responsible for the silent fix here. It turned out the crash was not due to threading: the callback dispatch failed because the callback receiver objects were deleted immediately. So yes, I think the PR can be closed, as something else turned out to be responsible for the bug.
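To illustrate the failure mode described above, here is a minimal, self-contained C++ sketch of a callback queue that stores receiver IDs and skips receivers that were freed before dispatch. This is not Godot's actual code and not necessarily what #62025 does; `Registry`, `CallbackQueue`, `Agent`, and `demo_delivered_count` are hypothetical names made up for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an engine-wide object registry (NOT Godot's ObjectDB).
using ObjectId = std::uint64_t;

struct Agent {
    int callbacks_received = 0;
    void on_velocity_computed() { ++callbacks_received; }
};

class Registry {
public:
    // Registers a live object and returns its ID.
    ObjectId add(Agent *agent) {
        ObjectId id = next_id_++;
        live_[id] = agent;
        return id;
    }
    // Must be called before the object is destroyed.
    void remove(ObjectId id) { live_.erase(id); }
    // Returns nullptr if the object behind the ID no longer exists.
    Agent *find(ObjectId id) const {
        auto it = live_.find(id);
        return it == live_.end() ? nullptr : it->second;
    }

private:
    ObjectId next_id_ = 1;
    std::unordered_map<ObjectId, Agent *> live_;
};

class CallbackQueue {
public:
    // Store the ID, never the raw pointer, so a freed receiver can be detected.
    void enqueue(ObjectId receiver) { pending_.push_back(receiver); }

    // Delivers callbacks to receivers that are still alive; returns how many were delivered.
    int dispatch(const Registry &registry) {
        int delivered = 0;
        for (ObjectId id : pending_) {
            if (Agent *agent = registry.find(id)) {
                agent->on_velocity_computed();
                ++delivered;
            }
            // Receivers deleted since enqueue() are skipped silently.
        }
        pending_.clear();
        return delivered;
    }

private:
    std::vector<ObjectId> pending_;
};

// Adds two agents, frees one in the same "frame", then dispatches.
int demo_delivered_count() {
    Registry registry;
    CallbackQueue queue;
    Agent kept;
    ObjectId kept_id = registry.add(&kept);
    {
        Agent freed;
        ObjectId freed_id = registry.add(&freed);
        queue.enqueue(kept_id);
        queue.enqueue(freed_id);
        registry.remove(freed_id); // receiver deleted before dispatch runs
    }
    return queue.dispatch(registry);
}
```

In this sketch only the surviving agent receives its callback. Dereferencing a stored raw pointer to the freed agent instead would be the kind of invalid memory access the address sanitizer reports.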
I also think there should be no reason to lock inside the loop: at that point in the iteration, everything from scripts that could intervene has already finished, and only the NavigationServer runs, followed by the end of the physics frame.
Fixed by #62025.
Godot version: 4.0.dev.custom_build. 3bb628d8f
OS Ubuntu 20.04 - Ubuntu 3.36 X11
Issue description: Running project which contains
shows in address sanitizer this invalid memory usage