Open TokisanGames opened 1 month ago
I've done further research to narrow down the cause of these conditions. OP is updated.
Undoubtedly there is a bug in the Resource::GDCLASS macro and ObjectDB::cleanup(), since neither checks for a null pointer before calling a function through Object::_extension.
However, I suspect this is an unexpected code path. It's probably expected that gdextension resources have already been freed by that point and that any objects still being considered have _extension == nullptr. But a second bug has created a condition that triggers this path.
I wonder if the second bug is in GDExtension (godot-cpp 4.2-cherrypicks-7), since there have been problems with Dictionaries and TypedArrays in the past and what we're doing is likely a less common operation.
cc: @dsnopek
Yeah, I'm not sure there is a general fix we can make here. Maybe we could add a check to see if the engine is already at a certain point of shutting down before trying to use the pointer from _get_extension()?
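To make the idea concrete, here is a minimal stand-alone sketch of such a guard. It is not engine code: ExtensionInfo, FakeObject, extensions_unloading and get_extension_checked() are hypothetical names invented for illustration, and the sketch assumes a single flag that flips once extension libraries start unloading.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Hypothetical stand-in for the per-class extension record (not engine API).
struct ExtensionInfo {
    std::string class_name;
    bool is_class(const std::string &name) const { return class_name == name; }
};

// Assumed global flag, set once extension libraries begin unloading.
static std::atomic<bool> extensions_unloading{false};

// Hypothetical stand-in for an object that holds an extension pointer.
struct FakeObject {
    const ExtensionInfo *extension = nullptr;

    // Returns nullptr when the pointer is unset OR can no longer be trusted
    // because shutdown has progressed past the unload point.
    const ExtensionInfo *get_extension_checked() const {
        if (extensions_unloading.load()) {
            return nullptr;
        }
        return extension;
    }

    // Only calls through the pointer when the guard says it is usable.
    bool is_extension_class(const std::string &name) const {
        const ExtensionInfo *ext = get_extension_checked();
        return ext != nullptr && ext->is_class(name);
    }
};
```

With this shape, callers that already test for nullptr keep working unchanged; the cost is that every object looks like a non-extension object once the flag is set.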
But we should definitely figure out why there is still an existing resource way after its GDExtension has been cleaned up. It sounds like you're saying that in your investigation there's a Dictionary which contains an Array, and when it has no references anymore and should get cleaned up, it just doesn't?
This sounds a lot like issue https://github.com/godotengine/godot-cpp/issues/1240 which we theoretically fixed in PR https://github.com/godotengine/godot-cpp/pull/1379 - but that's been merged for some time. Can you see if you have that fix in your version of godot-cpp? I see it in the 4.3 branch, but not in the 4.2 branch, which it sounds like you're using?
EDIT: I searched the 4.2 branch badly :-) I see that PR merged in there too, so it probably isn't due to a lack of that fix.
Tested versions
4.2.2, 4.3-stable, 4.3 6699ae7897658e44efc3cfb2cba91c11a8f5aa6a. Master untested, but the problem is visible in the code.
godot-cpp 4.2-cherrypicks-7
System information
Windows 11/64, RTX 3070, Vulkan
Issue description
I believe there are two bugs here:
1. Somewhere in gdextension, with Dictionary[String] -> TypedArray<GDextensionResource>, it isn't properly freeing the GDextResource when dict = Dictionary(); reassigns the dictionary to a new dictionary pointer. That's triggering the engine to go down an unexpected code path on cleanup, triggering the second bug:
2. In the engine, the ObjectDB::cleanup() code and the Resource::GDCLASS macro assume _extension is a null pointer. However, under the above circumstances and maybe others, this assumption is wrong and the engine attempts to call is_class("Node") on a gdextension resource where the extension has already been freed. The function call through that invalid pointer causes a crash.
The second problem is here in this sequence and call stack:
In the Resource macro, _get_extension() returns a non-null pointer to already freed memory. It was freed in unregister_core_types(). Since it's freed, it's very difficult to figure out what object was being queried, but with other debugging I've determined that in this situation it is a Terrain3DResource. So, Godot freed the library, _extension is non-null and invalid, and then the Resource macro attempts to call a function through that invalid pointer, resulting in the crash.
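The key point is that comparing against nullptr cannot tell a live pointer from one whose target has been freed. As a rough illustration only, the sketch below tracks which extension records are still registered and refuses to dereference anything else. ExtensionRegistry, FakeResource and safe_is_class() are hypothetical names, not engine or godot-cpp API, and an address-based registry like this would not catch a freed address that later gets reused.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for an extension's class record (not engine API).
struct ExtensionInfo {
    std::string class_name;
};

// Tracks which extension records are still registered.
class ExtensionRegistry {
    std::unordered_set<const ExtensionInfo *> live_;

public:
    void add(const ExtensionInfo *e) { live_.insert(e); }
    void remove(const ExtensionInfo *e) { live_.erase(e); }
    bool is_live(const ExtensionInfo *e) const {
        return e != nullptr && live_.count(e) > 0;
    }
};

// Hypothetical stand-in for a resource that still holds the pointer.
struct FakeResource {
    const ExtensionInfo *extension = nullptr;
};

// Dereferences the pointer only if the registry still knows the record.
bool safe_is_class(const ExtensionRegistry &reg, const FakeResource &res,
                   const std::string &name) {
    if (!reg.is_live(res.extension)) {
        return false; // unset or unregistered: never touch the memory
    }
    return res.extension->class_name == name;
}
```

After remove() is called, the resource's pointer is still non-null, so a plain nullptr test would pass, while the registry lookup correctly rejects it.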
Deleting gdextensions before everything is shut down has caused more than one issue. Related: https://github.com/godotengine/godot/issues/95310. The fix for that issue didn't solve the fundamental problem.
Here's more information about this. It may be a gdextension bug.
Using godot-cpp 4.2-cherrypicks-7.
object_slots[i].validator is true. The instances haven't been cleared yet.
Not all of my gdextension resources or even Terrain3DRegions fall under these conditions. I can load up regions from disk and not have an issue. The conditions are set only once we start editing data, which means region duplicates, EditorUndoRedoManager and so on.
We've been using the first line to get a new pointer and make our backup work properly. This alone sets up the conditions for a crash. If I replace the first with the second line it no longer crashes.
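For context on why reassigning and clearing can behave differently: Godot's Dictionary is a reference-counted handle, so copies share one underlying storage. Assigning a fresh Dictionary() rebinds only that one variable, while clear() empties the storage every handle sees. The sketch below models just that semantic with std::shared_ptr. Dict is a hypothetical stand-in, not godot::Dictionary, and the sketch says nothing about where the leak itself occurs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Stand-in for a reference-counted container handle (NOT godot::Dictionary).
// Copying a Dict copies the handle, so both copies share the same storage.
class Dict {
    std::shared_ptr<std::map<std::string, int>> data_ =
        std::make_shared<std::map<std::string, int>>();

public:
    void set(const std::string &key, int value) { (*data_)[key] = value; }
    std::size_t size() const { return data_->size(); }

    // Empties the shared storage; every handle observes the change.
    void clear() { data_->clear(); }

    // Number of handles currently sharing this storage.
    long holders() const { return data_.use_count(); }
};
```

For example, after `Dict held = undo; undo = Dict();` the old contents stay alive for as long as `held` (or anything else holding the old handle) exists, whereas after `undo.clear();` both handles see an empty container.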
We also have this other bit of code where _undo_data is used that can also be adjusted to prevent the crash:
I can return right before the _undo_data line or comment it out and it won't crash. If I return right after or leave it uncommented, it will crash.
_original_regions and _edited_regions are both TypedArray<Terrain3DRegion>.
_undo_data and redo_data are both Dictionaries. They store essentially the same type of data. The only fundamental differences between them are that the first one is a class member, and it gets reset by assignment.
Use _undo_data.clear() in the first block. And use _undo_data.duplicate() in the second block where we pass the data to the undo/redo manager.
Steps to reproduce
On quit, it should crash in Resource, in the GDCLASS macro.
Minimal reproduction project (MRP)
Artifact https://github.com/TokisanGames/Terrain3D/actions/runs/11317731831