godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
91.32k stars 21.24k forks source link

Crash calling function on null pointer of already freed GDExtension #98182

Open TokisanGames opened 1 month ago

TokisanGames commented 1 month ago

Tested versions

4.2.2, 4.3-stable, 4.3 6699ae7897658e44efc3cfb2cba91c11a8f5aa6a Master untested but the problem is visible in the code.

godot-cpp 4.2-cherrypicks-7

System information

Windows 11/64, RTX 3070, Vulkan

Issue description

I believe there are two bugs here:

  1. Somewhere in gdextension Dictionary[String] -> TypedArray<GDextensionResource> it isn't properly freeing the GDextResource when it is reassigned to a new dictionary pointer with dict = Dictionary(); That's triggering the engine to go down an unexpected code path on cleanup triggering the second bug:

  2. In the engine, ObjectDB::cleanup() code and Resource::GDCLASS macro assume _extension is a null pointer. However under the above circumstances and maybe others this assumption is wrong and the engine attempts to call is_class("Node") on a gdextension resource where the extension has already been freed. The function call on a null pointer causes a crash.

The second problem is here in this sequence and call stack:

register_core_types.cpp::unregister_core_types() {
    ...
    memdelete(gdextension_manager);
    ...
    ObjectDB::cleanup(); // crash
}

ObjectDB::cleanup() {
    ...
    if (OS::get_singleton()->is_stdout_verbose()) {
        ...
        if (obj->is_class("Node")) // crash
}

Resource::GDClass macro extended inline
    virtual bool is_class(const String &p_class) const override {                                                                                
        if (_get_extension() && 
                         _get_extension()->is_class(p_class)) {      // crash
            return true;                                                                                                                         
        }                                                                                                                                        
        return (p_class == (#m_class)) ? true : m_inherits::is_class(p_class);
    }      

In the Resource macro, _get_extension() returns a non-null pointer to already freed memory. It was freed in unregister_core_types(). Since it's freed, it's very difficult to figure out what object was being queried, but with other debugging I've determined this situation is a Terrain3DResource. So, Godot freed the library, _extension is not-null and invalid, then the Resource macro attempts to call a function from the null pointer resulting in the crash.

Deleting gdextensions before everything is shutdown has caused more than one issue. Related https://github.com/godotengine/godot/issues/95310. The fix for this didn't solve the fundamental problem.

devenv_2IN0PfTyms

devenv_uiohYSv8rK


Here's more information about this. It may be a gdextension bug. Using godot-cpp 4.2-cherrypicks-7

We've been using the first line to get a new pointer and make our backup work properly. This alone sets up the conditions for a crash. If I replace the first with the second line it no longer crashes.

We also have this other bit of code where _undo_data is used that can also be adjusted to prevent the crash:

void Terrain3DEditor::_store_undo() {
    ...
    Dictionary redo_data;
    _undo_data["edited_regions"] = _original_regions; // ** causes latent crash **
    redo_data["edited_regions"] = _edited_regions;

I can return right before the _undo_data line or comment it out and it won't crash. If I return right after or leave it uncommented, it will crash.

_original_regions and _edited_regions are both TypedArray<Terrain3DRegion>.

_undo_data and redo_data are both Dictionaries. They store essentially the same type of data. The only fundamental differences between them are the first one is a class member, and it gets reset by assignment.

Steps to reproduce

On quit it should crash in Resource, on the GDClass macro.

Minimal reproduction project (MRP)

Artifact https://github.com/TokisanGames/Terrain3D/actions/runs/11317731831

TokisanGames commented 1 month ago

I've done further research to narrow the cause for the conditions down. OP is updated.

Undoubtedly there is a bug in the Resource::GDCLASS macro and ObjectDB::cleanup() since it doesn't check for a null pointer before calling it on Object::_extension.

However, I suspect this is an unexpected code path. It's probably expected that gdextension resources have already been freed by that point and any considered objects should have _extension== nullptr. But a second bug has created a condition that triggers this path.

I wonder if the second bug is in GDExtension (godot-cpp 4.2-cherrypicks-7), since there have been problems with Dictionaries and TypedArrays in the past and what we're doing is likely a less common operation.

cc: @dsnopek

dsnopek commented 2 weeks ago

Yeah, I'm not sure there is a general fix we can make here. Maybe we could add a check to see if the engine is already at a certain point of shutting down before trying to use the pointer from _get_extension()?

But we should definitely figure out why there is still an existing resource way after its GDExtension has been cleaned up. It sounds like your saying that in your investigation there's a Dictionary which contains an Array, and when it has no references anymore and should get cleaned up, it just doesn't?

This sounds a lot like issue https://github.com/godotengine/godot-cpp/issues/1240 which we theoretically fixed in PR https://github.com/godotengine/godot-cpp/pull/1379 - but that's been merged for some time. Can you see if you have that fix in your version of godot-cpp? I see it in the 4.3 branch, but not 4.2, which it sounds like you're using? and 4.2 branch.

EDIT: I searched the 4.2 branch badly :-) I see that PR merged in there too, so it probably isn't due to a lack of that fix