godotengine / godot-cpp

C++ bindings for the Godot script API
MIT License
1.74k stars 575 forks source link

Clean up instance bindings for engine singletons to prevent crash #1458

Closed dsnopek closed 6 months ago

dsnopek commented 6 months ago

Fixes https://github.com/godotengine/godot-cpp/issues/889

This specifically cleans up engine singletons, it doesn't address the general case where a GDExtension is unloaded before an Object can clean up its instance binding to that GDExtension.

It includes a test that crashes at exit without this fix, but exits normally with this fix.

mihe commented 6 months ago

I'm running into some issues with this it seems:

  1. I'm not sure if this has changed with this PR necessarily, but the workaround I've been using up until now has been thread-safe by virtue of doing the whole singleton initialization as part of the expression that populates the static variable (which are magically thread-safe as of C++11). When moving over to this change instead, that's no longer the case, since engine_singletons is not thread-safe. It might be enough to just stick a mutex around it, but it might also be a good idea to stick the singleton initialization into its own method/lambda and have that populate the singleton variable as its initial value.
  2. With this change I'm seeing an access to a stale/deleted PhysicsServer3DManager instance binding in ClassDB::deinitialize. I believe this is because Godot is freeing this on its end before ClassDB::deinitialize runs in godot-cpp. So I'm guessing you'll probably still want to do what my workaround was doing, by not allowing Godot to free any of the instance bindings itself, by providing a nullptr free callback as part of gdextension_interface_object_get_instance_binding, since they're all supposed to be freed as part of ClassDB::deinitialize on the godot-cpp side of things.
dsnopek commented 6 months ago

Thanks for testing!

  1. I'm not sure if this has changed with this PR necessarily, but the workaround I've been using up until now has been thread-safe by virtue of doing the whole singleton initialization as part of the expression that populates the static variable (which are magically thread-safe as of C++11).

Ah, I'll look into that. Multiple threads requesting the singleton at the same time when it hasn't been initialized yet does seem like a problem.

2. So I'm guessing you'll probably still want to do what my workaround was doing, by not allowing Godot to free any of the instance bindings itself, by providing a nullptr free callback as part of gdextension_interface_object_get_instance_binding, since they're all supposed to be freed as part of ClassDB::deinitialize on the godot-cpp side of things.

Hm. Freeing the instance binding should have removed it from the list of instance bindings the object had on the Godot side, so this shouldn't be a problem. When I have a chance, I'll do my own testing with Godot Jolt.

mihe commented 6 months ago

Hm. Freeing the instance binding should have removed it from the list of instance bindings the object had on the Godot side, so this shouldn't be a problem. When I have a chance, I'll do my own testing with Godot Jolt.

I can maybe take a closer look at what's happening tomorrow.

If you want to try it yourself you'll have to use this fork of Godot Jolt, which uses this fork of godot-cpp.

mihe commented 6 months ago

It does seem to be the case that the instance binding (i.e. PhysicsServer3DManager in godot-cpp) that gets created as part of the _gde_binding_create_callback is freed in _gde_binding_free_callback during Object::~Object when PhysicsServer3DManager on the Godot side is deleted, which happens before ClassDB::deinitialize in godot-cpp, thus leading to a stale pointer access in engine_singletons.

Whichever way you turn it I feel like you'll need to override the free callback for singletons. Either you give it a nullptr like I did, and you do the freeing in ClassDB::deinitialize (but that means you can technically have "valid" instance bindings that point to stale objects in Godot) or you override the free callback with a custom one that does the deleting (same as the default one) but that also unregisters the instance binding from engine_singletons. You might in that case also want to null out the singleton pointer as well, if possible.

dsnopek commented 6 months ago

@mihe:

If you want to try it yourself you'll have to use this fork of Godot Jolt, which uses this fork of godot-cpp.

Thanks! However, I'm having trouble reproducing the crash with Godot Jolt on my system. :-/ I wish that I could, because then I could test if my latest changes fix it.

or you override the free callback with a custom one that does the deleting (same as the default one) but that also unregisters the instance binding from engine_singletons. You might in that case also want to null out the singleton pointer as well, if possible.

I've attempted to take this approach in my latest push. Can you give it a try with Godot Jolt?

Note: I'm not making a custom free callback, I've added the bit that removes the instance from engine_singleton and nulls out singleton to the destructor, but the result should be the same.

mihe commented 6 months ago

Thanks! However, I'm having trouble reproducing the crash with Godot Jolt on my system. :-/ I wish that I could, because then I could test if my latest changes fix it.

Make sure you got the right branch as well, not just the fork. If you did already, maybe try with Valgrind, assuming you're on Linux. I was able to make it crash earlier when I used Application Verifier on Windows.

Can you give it a try with Godot Jolt?

I can verify that it fixes that particular crash, but now instead I'm seeing another crash, which seems to be unrelated to this PR. It seems that this line is causing some trouble, because Engine is freed before WorkerThreadPool, meaning the access of the Engine singleton in Object::~Object when freeing WorkerThreadPool is stale.

I don't quite understand why this works in Godot 4.2 though, assuming it actually does?

Note: I'm not making a custom free callback, I've added the bit that removes the instance from engine_singleton and nulls out singleton to the destructor, but the result should be the same.

Oh right, I keep convincing myself that you can't implement the destructor for Godot objects for some reason.

mihe commented 6 months ago

I don't quite understand why this works in Godot 4.2 though, assuming it actually does?

It does not, apparently! I guess I haven't run with Application Verifier for a good long while. The likelihood of that resulting in a crash must be very small, because I've never seen that manifest as a crash (i.e. without something like Application Verifier or Valgrind) in any of the hundreds of shutdowns I've done with Godot 4.2.

That should really be fixed.

dsnopek commented 6 months ago

@mihe I just made draft PR https://github.com/godotengine/godot/pull/91806 which aims to fix the new crash you discovered. Could you please test that with Godot Jolt too? Hopefully, there won't be any more crashes this uncovers :-)

mihe commented 6 months ago

Could you please test that with Godot Jolt too?

I'm on it.

As a sidenote, I noticed that this PR is tagged as cherrypick:4.1, but gdextension_object_free_instance_binding was introduced in 4.2, which I guess will pose a problem for that cherrypick.

dsnopek commented 6 months ago

@mihe Thanks for all your help testing!

YuriSizov commented 6 months ago

Hey, thanks for the fix! Can confirm that it solves the issue on my end as well.

dsnopek commented 6 months ago

Cherry-picked for 4.2 in PR https://github.com/godotengine/godot-cpp/pull/1465