RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.36k stars 1.27k forks source link

[pybind] Python bindings of pure C++ objects may lose keep_alive relationships when Python view is lost #20383

Open EricCousineau-TRI opened 1 year ago

EricCousineau-TRI commented 1 year ago

Summary

This is the core issue to #20131, with more minimal Drake example in #20380 See https://github.com/RobotLocomotion/pybind11/pull/65 for pybind-only minimal reproduction case.

If we have a pure C++ object (it is not a Python class inheriting from a C++ base), then the Python "view" of the object can get garbage collected if we pass ownership solely to C++ and the Python view goes out of scope.

This is generally not an issue, as the C++ object itself stays alive. However, this becomes a problem when the Python view of the object has additional information, such as keep_alive<> annotations - most importantly, it has patients that the Python view keeps alive.

Example

Adapted from #20131: https://github.com/RobotLocomotion/drake/blob/20a030af9b6ae4800d95081558cd447cec4e715f/tmp/repro.py#L163-L174

The sequence of events:

Related upstream issues

I have not searched for these yet.

Possible Solutions

1. Whenever we pass a shared_ptr<> from Python to C++, we still annotate it.

This is not bullet proof (we can still lose keep_alive relationships), but would at least improve things.

2. Ditch our pybind11 fork and use shared_ptr<> more (#5842)

Even if we wholesale switched to shared_ptr<> and ditched our pybind11 fork (#5842), I believe this would still be an issue unless we guarantee every C++ object shown to Python is only ever exposed as shared_ptr<> / (and possibly using enable_shared_from_this if we need to expose raw pointers).

If there is ever a raw pointer that has lifetime constraints, then that necessitates keep_alive<>, which may fall apart in situations like this.

3. When passing ownership from Python to pure C++, keep the object alive if it has any patients

This may keep things alive longer than intended, and may be slow to do the nurse-based lookup.

I am also not sure how hard this is for a pure Python object. To prevent slicing, etc., we inject our own logic into tp_dealloc for the derived class.

I need to check if the pybind11 smart_holder branch has a mechanism for this, but I doubt it (this is kinda niche / silly).

4. Never delete anything.

This seems bad?


@jwnimmer-tri @SeanCurtis-TRI @sammy-tri @RussTedrake - do y'all have suggestions / alternatives?

Version

20a030af9b6ae4800d95081558cd447cec4e715f

jwnimmer-tri commented 1 year ago

Option (2) sounds best to me. I don't think we'll need enable_shared_from_this. Remember that the "managed object" and the "pointed-to value" of a shared_ptr need not be the same. Something like a shared_ptr<Context> could still be a subcontext alias into a larger context, where the managed object is the full context and the stored pointer is the subcontext.