Open jwnimmer-tri opened 2 years ago
Assigning to @SeanCurtis-TRI initially for disposition (since you discovered this)
In #18809 we fixed a related issue, where the Eigen::VectorBlock
return type needed to be replaced with Eigen::Ref
. Yikes! Let's find and fix all of those cases, too!
Is there a reason that we haven't addressed this yet? It sure seems like it should be very high priority??
In https://github.com/RobotLocomotion/drake/pull/18809 we fixed a related issue, where the Eigen::VectorBlock return type needed to be replaced with Eigen::Ref. Yikes! Let's find and fix all of those cases, too!
This issue is about a const getter returning a reference to internal memory that might become invalid in case the method was called pre-finalize and then more geometries were added (parsed) after the call to the getter from Python. The return type in question is a C++ vector
(Python list
), which is handled totally differently in pybind11 that Eigen::Matrix
(Python numpy
).
I seems to me like the #18809 is about a "get mutable" accessor not allowing mutation through the returned reference. The problem there is an unwanted copy, not a dangling reference.
I don't think they are the same category of issue, and I don't think we should conflate them. On the mutation front, we should have an issue for "Ensure that all pydrake get_mutable_...
bindings have a unit test that demonstrates effective mutation", but separate from this one.
It sure seems like it should be very high priority??
This issue is somewhat low-risk, because it is only a hazard when called pre-finalize, and where the result is retained for later use after parsing more geometry into the scene.
I'm happy to make a new issue for the new error(=> #18818).
The narrow case of MbP.GetCollisionGeometriesForBody
might be low risk, but the issue seems to be about finding examples of this incorrect binding behavior throughout the codebase? There might be other erroneous instances that are higher risk?
There might be other erroneous instances that are higher risk?
That's a good point, yes. The risk from any other misuses is unknown.
https://github.com/RobotLocomotion/drake/blob/5d652399fab430f4f27a928d221a016c4b8588bd/bindings/pydrake/multibody/plant_py.cc#L782-L785
The
py_rvp::reference_internal
there seems like it refers to the function's return value (a vector) which indeed we could take as a reference. However, what it actually denotes is that eachGeometryId
is reference-aliased one by one. That's unsafe and unsound!We simply need to remove the
reference_internal
.Ideally we also try to skim for similar bugs (using a rvp on a vector-returning function), but that might be difficult. Probably the best way to find is to insert a throw-bomb in the pybind11 code (via a patch) where any non-default rvp for an array-like return value is disallowed.