RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.24k stars 1.25k forks source link

[arm64] Undefined behavior on SpatialVelocity when running on arm64 #21706

Open cbarcelo-bdai opened 1 month ago

cbarcelo-bdai commented 1 month ago

What happened?

When Drake is run on Ubuntu 22.04 and arm64, several tests fail. While some failures due to unmet tolerances in floating point calculations, I've found some tests that fail not because of Drake's internal calculations, but because the expected values themselves are incorrect (those values manually calculated as part of the tests)

For example, this EXPECT_TRUE assertion in frame_kinematics_test fails because w_L3E_E_expected contains an incorrect value. When investigating why the calculations didn't produce the expected value, I discovered that simply printing the values used for this subtraction (right before the return statement) makes the test pass.

This not only points back to the same class an UB was previously found (here), but the code itself, although correct, looks strange. It returns whatever comes from the subtraction, but operator- gets the reference, creates a copy of one of the SpatialVelocity, calls operator-= from SpatialVelocity's base class (SpatialVector) and returns in place.

I'd appreciate feedback on whether this code looks correct or if it requires refactoring. If refactoring is needed and you have a theory on what needs to be done to have it fixed, I'd be happy to contribute.

Version

master

What operating system are you using?

Ubuntu 22.04, Other

What installation option are you using?

compiled from source code using Bazel

Relevant log output

No response

jwnimmer-tri commented 1 month ago

Interesting!

p.s. Your links are to line numbers on the master branch, which will eventually go out of date. When you're on the page with the line number, if you push y on your keyboard it will switch the URL to be a permalink. When you have a moment, please update your hyperlinks in the original to be permalinks.

jwnimmer-tri commented 1 month ago

I'd say someone got carried away with trying to delegate between operator op and operator op= similar functions. If you simplify the code, does it work better for you:

```patch diff --git a/multibody/math/spatial_vector.h b/multibody/math/spatial_vector.h index b4d43584bc..f15d39f8d1 100644 --- a/multibody/math/spatial_vector.h +++ b/multibody/math/spatial_vector.h @@ -241,7 +241,7 @@ class SpatialVector { /// SpatialVelocity, SpatialAcceleration, SpatialForce, or SpatialMomentum). friend SpatialQuantity operator+(const SpatialQuantity& V1_E, const SpatialQuantity& V2_E) { - return SpatialQuantity(V1_E) += V2_E; + return SpatialQuantity(V1_E.get_coeffs() + V2_E.get_coeffs()); } /// Subtracts two spatial vectors by simply subtracting their 6 underlying @@ -254,13 +254,13 @@ class SpatialVector { /// SpatialVelocity, SpatialAcceleration, SpatialForce, or SpatialMomentum). friend SpatialQuantity operator-(const SpatialQuantity& V1, const SpatialQuantity& V2) { - return SpatialQuantity(V1) -= V2; + return SpatialQuantity(V1.get_coeffs() - V2.get_coeffs()); } /// Multiplication of a spatial vector V from the left by a scalar `s`. /// @relates SpatialVector. friend SpatialQuantity operator*(const T& s, const SpatialQuantity& V) { - return SpatialQuantity(V) *= s; + return SpatialQuantity(V.get_coeffs() * s); } /// Multiplication of a spatial vector V from the right by a scalar `s`. diff --git a/multibody/math/spatial_velocity.h b/multibody/math/spatial_velocity.h index eee23fcb3d..be2b5b3687 100644 --- a/multibody/math/spatial_velocity.h +++ b/multibody/math/spatial_velocity.h @@ -203,9 +203,9 @@ class SpatialVelocity : public SpatialVector { template inline SpatialVelocity operator+( const SpatialVelocity& V1_E, const SpatialVelocity& V2_E) { - // Although this operator+() function simply calls an associated - // SpatialVector operator+=() function, it is needed for documentation. - return SpatialVelocity(V1_E) += V2_E; + // Although this implementation calls the base class operator, it is needed + // for documentation. + return SpatialVector::operator+(V1_E, V2_E); } /// Subtracts spatial velocities by simply subtracting their 6 underlying @@ -235,9 +235,9 @@ inline SpatialVelocity operator+( template inline SpatialVelocity operator-( const SpatialVelocity& V1_E, const SpatialVelocity& V2_E) { - // Although this operator-() function simply calls an associated - // SpatialVector operator-=() function, it is needed for documentation. - return SpatialVelocity(V1_E) -= V2_E; + // Although this implementation calls the base class operator, it is needed + // for documentation. + return SpatialVector::operator-(V1_E, V2_E); } template ```

If this seems to help, you could extend the pattern to the rest of the spatial_algebra files.

cbarcelo-bdai commented 1 month ago

If you simplify the code, does it work better for you:

Alright, so I've implemented the following changes:

Changes ```diff diff --git a/multibody/math/spatial_acceleration.h b/multibody/math/spatial_acceleration.h index eac8ee35f..eec7591e4 100644 --- a/multibody/math/spatial_acceleration.h +++ b/multibody/math/spatial_acceleration.h @@ -326,9 +326,9 @@ class SpatialAcceleration : public SpatialVector { template inline SpatialAcceleration operator+(const SpatialAcceleration& A1_E, const SpatialAcceleration& A2_E) { - // Although this operator+() function simply calls an associated - // SpatialVector operator+=() function, it is needed for documentation. - return SpatialAcceleration(A1_E) += A2_E; + // Although this implementation calls the base class operator, it is needed + // for documentation. + return SpatialVector::operator+(A1_E, A2_E); } /// Subtracts spatial accelerations by simply subtracting their 6 underlying @@ -342,9 +342,9 @@ inline SpatialAcceleration operator+(const SpatialAcceleration& A1_E, template inline SpatialAcceleration operator-(const SpatialAcceleration& A1_E, const SpatialAcceleration& A2_E) { - // Although this operator-() function simply calls an associated - // SpatialVector operator-=() function, it is needed for documentation. - return SpatialAcceleration(A1_E) -= A2_E; + // Although this implementation calls the base class operator, it is needed + // for documentation. + return SpatialVector::operator-(A1_E, A2_E); } } // namespace multibody diff --git a/multibody/math/spatial_force.h b/multibody/math/spatial_force.h index 9d8c4a598..744bedc9e 100644 --- a/multibody/math/spatial_force.h +++ b/multibody/math/spatial_force.h @@ -216,9 +216,9 @@ class SpatialForce : public SpatialVector { template inline SpatialForce operator+(const SpatialForce& F1_E, const SpatialForce& F2_E) { - // Although this operator+() function simply calls an associated - // SpatialVector operator+=() function, it is needed for documentation. - return SpatialForce(F1_E) += F2_E; + // Although this implementation calls the base class operator, it is needed + // for documentation. + return SpatialVector::operator+(F1_E, F2_E); } /// Subtracts spatial forces by simply subtracting their 6 underlying elements. @@ -232,9 +232,9 @@ inline SpatialForce operator+(const SpatialForce& F1_E, template inline SpatialForce operator-(const SpatialForce& F1_E, const SpatialForce& F2_E) { - // Although this operator-() function simply calls an associated - // SpatialVector operator-=() function, it is needed for documentation. - return SpatialForce(F1_E) -= F2_E; + // Although this implementation calls the base class operator, it is needed + // for documentation. + return SpatialVector::operator-(F1_E, F2_E); } } // namespace multibody diff --git a/multibody/math/spatial_momentum.h b/multibody/math/spatial_momentum.h index 5c6930b08..785ea7644 100644 --- a/multibody/math/spatial_momentum.h +++ b/multibody/math/spatial_momentum.h @@ -172,9 +172,9 @@ class SpatialMomentum : public SpatialVector { template inline SpatialMomentum operator+(const SpatialMomentum& L1_E, const SpatialMomentum& L2_E) { - // Although this operator+() function simply calls an associated - // SpatialVector operator+=() function, it is needed for documentation. - return SpatialMomentum(L1_E) += L2_E; + // Although this implementation calls the base class operator, it is needed + // for documentation. + return SpatialVector::operator+(L1_E, L2_E); } /// Subtracts spatial momenta by simply subtracting their 6 underlying elements. @@ -186,9 +186,9 @@ inline SpatialMomentum operator+(const SpatialMomentum& L1_E, template inline SpatialMomentum operator-(const SpatialMomentum& L1_E, const SpatialMomentum& L2_E) { - // Although this operator-() function simply calls an associated - // SpatialVector operator-=() function, it is needed for documentation. - return SpatialMomentum(L1_E) -= L2_E; + // Although this implementation calls the base class operator, it is needed + // for documentation. + return SpatialVector::operator+(L1_E, L2_E); } } // namespace multibody diff --git a/multibody/math/spatial_vector.h b/multibody/math/spatial_vector.h index b4d43584b..08506573f 100644 --- a/multibody/math/spatial_vector.h +++ b/multibody/math/spatial_vector.h @@ -184,7 +184,7 @@ class SpatialVector { /// absolute values in (this - other). decltype(T() < T()) IsApprox( const SpatialQuantity& other, - double tolerance = std::numeric_limits::epsilon()) const { + double tolerance = 2 * std::numeric_limits::epsilon()) const { return IsNearlyEqualWithinAbsoluteTolerance(other, tolerance, tolerance); } @@ -241,7 +241,7 @@ class SpatialVector { /// SpatialVelocity, SpatialAcceleration, SpatialForce, or SpatialMomentum). friend SpatialQuantity operator+(const SpatialQuantity& V1_E, const SpatialQuantity& V2_E) { - return SpatialQuantity(V1_E) += V2_E; + return SpatialQuantity(V1_E.get_coeffs() + V2_E.get_coeffs()); } /// Subtracts two spatial vectors by simply subtracting their 6 underlying @@ -254,13 +254,13 @@ class SpatialVector { /// SpatialVelocity, SpatialAcceleration, SpatialForce, or SpatialMomentum). friend SpatialQuantity operator-(const SpatialQuantity& V1, const SpatialQuantity& V2) { - return SpatialQuantity(V1) -= V2; + return SpatialQuantity(V1.get_coeffs() - V2.get_coeffs()); } /// Multiplication of a spatial vector V from the left by a scalar `s`. /// @relates SpatialVector. friend SpatialQuantity operator*(const T& s, const SpatialQuantity& V) { - return SpatialQuantity(V) *= s; + return SpatialQuantity(V.get_coeffs() * s); } /// Multiplication of a spatial vector V from the right by a scalar `s`. diff --git a/multibody/math/spatial_velocity.h b/multibody/math/spatial_velocity.h index eee23fcb3..be2b5b368 100644 --- a/multibody/math/spatial_velocity.h +++ b/multibody/math/spatial_velocity.h @@ -203,9 +203,9 @@ class SpatialVelocity : public SpatialVector { template inline SpatialVelocity operator+( const SpatialVelocity& V1_E, const SpatialVelocity& V2_E) { - // Although this operator+() function simply calls an associated - // SpatialVector operator+=() function, it is needed for documentation. - return SpatialVelocity(V1_E) += V2_E; + // Although this implementation calls the base class operator, it is needed + // for documentation. + return SpatialVector::operator+(V1_E, V2_E); } /// Subtracts spatial velocities by simply subtracting their 6 underlying @@ -235,9 +235,9 @@ inline SpatialVelocity operator+( template inline SpatialVelocity operator-( const SpatialVelocity& V1_E, const SpatialVelocity& V2_E) { - // Although this operator-() function simply calls an associated - // SpatialVector operator-=() function, it is needed for documentation. - return SpatialVelocity(V1_E) -= V2_E; + // Although this implementation calls the base class operator, it is needed + // for documentation. + return SpatialVector::operator-(V1_E, V2_E); } template ```

And this is the diff on the list of failing tests I have:

Diff failing tests ```diff 1c1,3 < //bindings/pydrake/common:py/sorted_pair_pybind_test --- > //bindings/pydrake/common:py/wrap_pybind_test > //bindings/pydrake/math:py/math_example_test > //bindings/pydrake/math:py/math_overloads_test 2a5,6 > //bindings/pydrake/symbolic:py/sympy_test > //bindings/pydrake/systems:py/lifetime_test 25d28 < //multibody/math:spatial_algebra_test 30d32 < //multibody/plant:frame_kinematics_test 36a39 > //systems/sensors:rgbd_sensor_async_gl_test ```

Which is weird because although the test I was pointing to (frame_kinematics_test) and even spatial_algebra_test start passing, new failing tests appear now.

Not only that, but if I re-run the tests a second time (without a bazel clean in between) all pydrake tests fail with undefined symbol error (something that does not happen on master)

/pydrake/__init__.py", line 163, in <module>
    from . import common
ImportError: /home/cbarcelo_theaiinstitute_com/.cache/bazel/_bazel_cbarcelo_theaiinstitute_com/aa900abf41791df073d1b5a78180283d/sandbox/linux-sandbox/134019/execroot/drake/bazel-out/aarch64-opt/bin/bindings/pydrake/multibody/py/inverse_kinematics_test.runfiles/drake/bindings/pydrake/common/../../../_solib_aarch64/_U_S_Stools_Sinstall_Slibdrake_Cdrake_Ushared_Ulibrary___Utools_Sinstall_Slibdrake/libdrake.so: undefined symbol: __builtin_copysignq
jwnimmer-tri commented 1 month ago

I can't see any way that patch would be the actual cause of those new error messages. The patch seems like an improvement, so I would say please open a PR with that, we can review and merge it, and then circle back to any remaining errors after that.