RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.33k stars 1.26k forks source link

Need sensible exception message when spatial inertia has a NAN or INF center of mass. #22017

Open mitiguy opened 3 weeks ago

mitiguy commented 3 weeks ago

The code below creates a spatial inertia which fails in its constructor due to a call to IsPhysicallyValid(). The upstream cause of the failure is due to a NAN in the center of mass position.

  const Vector3<double> p_BoBcm_B(7, NAN, 9);
  const UnitInertia<double> G_BBo_B =  UnitInertia<double>::SolidSphere(/* radius = */ 1.0);
  SpatialInertia<double> bad_spatial_inertia(/* mass = */ 1.0, p_BoBcm_B, G_BBo_B);

However, the exception message is currently enigmatic, displayed as: "CalcPrincipalMomentsAndMaybeAxesOfInertia(): Unable to calculate eigenvalues or eigenvectors of the 3x3 matrix associated with a RotationalInertia."

A better message is needed to clearly communicate that the problem is with the center of mass position vector.

sherm1 commented 3 weeks ago

Can you explain why we need this particular check, Paul? NaNs anywhere cause problems, but in most cases we just depend on them propagating per the IEEE standard. There are a few places where we guard against NaNs because they are particularly likely to occur. Is this one of them? Why would we expect users to make a NaN COM position?

If this isn't a problem that has occurred in practice, consider waiting to see if it really needs a fix.

mitiguy commented 3 weeks ago

A position vector can be NaN or Infinity for various reasons. For example, the volume = 0 problem that lead to PR #21924 (towards issue #21929) caused a NaN position vector. This NaN position vector was used to shift inertia to center of mass -- hence a NaN inertia matrix. This NaN inertia matrix was used for eigenvalue analysis to test a triangle inequality -- which then gave a cryptic error message that was unusable by the user. We need to guard against NaNs here because they will not propagate well (an exception will be thrown a few code-steps later).

The code here safeguards against both NaN and InF being propagated into next lines of code that will throw an exception.

jwnimmer-tri commented 3 weeks ago

We need to guard against NaNs here because they will not propagate well (an exception will be thrown a few code-steps later).

Maybe that is the actual problem, though? Why does the low-level routine throw an exception, instead of returning either a NaN-filled result, or a std::expected<Result, Error>, or using a DiagnosticPolicy? Fixing the low-level routine to communicate errors properly is a single change. Guarding every upstream caller of that function with ahead-of-time checking is not a good architecture.

In any case, I think Sherm's point is that this particular error at this particular layer in the code has never been encountered by users, so why are we bothering with it? In #21924, it was as parser error. Absolutely any attempt to fix that must start from the parser design and work downwards. Trying to screw around in low-level code without any understanding of how parsers work is a total waste of our time.