Warning message for rigidly-connected composite bodies that have zero mass and a translational mobilizer or zero inertia and a rotational mobilizer.

amcastro-tri commented 2 years ago

We had a long discussion in #16992. It turns out that pretty much every error discussed in that issue stems from the same source: someone added a zero inertia (mass and/or rotational inertia) body to the model and, it is not welded to another non-zero mass body. Of course this is bad.

Proposed Solution

Wait till Finalize(), when we do know for sure the user is done creating the model, to analyze the MultibodyGraph and detect these zero inertia bodies that are either free of have a non-zero dof joint connecting them to another body.

We have MultibodyPlant::multibody_graph_ which nicely packs this information and already provides a nice set of graph queries (though we might need something else for this issue?). We can then iterate through each body with zero inertia, query the graph for its joints, and if not welded to either the world or some other non-zero mass body, throw with a nice error message.

mitiguy commented 2 years ago

@amcastro-tri Thank you for opening this issue. @rpoyner-tri I collected some notes about parsing below:

The file detail_sdf_parser_test.cc tests the following SDF mass/inertia parsing:

Link with no tag (not INERTIA) yields: mass = 1, ixx = iyy = izz = 1, ixy = ixz = iyz = 0.
Link with no tag (not INERTIAL) yields: mass = user_specified, ixx = iyy = izz = 1, ixy = ixz = iyz = 0.
Link with no tag yields: mass = 1, and moments/products of inertia as user-specified.
It is OK to have a link with zero mass and zero inertia.
It is OK to have a link with non-zero mass and zero inertia (all mass is concentrated at a point). mass = user_specified, ixx = iyy = izz = ixy = ixz = iyz = 0.
It is an ERROR to have a link with zero mass and non-zero inertia.

The file detail_urdf_parser_test.cc tests the following URDF mass/inertia parsing:

Link with no tag (not INERTIA) yields: mass = 0, ixx = iyy = izz = ixy = ixz = iyz = 0. THIS ZERO MASS/INERTIA URDF DEFAULT BEHAVIOR DIFFERS FROM SDF.
Link with no tag (not INERTIAL) yields the expected result mass = user_specified, ixx = iyy = izz = ixy = ixz = iyz = 0. THIS ZERO INERTIA URDF DEFAULT BEHAVIOR DIFFERS FROM SDF.
Link with no tag yields: mass = 0, and moments/products of inertia as user-specified. THIS ZERO MASS URDF DEFAULT BEHAVIOR DIFFERS FROM SDF. Paul: Double-check to ensure this issues an error.
It is OK to have a link with zero mass and zero inertia.
It is OK to have a link with non-zero mass and zero inertia (all mass is concentrated at a point). mass = user_specified, ixx = iyy = izz = ixy = ixz = iyz = 0.
It is an ERROR to have a link with zero mass and non-zero inertia.

amcastro-tri commented 2 years ago

@mitiguy, FYI. I believe the proposed solution in the issue would be "in addition" to SDF/URDF parsing. I believe the solution to this particular issue cannot be resolved at parsing time.

jwnimmer-tri commented 2 years ago

Full agree. However, it's plausible that the parsing defaults should additionally be improved (in cases where that's allowed by specification), or if nothing else they might influence the wording of the MbP::Finalize error message.

sherm1 commented 2 years ago

It's not clear to me how this can be addressed better during parsing -- we don't have sufficient information until parsing is done and in some cases not until we've given the user a chance to finish with API calls, which can include welds that can turn scary massless things into legitimate components.

So I agree the checks should be done in MbP::Finalize() where we have everything we need. Likely some simple heuristics applied to the topology would be a good start and prevent many problems. A more sophisticated approach we could TODO would be to calculate the outboard articulated body inertia at each mobilizer, perform the projection onto the mobilizer axes, and look for zeroes.

I'd like to see the proposed heuristics in this issue so we can haggle over those.

jwnimmer-tri commented 2 years ago

I think we all agree that the "zero inertia" checking needs to happen during Finalize, for the reasons you say.

I think it's fair to begin by confirming that the parser defaults match the specification for SDFormat and URDF. It seems like they already do, so there's nothing to fix. If the parsers were using the wrong defaults, that would have been worth fixing first.

mitiguy commented 2 years ago

It seems we all agree. I am collecting information as I go through the code. The current plan is to make the test in Multibody::Finalize(). It seems that the tests/checks will need to be made without a Context. There is more to think about and perhaps other changes to code, e.g., the current Body class does not have a way to get the default_rotational_inertia that is stored in RigidBody without a dynamic cast. No show-stoppers, but more to consider.

jwnimmer-tri commented 2 years ago

@amcastro-tri says above:

It turns out that pretty much every error discussed in that issue stems from the same source: someone added a zero inertia (mass and/or rotational inertia) body to the model and, it is not welded to another non-zero mass body. Of course this is bad.

From GDrive discussions, apparently @mitiguy thinks that this is ... not always bad? So we should print a warning about it, rather than throw an exception?

Anyway @mitiguy can you clarify the new victory condition here?

My understanding of the victory condition was that MbP::Finalize() should throw a descriptive exception anytime the model has a zero-inertia body which is not sufficiently welded. There is no gray area. Have you found a case where there is a gray area?

mitiguy commented 2 years ago

Example: Right now, @JoseBarreiros-TRI is using Drake to create a U-joint with limits. From what I understand, he is using a massless body between two revolute joints. Note: Based on current limitations in Drake, creating a massless body between two revolute joints is a good modeling solution for a U-joint with limits and/or actuation.

jwnimmer-tri commented 2 years ago

Is this what Sherm was trying to explain at https://github.com/RobotLocomotion/drake/issues/16992#issuecomment-1102858871?

Is there really no way to disambiguate that? Something like if it has an actuator attached (on both sides?) then it's fine, no warning (nor error)?

The objective is to avoid hitting Delta > 0. If we can do that with a 0% false positive rate and 5% false negative rate, I'll buy that. The important thing is to hit the common failure modes, not be 100% accurate.

jwnimmer-tri commented 2 years ago

My suggestion here would be to curate a set of test cases (SDFormat files) with errors, and work backwards on reporting those errors. Trying to work forwards from the implementation to imagined test cases risks missing out on the actual victory condition -- the quality of feedback we report to real users, using their real-world examples.

sherm1 commented 2 years ago

Yes, that's what I was saying in https://github.com/RobotLocomotion/drake/issues/16992#issuecomment-1102858871.

The only way to know for sure that a massless/inertialess body is fatal is to generate the mass matrix M and check its condition number. Unfortunately that is M(q) so can't be done until runtime (with a Context present). So we're stuck with heuristics and a desire not to prevent legitimate uses like José's -- that leads to warnings and means we can't prevent some obscure Delta > 0 errors with these pre-runtime checks. Instead we're trying to catch egregious mistakes, and at least have something in the log for a user to consider after getting some inscrutable message.

Having some actual user cases to verify victory would be great -- any idea how to acquire those?

jwnimmer-tri commented 2 years ago

Examples can be found in most of the linked issues that Alejandro closed in lieu of this one.

I'm pretty committed to a 0% false positive rate. I hope there's a way to detect the common cases (e.g., massless finger) without getting into the weeds.

mitiguy commented 2 years ago

I am fairly far along and currently breaking a larger PR into smaller pieces and pushing it to review. The solution leverages some existing code, namely: topology.CreateListOfWeldedBodies(). Since there is no Context available at the time of issuing the warning, it seems a 0% false positive rate is unlikely. I envision that this is the first of these types of somewhat indefinite warnings -- hence, why I think a C++ MessageManager might be beneficial. For example, the issuing of the warning can be delayed until a numerical problem is encountered.

sherm1 commented 2 years ago

Clarification please: by "false positive" do you mean a warning that turns out to be unnecessary? I certainly wouldn't like to see any false positive errors but I'm not sure that silence is better than a well-worded warning that would mostly be unseen until there's a problem.

sherm1 commented 2 years ago

An alternative we discussed would be to defer execution of the multibody system validator until we're issuing a fatal error, for example as part of issuing the Delta > 0 error. Assuming we have access to the necessary objects at that level (probably not at present), we could run the validator, get the result as a string and incorporate it in the error message. The messages might still be false positives at times, but the user would be desperate for ideas at that point.

jwnimmer-tri commented 2 years ago

By false positive, I mean error or warning. If Drake prints something out, it should always be ham, not spam. When users are running a valid simulation, we should not pepper them with warnings that do not reflect an acute problem.

I really think you've jumped the shark here. I'm no dynamicist, but can we really not detect a mass-less finger body at the end of an arm, without waiting for a configuration vector to appear? Isn't it invalid for any and all configuration vectors? Maybe I just don't understand the subtleties.

sherm1 commented 2 years ago

We can reliably issue an error in some cases, and a distal massless body with a moving inboard joint is one of those cases. If that is the only thing causing users trouble we could limit the checks to that and not attempt to warn about potential problems caused by internal massless bodies.

jwnimmer-tri commented 2 years ago

A survey of the linked reports to date should be authoritative on that front. I am also curious to hear what you find when once you dig into those! I know the primary one was the finger mass, but there might have been others.

If you find others, I'd also offer this trick: checking the configuration used by CreateDefaultContext should be a safe bet? If that q is faulty, we should probably throw during Finalize? What good is a system with an invalid default context? And if you need a Context object to make the math easier, I think you can just create one at the bottom of Finalize. No need to ask the user for one.

sherm1 commented 2 years ago

Mapping the runtime check back to the offending input is a research project -- heuristics involving the inputs can immediately report problems in user-relevant terms. I think we should stick with that approach for now. But we will need to decide what checks to perform. Calling them "false positives" makes them sound terrible -- Paul prefers "FYI" (as in "FYI you put an inertialess body on a revolute joint which may cause trouble"). His code MG has issued such messages for many years and saved his students (~800/year) and him a great deal of pain. Same with WorkingModel which had 12 million users.

jwnimmer-tri commented 2 years ago

I never quite responded to one thing you said above. In my opinion, it's perfectly valid in case we hit the Delta > 0 fault to include our "guesses" about causes (e.g., invalid input) in the error message. The diagnostic is not a "false positive" in my lingo -- there is definitely something wrong! -- we might not be able to tell the user exactly what it was, but we can offer some hints. I think that's fine. In general our exception messages often include tips about what is wrong, though sometimes the cause it other than what they suggest.

My sole objection is to printing to the console when nothing is in fact wrong. That is spam! We should never spam the user, it will only distract them from everything else going on.

So, if we defer any reporting to Delta > 0 time (rather than finalize), that's fine by me. Reporting errors ever earlier is nicer, but if doing that is difficult, then reporting them as they occur during simulation is only a small loss.

Note that deferring reports to Delta > 0 doesn't need any "message manager" or queue. When we hit Delta > 0, only then do we go hunting trying to find the problem. No need to collect the results ahead of time, that's just wasted computation.

jwnimmer-tri commented 2 years ago

His code MG ... Same with WorkingModel which had 12 million users ...

Here's where I think we differ. Drake is a library, it's not a program. There is no UI. There is nowhere for messages to go. It's is frequently run headless (think: simulations in the cloud). Anytime we assume that drake::log() reaches a human eyeball, we have failed. Many builds of drake don't even compile in spdlog.

There is no such thing as a warning or FYI. There are exceptions (or structured error reporting), nothing else. To think otherwise ignores the primary use case.

You can imagine we have a linter program that people run, and consume with their eyeballs. That would be a fine place for warnings. But the MbP API is a library call, not a program. Relying on drake::log() for that use case is a mistake.

jwnimmer-tri commented 2 years ago

Anyway, I think we both agree on the next step: consolidate the prior trouble reports into sample inputs, and then see where we stand. Working on code (or even me debating the implementation!) prior to doing that is not a good use of time.

sherm1 commented 2 years ago

The library/program distinction is important, thanks. @mitiguy let's gather more intelligence as Jeremy proposed:

Take a look at some actual user failure cases
Investigate delaying the FYI-type warnings until a fatal error has occurred anyway

mitiguy commented 2 years ago

There have been multiple pull requests to close this issue, including: PR #17302 -- Issue an error if a terminal composite body has a mass or inertia that will cause subsequent numerical problems. PR #17351 -- fix zero or NaN spatial inertias that are in tests that will not pass CI once this issue is closed. PR #17261 -- new internal functions to calculate default mass and check if default inertia is zero for a set of bodies. PR #17222 -- new functions that report whether a mobilizer or joint can rotate or translate. PR #17219 -- new function SpatialInertia::MakeTestCube() for creating valid SpatialInertia for testing.

amcastro-tri commented 2 years ago

Sorry, late to the party, but I figured at least I'd ask.

FYI, reading the docs on the new Joint::can_rotate/Joint::can_translate generates more questions than answers? are we sure we need these? Maybe a better solution is to have a SingleDofJoint from which revolute and prismatic joints inherit from? Again, not clear the purpose of those methods, so at this point I'm just guessing.

Also, why SpatialInertia::MakeTestCube instead of a free function drake::multibody::test::MakeTestCube()? is that a common practice in C++?

sherm1 commented 2 years ago

@mitiguy can elaborate. My understanding:

The can_rotate/can_translate methods allow for catching basic input errors in Finalize(), namely a massless terminal body that can translate and an inertialess terminal body that can rotate. The methods are not limited to 1-dof joints -- a ball joint (e.g.) can_rotate.
MakeTestCube() is intended to replace calls to now-invalid SpatialInertia() default constructor calls that were being used in test cases but now fail due to inertias full of NaNs. A free function in multibody/test does sound cleaner than having the test method clutter the SpatialInertia API.

jwnimmer-tri commented 2 years ago

A free function in multibody/test does sound cleaner than having the test method clutter the SpatialInertia API.

Users will also need to write their own unit tests, and "make a valid inertia" seems like a common case in a family of tests. I'd propose that the "give me a valid inertia" helper should be part of our installed footprint. That doesn't rule out making it into free function instead of a class-static factory function, but it does rule out placing it in a private unit-test library.

amcastro-tri commented 2 years ago

The methods are not limited to 1-dof joints -- a ball joint (e.g.) can_rotate.

I see, probably the docs could be more clear on the definition of "can rotate/translate".

mitiguy commented 2 years ago

@amcastro-tri Thanks for the suggestion. I'll update those docs accordingly with more information, e.g., related to Sherm's helpful description above.

@jwnimmer-tri, @sherm1, @amcastro-tri feel free to propose a name for the replacement of Spatialnertia::MakeTestCube(). I am fine with SpatialInertia::MakeValidInertia() and changing it now would be preferable to deprecation later.

jwnimmer-tri commented 2 years ago

For my part, the current function name is a-ok. Other names should be fine, too.

sherm1 commented 2 years ago

I like the current function name. In any case I think the name should contain Test so its narrow purpose is clear.

amcastro-tri commented 2 years ago

Looking at the method now, not sure why "experimental" or "for test only". My preferences would be:

If for test only, place it outside the class, in a free function in the proper namespace.
Otherwise, consider SpatialInertia::MakeSolidBoxSpatialInertia() (with no particular defaults). Such a methods does make more sense to live within the SpatialInertia class. We actually do have already a number of factory methods in place.

RussTedrake commented 2 years ago

As @sherm1 pointed out in #17351 -- the limitation with the decision to run this check in Finalize is that MultibodyPlant is our kinematics engine in addition to being our dynamics engine. A URDF file could be perfectly valid for kinematics analysis / motion planning even if it fails this test. Now we are increasing the burden on the users to fully specify the dynamics, even if they don't intend to use them. (I do think URDF is the only case that is badly impacted, since the defaults in SDF and MJCF are more user-friendly).

The alternatives I could imagine:

have the dynamics methods in MBP guarded by a cached version of this check. That is not as nice for the code, but more friendly for the user.
keep this check in Finalize, but offer a custom drake tag that opts in to a different set of defaults for URDF.

Have you all considered these?

mitiguy commented 2 years ago

Adding review comments from Russ: "This method is marked in its doxygen as (internal use only); I think it should not appear in the python bindings. And encouraging users to call MakeTestCube to get reasonable behavior... even in our example code, seems just wrong. Perhaps MakeTestCube needs to be changed to a function name / status that is appropriate for public consumption. Even MakeCube is ok with me.

I now see that this was discussed in https://github.com/RobotLocomotion/drake/issues/17081#issuecomment-1150236102 . But I still disagree with it as a public name that people should use; especially in example code (like the door hinge)."

In view of Alejandro's and Russ's comments, I will change MakeTestCube() to the following (for review): SpatialInertia::MakeSolidBox(T mass = T(2), T Lx = T(3), T Ly = T(3), T Lz = T(3)); For testing, this can be as short as: SpatialInertia::MakeSolidBox(); However, it is practical and useful for end-users, and its name is consistent with UnitInertia::MakeSolidBox().

I also agree that throwing an exception during Finalize seem suboptimal. As mentioned before, we might want to consider a C++ "MessageManager" that stores messages that are generated at opportune times in code, e.g., during Finalize() -- but that do not necessarily instantly throw an exception. Harvesting these messages to subsequently throw an exception can be fast and efficient.

mitiguy commented 2 years ago

From f2f conversion with @sherm1 and @SeanCurtis-TRI, from PR #17351, I will remove SpatialInertia::MakeSolidBox( .... ); in favor of an enhanced default SpatialInertia constructor SpatialInertia( InertiaValue someValue );

where InertiaValue is an enum that currently has two values, namely InertiaValue::kNaN InertiaValue::kSdf

Please let me know ASAP if you have objections.

amcastro-tri commented 2 years ago

I don't understand the proposed constructor signature. IMO SpatialInertia::MakeSolidBox(please no default values here) is perfectly reasonable.

SeanCurtis-TRI commented 2 years ago

There were several things that went into it:

We have a host of UnitInertia::SolidBox()-style methods. The introduction of SpatialInertia::MakeSolidBox() introduces some weirdness for several reasons:
- Despite the fact that there are obvious relationships between the classes UnitInertia and SpatialInertia, the change of spelling (inclusion of Make) seems like an inconsistent design choice.
- The fact that the SpatialInertia is hard-coded to not be at the center-of-mass is likewise weird.
If we were to unify the names and semantics (change MakeSolidBox() to SolidBox() to match UnitInertia), then we'd be obliged to give it meaningful parameters (mass, offset to center of mass, measure of box) so that it would actually serve the same kind of "utility" role that is served in SpatialInertia. This is essentially sugar (see the implementation in the .cc file). It is unclear that this sugar is actually helpful or needed.
If we decided to add this sugar, then we invite the question: "What about all of the other solid shapes offered in UnitInertia?"
Finally, the purpose being served is to indicate, "I can't be bothered with making a determination about mass, but I need to satisfy MbP's requirement that there be "sufficient" mass." If we had something like SpatialInertia::SolidBox() it becomes harder to distinguish between two case: "I want a solid box" and "I don't care what it is and a box is convenient".

For all those reasons, we opted for a construction argument such that the default behavior is unchanged (nan). But you can say, "just give me some alternate, non-nan default". We picked kSdf with the idea that we'll provide you the same defaults with this parameter as sdformat provides if omitted. It provides a whiff of consistency. Certainly not perfect, but it satisifes MbP's needs and doesn't introduce a novel arbitrary SpatialInertia value. And if I see SpatialInertai<T>(InertiaValue::kSdf), I won't look at that and think: "Oh, yeah, those mass properties are probably meaningful." It loudly declares, "I don't care! Just check the box for me."

sherm1 commented 2 years ago

Well said!

amcastro-tri commented 2 years ago

All of that makes sense to me. However, let me go back to the original intent of MakeTestCube(). As far as I can tell, with its default argument values, it creates a cube with reasonable (non NaN, non Zero) values. Therefore I am confused about the new requirement of InertiaValue::kNaN and InertiaValue::kSdf. I asked a specific question and I feel like we branched out into a another direction, forgive me if I am missing something or too slow.

My only complain (if any) is that I found a bit excessive to place testing code within the nice public API of SpatialInertia when probably either a local free function in the test was all that was needed. If however, we wanted to reuse it in other places, then I simply suggested a more general and truly public API. Not sure if clear.

I realize however this discussion does not belong in this issue. I only shows here given that the new API is the result of addressing this one issue. I am happy to move to a new issue to stop spamming this one.

mitiguy commented 2 years ago

@amcastro-tri -- your feedback is always appreciated. We had a f2f meeting yesterday to converge on this so we can make progress. There are a multiplicity of opinions and at some point, I wanted enough consensus to make progress.

To clarify: Whether the function name is MakeTestCube() or MakeSolidBox() or SpatialInertia(InertiaValue::kSdf) or something else, we need the function to be available on a much broader basis. The current default SpatialInertia() constructor creates an invalid spatial inertia. This function's scope is not limited to a test file. As you might see by the number of pending changes in PR #17351, we need this change to affect many files and even Python bindings. Ultimately we need to converge on something and move forward -- and this is where we are. Note: I do not fully understand your ideas on an alternative more general public API. For now, we just need an easy-to-make non-NaN SpatialInertial. Something more general could be useful -- and hopefully we have not restricted ourselves with this change.

SeanCurtis-TRI commented 2 years ago

Ah...MakeTestCube() and not MakeSolidCube().

You touched on it: the function is too specific to really belong in the public API. It was intended to facilitate testing. During the span of the PR it was pushed, vaguely in a direction that hinted at generality (MakeSolidCube()), but when viewed that way, it still raised all sorts of issues. And it was still polluting the API and namespace.

You used the key phrase, "a more general and truly public API". In my answer, I feel I directly responded to that idea. What would it take to do that? That was the first three items in the list. That's what a "general API" would've been and it's not clear that it's worth it.

We considered several "simple" alternatives:

Leave it as MakeTestCube() and simply exclude it from doxygen. It partially addressed cluttering the doxygen but didn't really address any of the other issues. It was still a public method on the class with magical semantics.
Create a test utility that would be available to all test sites (and only test sites). That would've largely addressed the previously discussed issues, but it is cumbersome. Furthermore, we felt that it's possible to do things in C++ and/or python outside of tests where you still need nominal mass properties, even if you don't care what they are. As that need existed outside of testing, we wanted the solution to apply outside of testing as well.

The enumeration upon construction did that. If you want NaN valued, you get to keep the historical default constructor. Non-specific, valid mass is provided by passing an argument. In the future, we can extend the enumeration values (e.g., kZero) all without thrashing the public API or introducing bloat in the number of methods.

jwnimmer-tri commented 2 years ago

Tamsi should not throw when it cannot solve; doing so makes it difficult to give good error message. It already has TamsiSolverResult to communicate error conditions. Add a DeltaGtZero enumerated error condition (choose a good name for it), and weave that through CalcAlpha so that MbP is informed of the DeltaGtZero condition. When MbP gets that result, it calls the ThrowIfBadMasses() helper function.

mitiguy commented 1 year ago

PR #17996 is investigating whether it makes sense for LDLT to be changed to LLT. This change is being investigated after conversations with @xuchenhan-tri, @sherm1, @hongkai-dai, and @amcastro-tri.

Background: Issue #16992 is associated with an obscure error message about a Delta > 0 failure. It seems this error can be caused when mass/inertia properties are inadvertently set to 0 and there is a dynamic singularity. I have been investigating ways to catch those errors and provide an error message that is more informative than Delta > 0.

In view of Eigen's documentation , it makes me wonder if the current LDLT solver algorithm associated allows bypassing singular articulated rigid body matrices and real numerical problems (that perhaps results in subsequent hard-to-understand Delta > 0 failures). Eigen's documentation states LDLT works if the matrix is positive semidefinite or negative semidefinite. Hence even if the matrix is singular, LDLT still works. Alternatively, the LLT solver requires strict positive definite matrices. So if there is a positive semidefinite and singular matrix, LDLT will work but LLT will not. It seems the articulated body inertia matrix should always be strictly positive definite, hence it makes sense to use LLT rather than LDLT.

From @amcastro-tri: TBH we do not have a proven reason to user LDTL over LLT. The reasons that I am very sure I thought of and why eventually I decided to go with that instead are:

Computation stability. We should always have nicely formed mass matrices, still numerical stability is something most people ignore and it ends up biting down the road.
Speed. Presumably LDLT could be faster given it does not require the computation of square roots.

That being said, those two claims should be proven. Here are my suggestions to prove them both:

Stability. Swap LDLT to LLT, push a PR, and see if CI is happy with it! that'd be a first order thing to try in order to figure out if round-off errors creep in somewhere.
Speed. We have a beautiful Cassie benchmark precisely for forward dynamics! Run benchmark before/after your commit and record the results.

RobotLocomotion / drake

Warning message for rigidly-connected composite bodies that have zero mass and a translational mobilizer or zero inertia and a rotational mobilizer. #17081

Proposed Solution