RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.34k stars 1.26k forks source link

ExtractDoubleOrThrow(AutoDiffXd) should act like DiscardZeroGradient #12650

Closed RussTedrake closed 3 years ago

RussTedrake commented 4 years ago

Currently it is silently throwing away gradient information

/// Returns the autodiff scalar's value() as a double.  Never throws.
/// Overloads ExtractDoubleOrThrow from common/extract_double.h.
template <typename DerType>
double ExtractDoubleOrThrow(const Eigen::AutoDiffScalar<DerType>& scalar) {
  return static_cast<double>(scalar.value());
}

I did NOT expect that behavior (especially given the name OrThrow). DiscardZeroGradient will cast if the gradient is zero, but throw if information is going to be lost. I propose that we tighten the semantics of that method and see if anyone was actually (properly) using the idea of intentionally discarding non-zero gradients.

jwnimmer-tri commented 4 years ago

I think both semantics are useful, and we should just be careful which one is being used for which call sites.

Reporting on the value like ...

template <typename T>
double Simulator<T>::get_actual_realtime_rate() const {
  const double simtime_now = ExtractDoubleOrThrow(get_context().get_time());
  const double simtime_passed = simtime_now - initial_simtime_;
  const Duration realtime_passed = Clock::now() - initial_realtime_;
  const double rate = (simtime_passed / realtime_passed.count());
  return rate;
}

... does not care about the derivatives.

Maybe that means we need better docs or names, but I don't think we can get rid of the option to discard.

(I don't think we should use DiscardZeroGradient(..., precision = inf) as the "always discard" spelling.)

SeanCurtis-TRI commented 4 years ago

geometry, as an example, depends on the ability to extract the value and discard arbitrary derivatives as the mechanism we can glue double-only components into AutoDiffXd systems. A prime example of that would be bounding volume hierarchy (BVH) culling. I just need the poses of objects to classify which actually needs math done -- I don't need the derivatives of those poses to do the classification, just the math.

RussTedrake commented 4 years ago

@SeanCurtis-TRI Good example. But even in the geometry case, we are either 1) guarding carefully that the resulting computation never returns an autodiff that should have depended on those gradients, or 2) knowingly losing information and documenting(?) that gradients will be wrong.

@jwnimmer-tri - precision=inf definitely did come to mind... :)

SeanCurtis-TRI commented 4 years ago

In this case, we're not discarding the derivatives from any returned value. We cull based on double-valued poses. But when we go to actually compute the actual quantity, we use T-valued poses (where possible). The goal is to increase the domain of "where possible" to include all proximity queries. But no derivatives are lost in the answer by use of this function.

jwnimmer-tri commented 4 years ago

For BVH (which is like many other cases of ExtractDouble) -- we are using the .value() in branching to decide whether or not to consider some case, so the derivatives serve no purpose. The domain is partitioned, not smooth -- conditionals, not algebra. We couldn't really use them even if we wanted to.

The intent of the function is meant to be as-if we could write x.value() for some T x; -- to just access the core double value, discarding any metadata. Since we don't have extension methods in C++ yet, we have to write value(x) instead, so we needed to use a much longer name, in this case ExtractDoubleOrThrow.

jwnimmer-tri commented 4 years ago

It looks like we've arrived at a naming question (and whether it is one function with parameter(s), or two different functions). To help inform that discussion, we should probably find several example uses of both semantics.

We have many example of ExtractDoubleOrThrow that seem semantically correct -- where we are doing branching. I wonder if we can find some places where DiscardZeroGradient is wanted, other than the one place in RigidBodyTree that currently uses it. Can we find any of the ExtractDoubleOrThrow that would be better as DiscardZeroGradient?

jwnimmer-tri commented 4 years ago

Maybe TrajectoryAffineSystem methods for A(t), B(t), etc. should be DiscardZeroGradient?

sherm1 commented 3 years ago

I think we should close this issue. There are good arguments made above for the current behavior of ExtractDoubleOrThrow() applied to an AutoDiffXd. An additional one is that it is very cheap, while scanning the gradient for zeroes is much more expensive. The method DiscardZeroGradient() is already available with the desired semantics -- we should use that when it is suitable. #15615 will clean up the other autodiff utility functions and documentation and hopefully make the whole mess somewhat less confusing.

WDYT now, @RussTedrake ?

jwnimmer-tri commented 3 years ago

Given the above discussions, it seems like the distinction between the two functions was confusing enough that we should at least cross-reference the two functions in Doxygen (and explain their distinction) before we close this issue.

sherm1 commented 3 years ago

Yes, good point. I'll do that in #15615 along with DiscardGradient() which is also a near-miss.

jwnimmer-tri commented 3 years ago

That PR is already a bajillion lines -- I'm nearly to the point of refusing the review and insisting your break it up (one PR with the deprecations of dead code, one PR to add the new functions and port one or two folders, one+ PRs to port the balance of the tree). Adding even more novelty to what should be a boilerplate PR is not going to help anyone.

sherm1 commented 3 years ago

The vast majority of the code changes are the call sites. I could leave those for later but including them is what made it obvious the new names were too long. Happy to defer most of those if you think it will help you but I want to sort out the new signatures, documentation, and cross-references in one PR.

jwnimmer-tri commented 3 years ago

My proposal in my prior post was:

(1) One PR with the deprecation of dead code. (2) One PR to add the new functions and port one or two folders. (3) One+ PRs to port the balance of the tree.

The "the new signatures, documentation, and cross-references in one PR" would all be part of (2). I only got though about 50% of the PR so far, and it was pretty clear after only a couple folders alone that the names were too long.

Pulling out the dead code deprecations into (1) on their would be a kindness to the next release manager.

By saving the huge boilerplate changes to (3), it also would allow for selecting more folder-specific feature reviewers. I often found myself trying to imagine what the folder's nominal owner would say about the change. Easier if we just ask them, rather than make me guess. Alternatively, we could keep the boilerplate in the same PR as the new functions, but portion out review by-folder into different parties (both feature and platform).

jwnimmer-tri commented 3 years ago

Jeremy writes:

Given the above discussions, it seems like the distinction between the two functions was confusing enough that we should at least cross-reference the two functions in Doxygen (and explain their distinction) before we close this issue.

Sherm writes:

Yes, good point. I'll do that ...

As such, let's reassign to +@sherm1 -@jwnimmer-tri.

sherm1 commented 3 years ago

15699 added the cross-references mentioned above. Closing this now.