Autodiffing a linear solver can give incorrect result

hongkai-dai commented 3 years ago

Eigen's linear solvers can give wrong gradient with autodiff scalar. I filed an issue to Eigen upstream https://gitlab.com/libeigen/eigen/-/issues/2313. We have discovered problems when using Eigen::LLT and Eigen::LDLT. I don't know if other Eigen solvers have the problem when computing the gradients in other use cases..

Unfortunately we call LDLT at several places inside Multibody simulation. So I am not super confident about the gradient from the simulation now.

One solution is to use LinearSolve function introduced in PR https://github.com/RobotLocomotion/drake/pull/15682. This LinearSolve() function only need the double version of the linear solvers. It computes the gradient through implicit function theorem.

rpoyner-tri commented 3 years ago

Happily, there are a fairly small number of uses of LLT or LDLT that are open to scalar type substitution.

$ git grep LDLT |grep '<T>'
multibody/plant/multibody_plant.cc:    Eigen::LDLT<MatrixX<T>> M_ldlt_;
multibody/plant/tamsi_solver.h:    Eigen::LDLT<MatrixX<T>>& mutable_J_ldlt() { return J_ldlt_; }
multibody/plant/tamsi_solver.h:    Eigen::LDLT<MatrixX<T>> J_ldlt_;
multibody/tree/articulated_body_inertia_cache.h:  const Eigen::LDLT<MatrixUpTo6<T>>& get_ldlt_D_B(
multibody/tree/articulated_body_inertia_cache.h:  Eigen::LDLT<MatrixUpTo6<T>>& get_mutable_ldlt_D_B(
multibody/tree/articulated_body_inertia_cache.h:  std::vector<Eigen::LDLT<MatrixUpTo6<T>>> ldlt_D_B_;
multibody/tree/body_node.h:      Eigen::LDLT<MatrixUpTo6<T>>& ldlt_D_B = get_mutable_ldlt_D_B(abic);
multibody/tree/body_node.h:  const Eigen::LDLT<MatrixUpTo6<T>>& get_ldlt_D_B(
multibody/tree/body_node.h:  Eigen::LDLT<MatrixUpTo6<T>>& get_mutable_ldlt_D_B(
systems/primitives/test/linear_transform_density_test.cc:  Eigen::LDLT<Matrix2<T>> ldlt_solver;

$ git grep LLT |grep '<T>'
multibody/fixed_fem/dev/inverse_spd_operator.h:  Eigen::LLT<MatrixX<T>> A_llt_;

DamrongGuoy commented 3 years ago

TIL git grep foo | grep bar. That is cool!

DamrongGuoy commented 3 years ago

I'm just curious. When @hongkai-dai said "implicit function theorem", does it mean "under a mild condition on the partial derivatives, the set of zeros of a system of equations is locally the graph of a function." (https://en.wikipedia.org/wiki/Implicit_function_theorem). Or we were talking about something else?

hongkai-dai commented 3 years ago

@DamrongGuoy that is right. Here if we can treat it as a function

f(x, z) = A(z) * x- b(z)

and we have f(x, z) = 0. Now in order to compute the gradient of ∂x/∂z we can use the implicit function theorem, and get

∂x/∂z = A⁻¹(∂b/∂z - ∂A/∂z * x)

which we implemented in the PR #15682

RussTedrake commented 3 years ago

@rpoyner-tri -- the may be few in number, but those look some potentially important codepaths!

rpoyner-tri commented 3 years ago

@RussTedrake I have no doubt of their importance. I was just trying to gauge the likely effort of fixing.

sherm1 commented 3 years ago

We can call this issue "autodifficulty"

hongkai-dai commented 3 years ago

I also did some comparison on the computation speed, between the new LinearSolve function introduced in PR #15682 and Eigen's linear solver instantiated with AutoDiffXd. Our LinearSolve function is a lot faster. Here is a comparison

Small sized (A is 2 x 2):

	LLT	LDLT	ColPivHouseholderQR	PartialPivLU
Our LinearSolve	268.2ns	191.4ns	305.1ns	187.6ns
Eigen's AutodiffScalar solver	578.5ns	757.9ns	1663.1ns	532.8ns

Medium sized (A is 6 x 6):

	LLT	LDLT	ColPivHouseholderQR	PartialPivLU
Our LinearSolve	804.1ns	832.5ns	1190.4ns	787.7ns
Eigen's AutodiffScalar solver	5242.5ns	7091.1ns	17589.3ns	5927ns

Large sized (A is 50 x 50):

	LLT	LDLT	ColPivHouseholderQR	PartialPivLU
Our LinearSolve	596μs	600μs	688μs	580μs
Eigen's AutodiffScalar solver	4710μs	3931μs	11607μs	7624μs

SeanCurtis-TRI commented 3 years ago

(Table might be easier to read if we had the same units everywhere....I almost made a stupid comment until I paid closer attention and noticed n vs u.)

sherm1 commented 3 years ago

I think the values are wrong now, Hongkai. For example the last column used to be 579μs/7624μs (a factor of 13X) but now it is only 1.3X. A decimal point got lost I think.

I was going to suggest just changing us to μs so it looks less like ns -- microseconds was a good unit for that last case.

sherm1 commented 3 years ago

Oh and those timings are awesome -- great job!

hongkai-dai commented 3 years ago

Done. I updated the table. Thanks @SeanCurtis-TRI and @sherm1

rpoyner-tri commented 3 years ago

Vague proposal for a patch: I'm thinking maybe we specialize LDLT and LLT on AutoDiffXd, and provide implementations that use Hongkai's implicit-function code. The nice property (if this works) is that we won't have to constantly troll Drake code for new uses and patch them. Plausible?

hongkai-dai commented 3 years ago

I think we can replace all of the code calling Eigen's "solve" function for a linear solver, with the LinearSolve function provided in PR #15682. The LinearSolve function in this PR is templated (work on both double matrix and autodiff matrix). It should be as fast as calling Eigen's "solve" function for the double matrix version, as a lot faster (and accurate) than calling Eigen's "solve" function for the AutoDiffScalar matrix version.

rpoyner-tri commented 3 years ago

The above proposal has one of the vices of our AutoDiffXd specialization: it can be tedious to track changes in the underlying Eigen classes. In this case, one might hope that the Eigen bug gets fixed and we could retire our specializations. OTOH, if ours are strictly faster, they may hang around for ever.

sherm1 commented 3 years ago

I like the idea of banning direct calls to Eigen's linear solvers altogether in Drake, and requiring use of Hongkai's LinearSolve interface, instantiated with the desired Eigen solver. That would translate directly to the Eigen solver in the double case but use the fast implicit function theorem to obtain the derivatives when instantiated with AutoDiffXd. This would at least be a modest step in the direction of faster autodiffing (as well as fixing the LDLT/LLT problems). A full ban would be easier to enforce than case by case.

rpoyner-tri commented 3 years ago

A ban/wrapper is an interesting idea. Sounds to me like it would need tooling support to avoid reviewer effort as the only enforcement mechanism.

rpoyner-tri commented 3 years ago

As a first experiment, here is an empty specialization that has the effect of spoiling compilation whenever LDLT and autodiff are used together.

diff --git a/common/BUILD.bazel b/common/BUILD.bazel
index 54551f94a2..c6c033fb12 100644
--- a/common/BUILD.bazel
+++ b/common/BUILD.bazel
@@ -130,6 +130,7 @@ drake_cc_library(
     name = "autodiff",
     hdrs = [
         "autodiff.h",
+        "autodiff_ldlt.h",
         "autodiff_overloads.h",
         "autodiffxd.h",
         "eigen_autodiff_types.h",
diff --git a/common/autodiff.h b/common/autodiff.h
index 6aa8322988..b3d09b49e4 100644
--- a/common/autodiff.h
+++ b/common/autodiff.h
@@ -29,5 +29,6 @@ static_assert(EIGEN_VERSION_AT_LEAST(3, 3, 4),
 #include "drake/common/eigen_autodiff_types.h"
 #include "drake/common/autodiffxd.h"
 #include "drake/common/autodiff_overloads.h"
+#include "drake/common/autodiff_ldlt.h"
 // clang-format on
 #undef DRAKE_COMMON_AUTODIFF_HEADER
diff --git a/common/autodiff_ldlt.h b/common/autodiff_ldlt.h
new file mode 100644
index 0000000000..54269d3724
--- /dev/null
+++ b/common/autodiff_ldlt.h
@@ -0,0 +1,10 @@
+#pragma once
+
+namespace Eigen {
+
+// The goal so far is to just test the seam. This should break any usage of
+// LDLT with autodiff.
+template <int _Rows, int _Cols, int _Options, int _MaxRows, int _MaxCols, int UpLo>
+class LDLT<Matrix<drake::AutoDiffXd, _Rows, _Cols, _Options, _MaxRows, _MaxCols>, UpLo> {};
+
+}  // Eigen

From there, it is a "mere matter of programming" ;) to implement the fix. On the plus side, specialization like this leverages the compiler to find the usage we want to intercept; on the minus side, we are potentially yoked to the Eigen implementation and expose ourselves to tedious maintenance tasks.

Perhaps another way to think of this template is as an implementation of the ban. We could spiff it up a bit to emit an error message that guides the client to use some Drake implementation instead.

jwnimmer-tri commented 3 years ago

My general rule of thumb is that we should specialize other projects' namespaces only as a last resort. Doing so always runs of the risk of breaking non-Drake uses of the API. What if downstream code wants to call LDLT on their own -- should we intercept that logic with our own, even for AutoDiff? What if they were relying on the prior behavior of the (broken?) AutoDiff implementation. It would be less risky if the AutoDiff type were in the Drake namespace, but so far, it's not.

rpoyner-tri commented 3 years ago

Good point. So, we're back to creating more rules for humans to enforce on Drake.

Is there a solver (even slow) that we can rely on for gradients? That might help in writing tests of code that uses something else in a fast path.

hongkai-dai commented 3 years ago

Is there a solver (even slow) that we can rely on for gradients? That might help in writing tests of code that uses something else in a fast path.

So far I found LLT and LDLT can give wrong results. BCDSVD and JacobiSVD don't support autodiffscalar yet. The other solvers pass my test that give the right gradient for small 2 x 2 matrix. But I don't know if these solvers can reliably give the right gradients for other problems.

hongkai-dai commented 3 years ago

With PR #15682 merged. Is it OK if I change some of the call on Eigen::LDLT to LinearSolve function in multibody folder?

rpoyner-tri commented 3 years ago

@hongkai-dai works for me. I was planning to do it, but other things are slowing me down just now. Happy to review or consult.

hongkai-dai commented 2 years ago

Eigen actually computes the right solution, I made a mistake when setting up the eigen test.

RobotLocomotion / drake

Autodiffing a linear solver can give incorrect result #15685