RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu
Other
3.33k stars 1.26k forks source link

Eigen 3.4 vs drake::AutoDiffXd #15401

Closed jwnimmer-tri closed 3 years ago

jwnimmer-tri commented 3 years ago

This is a sub-task of #14968, as mentioned at https://github.com/RobotLocomotion/drake/issues/14968#issuecomment-872518767.

When using the pre-release branch of Eigen 3.4, the cassie_benchmark is failing:

CassieAutodiffFixture/AutodiffInverseDynamics    4241275 ns      4241220 ns          167 Allocs.max=38.049k Allocs.mean=38.049k Allocs.min=38.049k Allocs.stddev=0
abort due to malloc #57694 while max_num_allocations = 57693 in effect

The PR #15142 is the WIP showing the integration prototype.

Perhaps our forked copy of common/autodiffxd.h might be incompatible with Eigen 3.4?

@rpoyner-tri Could you pull that branch and see if you can root cause what's going on here?

jwnimmer-tri commented 3 years ago

Diffing the 3.4 branch vs Ubuntu 18's Eigen, this looks like the only non-style change:

--- /usr/include/eigen3/unsupported/Eigen/src/AutoDiff/AutoDiffScalar.h 2017-06-15 00:10:20.000000000 -0700
+++ AutoDiffScalar.h-3.4-current    2021-07-20 22:40:16.300396337 -0700
@@ -453,6 +453,24 @@
   void operator+() const;
 };

+template<typename BinOp, typename A, typename B, typename RefType>
+void make_coherent_expression(CwiseBinaryOp<BinOp,A,B> xpr, const RefType &ref)
+{
+  make_coherent(xpr.const_cast_derived().lhs(), ref);
+  make_coherent(xpr.const_cast_derived().rhs(), ref);
+}
+
+template<typename UnaryOp, typename A, typename RefType>
+void make_coherent_expression(const CwiseUnaryOp<UnaryOp,A> &xpr, const RefType &ref)
+{
+  make_coherent(xpr.nestedExpression().const_cast_derived(), ref);
+}
+
+// needed for compilation only
+template<typename UnaryOp, typename A, typename RefType>
+void make_coherent_expression(const CwiseNullaryOp<UnaryOp,A> &, const RefType &)
+{}
+
 template<typename A_Scalar, int A_Rows, int A_Cols, int A_Options, int A_MaxRows, int A_MaxCols, typename B>
 struct make_coherent_impl<Matrix<A_Scalar, A_Rows, A_Cols, A_Options, A_MaxRows, A_MaxCols>, B> {
   typedef Matrix<A_Scalar, A_Rows, A_Cols, A_Options, A_MaxRows, A_MaxCols> A;
@@ -462,6 +480,10 @@
       a.resize(b.size());
       a.setZero();
     }
+    else if (B::SizeAtCompileTime==Dynamic && a.size()!=0 && b.size()==0)
+    {
+      make_coherent_expression(b,a);
+    }
   }
 };

@@ -474,13 +496,17 @@
       b.resize(a.size());
       b.setZero();
     }
+    else if (A::SizeAtCompileTime==Dynamic && b.size()!=0 && a.size()==0)
+    {
+      make_coherent_expression(a,b);
+    }
   }
 };

 template<typename A_Scalar, int A_Rows, int A_Cols, int A_Options, int A_MaxRows, int A_MaxCols,
          typename B_Scalar, int B_Rows, int B_Cols, int B_Options, int B_MaxRows, int B_MaxCols>
 struct make_coherent_impl<Matrix<A_Scalar, A_Rows, A_Cols, A_Options, A_MaxRows, A_MaxCols>,
-                             Matrix<B_Scalar, B_Rows, B_Cols, B_Options, B_MaxRows, B_MaxCols> > {
+                          Matrix<B_Scalar, B_Rows, B_Cols, B_Options, B_MaxRows, B_MaxCols> > {
   typedef Matrix<A_Scalar, A_Rows, A_Cols, A_Options, A_MaxRows, A_MaxCols> A;
   typedef Matrix<B_Scalar, B_Rows, B_Cols, B_Options, B_MaxRows, B_MaxCols> B;
   static void run(A& a, B& b) {

Looks like we might need to either go swat down some more of the const_casts with more specializations, or withdraw our 3.3-specific specializations.

jwnimmer-tri commented 3 years ago

Or possibly https://github.com/RobotLocomotion/drake/issues/15298#issuecomment-883932523 is part of it as well.

rpoyner-tri commented 3 years ago

FWIW, I think I did do some experiments with unspecialized Eigen autodiff around about 3.3.8 or 3.3.9. At that time the longstanding make_coherent bugs had been fixed. I don't recall doing any performance comparisons though.

Here's what I wrote at that time: https://github.com/RobotLocomotion/drake/issues/13902#issuecomment-708585775 . Pity I didn't detail the "other bugs we care about" bit.

jwnimmer-tri commented 3 years ago

As I've knocked down other bugs on the integration branch, I think the only problem here might be that the malloc counts have increased past the watermarks you'd previously set.

I guess I could see what the new watermark would be, and if it's a small change, then just bump our limit in the test?

jwnimmer-tri commented 3 years ago

The only required change:

--- a/examples/multibody/cassie_benchmark/test/cassie_bench.cc
+++ b/examples/multibody/cassie_benchmark/test/cassie_bench.cc
@@ -283,7 +283,7 @@ BENCHMARK_F(CassieAutodiffFixture, AutodiffForwardDynamics)

   for (int k = 0; k < 3; k++) {
     // @see LimitMalloc note above.
-    LimitMalloc guard(LimitReleaseOnly(57693));
+    LimitMalloc guard(LimitReleaseOnly(57787));

     compute();
rpoyner-tri commented 3 years ago

Given that we've essentially declared bankruptcy on further heap reductions in these cases, I think accepting a ~1% increase here is not a big deal.