RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Slow IPOPT performance compared to a baseline #18292

Open aykut-tri opened 1 year ago

aykut-tri commented 1 year ago

What happened?

I think there is a problem with Drake's IPOPT solver interface, based on a comparison against CasADi's interface. Using the attached script, I've formulated a symbolic position-only inverse kinematics problem for a serial planar chain with an arbitrary number of degrees of freedom (NDOF) using both Drake and CasADi. Both interfaces employ an IPOPT solver with the following options:

                                    Name   Value                used
              acceptable_constr_viol_tol = 0.01                  yes
                          acceptable_tol = 1e-06                 yes
                         constr_viol_tol = 0.0001                yes
                   hessian_approximation = limited-memory        yes
                           linear_solver = mumps                 yes
                             print_level = 5                     yes
                 print_timing_statistics = no                    yes
                      print_user_options = yes                   yes
                                      sb = yes                   yes
                                     tol = 1e-08                 yes
This is Ipopt version 3.11.9, running with linear solver mumps.
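
For reference, a minimal sketch of how options like these might be set on the Drake side through `SolverOptions` (assuming `prog` is the IK program from the attached script; the import path is for recent pydrake and may differ between versions; the tolerance settings are discussed further below):

```python
# Hedged sketch: pass IPOPT options like the ones listed above through Drake's
# SolverOptions. Assumes `prog` is the MathematicalProgram built by the attached
# script; the import path is for recent pydrake and may differ in older versions.
from pydrake.solvers import IpoptSolver, SolverOptions

options = SolverOptions()
ipopt = IpoptSolver.id()
options.SetOption(ipopt, "hessian_approximation", "limited-memory")
options.SetOption(ipopt, "linear_solver", "mumps")
options.SetOption(ipopt, "print_level", 5)

result = IpoptSolver().Solve(prog, None, options)
print(result.is_success(), result.get_solution_result())
```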

Please note that CasADi can provide an exact Hessian through second-order auto-diff, which further improves its performance; however, that feature is disabled in this comparison by instead using IPOPT's limited-memory BFGS approximation of the Hessian.

Running the comparison for different numbers of DOF yields pretty much the same solutions with the following solve times (IPOPT outputs are included in the attachment for reference):

| NDOF | Drake [s] | CasADi [s] |
|---|---|---|
| 2 | 0.159 | 0.164 |
| 25 | 0.086 | 0.044 |
| 50 | 0.466 | 0.100 |
| 100 | 3.464 | 0.084 |
| 200 | 26.274 | 0.263 |

This issue reports the drastic solve-time difference between Drake and a baseline on an identical problem with equivalent solver versions and configurations. The root causes of the problem are yet to be identified and should be investigated.

In addition, a minor contributor to the slow performance is that Drake's default value of 1.05e-10 for the IPOPT parameters `tol`, `constr_viol_tol`, `acceptable_tol`, and `acceptable_constr_viol_tol` is much tighter than IPOPT's own defaults of 1e-8, 1e-4, 1e-6, and 1e-2, respectively.
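
A rough sketch of what overriding those four tolerances back to IPOPT's own defaults could look like (again assuming `prog` is the program from the attached script):

```python
# Hedged sketch: override Drake's tight IPOPT tolerances with IPOPT's own
# documented defaults. `prog` is assumed to be the program at hand.
from pydrake.solvers import IpoptSolver, SolverOptions

ipopt_defaults = {
    "tol": 1e-8,
    "constr_viol_tol": 1e-4,
    "acceptable_tol": 1e-6,
    "acceptable_constr_viol_tol": 1e-2,
}
options = SolverOptions()
for name, value in ipopt_defaults.items():
    options.SetOption(IpoptSolver.id(), name, value)
result = IpoptSolver().Solve(prog, None, options)
```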

cc: @hongkai-dai, @TobiaMarcucci, @RussTedrake

The test script and the outputs are available in ipopt_drake_vs_casadi.zip

Version

No response

What operating system are you using?

Ubuntu 20.04

What installation option are you using?

compiled from source code using Bazel

Relevant log output

No response

RussTedrake commented 1 year ago

Very nice! Thank you.

You say, and I confirmed by looking at the code, that you are using Drake's symbolic engine for this. Isn't this more a statement about the performance of symbolic (which is not intended to be fast), which we would not use for IK or trajectory optimization problems anyway?

Wouldn't it be better to do something like

import numpy as np

# Create quadratic task constraints. Assumes NDOF, the link lengths L, the
# target position (PX_D, PY_D), the program `prog`, and its decision
# variables `q` are defined as in the attached script.
def constraint(q):
    # Accumulate the end-effector position over the serial chain.
    px_pd = 0.0
    py_pd = 0.0
    for i in range(NDOF):
        th_pd = np.sum(q[:i+1])
        px_pd += L[i]*np.cos(th_pd)
        py_pd += L[i]*np.sin(th_pd)
    # Squared position errors, constrained to be zero below.
    return np.array([(PX_D - px_pd)**2, (PY_D - py_pd)**2])

# Bind the constraint to the decision variables (lb, ub, vars) so Drake
# evaluates it with autodiff.
ctr_px = prog.AddConstraint(constraint, [0.0, 0.0], [0.0, 0.0], q)

so that Drake can use autodiff? (Even then, we'll be paying pybind overhead, so it would really be nice to do this in C++.)

sherm1 commented 1 year ago

Assigning to Hongkai for disposition or investigation; please feel free to reassign.

RussTedrake commented 1 year ago

or if we really want to test the IPOPT interface only, you could use https://drake.mit.edu/doxygen_cxx/classdrake_1_1symbolic_1_1_code_gen_visitor.html#a7b55a258854d77cb53a4bac286b1a8e7 to generate code for the constraint and for its derivatives (e.g., CodeGen(c), CodeGen(Jacobian(c))).
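
A rough sketch of building the symbolic constraint and its Jacobian with pydrake's symbolic module, i.e. the inputs one would hand to `CodeGen` on the C++ side (the `NDOF`, `L`, `PX_D`, `PY_D` values below are placeholders, not the ones from the attached script):

```python
# Hedged sketch (with placeholder data) of forming the symbolic constraint
# expressions and their Jacobian, i.e. what code generation would consume.
import numpy as np
from pydrake.symbolic import MakeVectorVariable, Jacobian, sin, cos

NDOF = 3                    # placeholder
L = np.ones(NDOF)           # placeholder link lengths
PX_D, PY_D = 1.0, 1.0       # placeholder end-effector target

q = MakeVectorVariable(NDOF, "q")
px, py = 0.0, 0.0
for i in range(NDOF):
    th = sum(q[: i + 1])
    px += L[i] * cos(th)
    py += L[i] * sin(th)

c = np.array([(PX_D - px) ** 2, (PY_D - py) ** 2])
J = Jacobian(c, q)          # 2 x NDOF matrix of symbolic Expressions
```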

jwnimmer-tri commented 1 year ago

To be sure I'm following along...

The Cost evaluation is not happening symbolically, correct? (We don't support generic symbolic costs yet.) It looks like a simple QuadraticCost which would be evaluated in C++ directly (maybe even by IPOPT, if we just feed it the coeff matrix), no symbolics in sight.

The slow symbolic evaluation during solving happens because we have an ExpressionConstraint (with program attribute type kGenericConstraint) from the (PX_D - px_pd)**2 term (likewise for py_pd), which has a bunch of sin and cos products mixed in, and so the program is non-convex. Is that correct?

If yes, then I agree with @RussTedrake's assessment. Possibly this is a feature request for "make it easier to avoid symbolic evaluation during solving" (e.g., by doing some codegen automatically under the hood), or possibly "make it more clear to users when they hold the tool wrong" (i.e., make it clear that the code they typed in is expected to be slow because they are using the "easy for prototyping" interface instead of the "fast at runtime" interface)?
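
One way to check that classification from the user side is a quick inspection of the program (a sketch; `prog` is assumed to be the program from the attached script):

```python
# Hedged sketch: inspect how the cost and constraints were parsed by
# MathematicalProgram. `prog` is assumed to be the program from the script.
print("quadratic costs:    ", len(prog.quadratic_costs()))
print("linear constraints: ", len(prog.linear_constraints()))
print("generic constraints:", len(prog.generic_constraints()))
for binding in prog.generic_constraints():
    # An ExpressionConstraint here means symbolic evaluation at every iteration.
    print(type(binding.evaluator()).__name__)
```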

RussTedrake commented 1 year ago

Yes. I understood that the cost would get parsed into a non-symbolic quadratic, and the two constraints would be slow ExpressionConstraints.

aykut-tri commented 1 year ago

I see, so this result is somewhat unsurprising on our end. I'll first try running the same script with code-generated constraints and derivatives, and based on the outcome, I'll write a C++ version. Thank you all!

RussTedrake commented 1 year ago

An even simpler first experiment would be to solve a convex optimization problem with both solver paths. (Because Drake converts linear constraints into a C++ implementation, no codegen is required.)

aykut-tri commented 1 year ago

I've run another comparison using a QP with linear and box constraints (i.e., min xᵀQx subject to A*x = b and lb <= x <= ub), as attached. The bounds are tight enough to make the problem infeasible, so the solver has to take more than one iteration. Based on the following results, eliminating symbolic expressions definitely makes a huge difference, but Drake is still significantly slower than CasADi for larger problems.

| # of vars. | # of iters. | Drake [s] | CasADi [s] |
|---|---|---|---|
| 5 | 15 | 0.027465 | 0.025751 |
| 50 | 26 | 0.068099 | 0.063243 |
| 250 | 30 | 1.708558 | 1.349195 |
| 500 | 31 | 12.100774 | 8.858251 |
| 1000 | 38 | 111.466290 | 78.807654 |

The script is available here. I'll write a C++ version if we think it's worth it.
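
For reference, a minimal sketch of how the Drake side of such a QP might be assembled and handed to IPOPT (all problem data below are placeholders, not the actual test data; import paths are for recent pydrake):

```python
# Hedged sketch of the QP described above: min x'Qx  s.t.  A x = b,  lb <= x <= ub,
# solved through Drake's IpoptSolver. All problem data here are placeholders.
import numpy as np
from pydrake.solvers import IpoptSolver, MathematicalProgram

n = 50
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)          # positive-definite cost matrix
A = rng.standard_normal((n // 2, n))
b = rng.standard_normal(n // 2)
lb, ub = -0.1 * np.ones(n), 0.1 * np.ones(n)

prog = MathematicalProgram()
x = prog.NewContinuousVariables(n, "x")
prog.AddQuadraticCost(Q, np.zeros(n), x)       # cost of the form 0.5 x'Qx + b'x
prog.AddLinearEqualityConstraint(A, b, x)
prog.AddBoundingBoxConstraint(lb, ub, x)

result = IpoptSolver().Solve(prog)
print(result.is_success(), result.get_optimal_cost())
```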

hongkai-dai commented 1 year ago

Thanks Aykut for the comparison!

I think it is better to code this in C++ so that we can profile the results.

There are some obvious issues in Drake's IpoptSolver:

  1. We treat linear constraints as if they were nonlinear. When we compute the gradient of a linear constraint lb <= A * x <= ub, instead of fetching the A matrix directly, we form an AutoDiffVecXd, evaluate the constraint with it, and then extract the gradient. This is very inefficient; we should return the A matrix directly as the gradient without any autodiff evaluation. This problem is specific to IpoptSolver and does not occur in SnoptSolver.
  2. In every gradient evaluation, we pass in an Eigen::VectorXd x and obtain the AutoDiffVecXd through math::InitializeAutoDiff(x); that is, the gradient of x is just the identity matrix. As we discussed before, this is highly inefficient. I think Russ and Jeremy have worked out the special structure to handle this type of gradient. This problem affects all solvers inside Drake, including IpoptSolver, SnoptSolver, and NloptSolver. (See the sketch below.)
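
A small sketch illustrating point 2, i.e. that the seeded gradient is always the identity, so re-deriving the gradient of a linear map through autodiff just reproduces its matrix (function names per pydrake's autodiff utilities):

```python
# Hedged sketch for point 2: InitializeAutoDiff seeds x with an identity gradient,
# so extracting the gradient of a linear map A*x through autodiff just reproduces A,
# which a solver interface could instead return directly for linear constraints.
import numpy as np
from pydrake.autodiffutils import InitializeAutoDiff, ExtractGradient

x = np.array([1.0, 2.0, 3.0])
x_ad = InitializeAutoDiff(x)     # the gradient of x_ad is the 3x3 identity
print(ExtractGradient(x_ad))     # -> identity matrix

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
y_ad = A.dot(x_ad)               # "evaluate" a linear constraint with autodiff
print(ExtractGradient(y_ad))     # -> A itself, recomputed the expensive way
```
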
aykut-tri commented 1 year ago

That makes sense, thank you! I'll implement a C++ version for profiling once I'm available.

TobiaMarcucci commented 1 year ago

Thanks Aykut for setting this up!

I've two high-level questions (not necessarily directed to Aykut, but more meant to clarify what this comparison should show):

aykut-tri commented 1 year ago

Thanks, Tobia, and sorry for the belated response. To answer the first question: yes, CasADi was also using symbolic expressions.

The experiment that you proposed in the second point makes a lot of sense. So, I just ran a quick test for the same QP above using CasADi with and without the exact Hessian and obtained the results below. For this problem instance, using the exact Hessian can actually become more expensive as the problem gets larger, probably due to the increased number of iterations needed to reach better optimality.

| # of vars. | L-BFGS (iters / cost / s) | Exact Hessian (iters / cost / s) |
|---|---|---|
| 10 | 25 / 9.3 / 0.043 | 27 / 8.6 / 0.013 |
| 100 | 29 / 67.1 / 0.153 | 73 / 66.8 / 0.155 |
| 1000 | 39 / 364.8 / 79.22 | 96 / 358.2 / 98.08 |
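
For reference, a toy sketch of toggling that option through CasADi's IPOPT plugin (the NLP below is a placeholder, not the QP or collocation problem from these tests):

```python
# Hedged sketch: solve the same NLP with CasADi's IPOPT plugin, with and without
# the limited-memory Hessian approximation. The NLP below is a toy placeholder.
import casadi as ca

x = ca.SX.sym("x", 3)
nlp = {"x": x, "f": ca.dot(x, x), "g": x[0] + x[1] - 1}

for mode in ["exact", "limited-memory"]:
    opts = {"ipopt": {"hessian_approximation": mode, "print_level": 0},
            "print_time": False}
    solver = ca.nlpsol("solver", "ipopt", nlp, opts)
    sol = solver(x0=[0.5, 0.5, 0.5], lbg=0, ubg=0)
    print(mode, float(sol["f"]))
```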

As a more relevant example, I also solved the direct collocation problem from the CasADi example pack with and without the exact Hessian for different numbers of knots and obtained the following outcome. In this case, as you suggested, using the exact Hessian is significantly more advantageous in terms of computation time, even though the results are qualitatively equivalent. This implies that providing second-order derivatives could be more critical than the other points that Hongkai raised.

| # of knots | L-BFGS (iters / cost / ms) | Exact Hessian (iters / cost / ms) |
|---|---|---|
| 10 | 18 / 4.203 / 41.27 | 12 / 4.203 / 9.63 |
| 100 | 21 / 3.620 / 180.43 | 18 / 3.620 / 66.59 |
| 1000 | 27 / 3.615 / 2470 | 17 / 3.615 / 814.3 |