RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Consider downgrading SNOPT #17609

Closed RussTedrake closed 12 months ago

RussTedrake commented 1 year ago

Many of us have the overwhelming sense that the newer versions of SNOPT perform worse on the problem instances that we have in Drake. I've discussed with other colleagues in the community who agree (notably, some of them have downgraded and are now happier; I will see if I can get them to comment on the version numbers). Anecdotally, students in my classes have had more trouble getting nonlinear trajectory optimization to work than in the past.

I'd like to think about (1) what would be involved in downgrading, and (2) whether we can perform any reasonable experiments to evaluate the merits.

Re (1): I know that the API moved forward in SNOPT 7.5 (#7984), and I also remember frustrating differences between the Fortran and f2c versions of SNOPT in a handful of intermediate releases. Currently, we build only from Fortran (#10422).

Re (2): We have a handful of benchmarks in Drake master; the acrobot swingup had a way of failing when we upgraded snopt, and I had to tweak the formulation to make it pass. But I think the perching and the littledog notebooks in underactuated might be some of the best benchmarks I have handy for evaluating snopt performance.

cc @aykut-tri , @hongkai-dai

jwnimmer-tri commented 1 year ago

We actually already pin a less-than-newest version (currently SNOPT 7.4-1.1, 2015-02-26) by default, because anything newer caused trouble. To confirm, are your concerns applicable to that version as well? If so, do you happen to know which prior version number is thought to be better?

RussTedrake commented 1 year ago

Yes, my concerns are about the performance of the version currently pegged in master. I've asked my colleagues about exactly which versions they are happy with and will follow up here soon.

josephlmoore commented 1 year ago

We have been using 7.2-12, which I think was the version I started with way back in my RLG days. We attempted to switch to 7.7.0, which resulted in convergence issues: trajectories that were feasible using 7.2-12 could not be optimized to feasibility/optimality with 7.7.0. If I remember correctly, we observed the solver "get stuck" executing a large number of minor iterations while attempting to generate a feasible trajectory. Our original motivation for the switch was to access the new time-based "breakout" feature of 7.7. However, we realized that the breakout feature was implemented the same way as the custom method we employed in a modified 7.2-12 interface (breakout at the start of a major iteration), so there really was no benefit to using 7.7.0 at all. We didn't investigate the convergence issues too deeply; we made the switch in the middle of an intense push to demonstrate hardware results and simply reverted out of an abundance of caution.

aykut-tri commented 1 year ago

I'd like to run a 7.4 vs. 7.2 comparison (unless there is a more promising baseline version). However, it seems we have access to only 7.4, 7.5, and 7.6. Any pointers for getting the source for 7.2?

jwnimmer-tri commented 1 year ago

The RobotLocomotion/snopt repository has a long git history. It looks like its very first commit, from 2013, is 7.2-12 according to its README.WhatsNew.

RussTedrake commented 1 year ago

You can also see when/how the upgrade happened here: https://github.com/RobotLocomotion/drake/issues/1647

aykut-tri commented 1 year ago

I see, thank you both for the quick response!

aykut-tri commented 1 year ago

I've finally managed to get SNOPT 7.2 working with the most recent Drake. You can see the changes in this draft PR.

I've run a few experiments to compare the two versions:

  1. the iiwa relaxed position IK benchmark and a cc executable version of this example for profiling,
  2. the pendulum, acrobot, and cart-pole dircol examples in https://github.com/RussTedrake/underactuated/blob/master/trajopt/dircol.ipynb, and
  3. the little dog example of underactuated.

The only proper time comparison comes from the benchmark test, which yields the following results; they indicate that 7.2 is about 42% faster for this feasibility problem.

| Version | Time [ns] | CPU [ns] | Iterations |
|---|---|---|---|
| 7.4 | 12479012 | 12478033 | 57 |
| 7.2 | 7234173 | 7222591 | 90 |
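The 42% figure follows directly from the two wall-clock times in the table; as a quick arithmetic check:

```python
# Benchmark wall-clock times from the table above, in nanoseconds.
t_74 = 12479012
t_72 = 7234173

# Relative speedup of 7.2 over 7.4 on this feasibility problem.
speedup = (t_74 - t_72) / t_74
print(f"{speedup:.0%}")  # → 42%
```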

The following table summarizes the results for the trajectory optimization examples in Python, where time is measured roughly using the time module around the call to Solve (values in parentheses satisfy the convergence tolerance).

| Problem | 7.2 Time [s] | 7.4 Time [s] | 7.2 Feasibility | 7.4 Feasibility | 7.2 Optimality | 7.4 Optimality | 7.2 Major Iterations | 7.4 Major Iterations | 7.2 Total Iterations | 7.4 Total Iterations |
|---|---|---|---|---|---|---|---|---|---|---|
| pendulum | 0.025 | 0.052 | (9.2E-12) | (6.8E-09) | (1.1E-06) | (1.5E-06) | 68 | 34 | 176 | 94 |
| acrobot | 0.060 | 0.065 | (5.7E-09) | (2.3E-07) | (6.9E-07) | (1.2E-06) | 43 | 28 | 229 | 204 |
| cart-pole | 0.309 | 0.707 | (2.7E-12) | (8.4E-13) | (1.6E-06) | (1.6E-06) | 27 | 64 | 81 | 129 |
| little dog | 18.945 | 23.313 | (1.8E-06) | (2.8E-06) | 2.0E-03 | (5.2E-07) | 200 | 118 | 13558 | 24713 |
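The rough Python timing described above (wrapping the solve call with the time module) amounts to something like the following sketch; `timed_solve` is an illustrative helper, not Drake API:

```python
import time

def timed_solve(solve_fn, *args, **kwargs):
    """Call a solver function and report rough wall-clock time,
    as done for the Python timing columns in the tables."""
    t0 = time.perf_counter()
    result = solve_fn(*args, **kwargs)
    elapsed = time.perf_counter() - t0
    return result, elapsed

# With pydrake this would be used as: result, dt = timed_solve(Solve, prog)
```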

The solutions look similar in all examples. For the little dog, the solver stops at the maximum number of major iterations (set at 200 for this example) when using 7.2 but still reaches a feasible solution, whereas 7.4 reaches a local optimum within the same budget. Nevertheless, it's noteworthy that the total number of iterations for 7.2 is almost half that of 7.4. More detailed solver outputs for each condition are included in the attachment.

Overall, 7.2 seems to be significantly faster than 7.4, especially for the largest problem of the set, i.e., the little dog. I had a quick look through the iteration details to try to understand the root cause of this discrepancy, with no luck so far.

According to the release notes, the important changes in the newer version are:

A suspicion I had was differences between the default parameters, which are included in the attachment as well for each solver. The parameter changes from 7.2 to 7.4 that I've noticed are: (i) the elastic weight increased from 1e4 to 1e5, and (ii) the violation limit decreased from 1e6 to 1e1. I did a quick test for the pendulum by setting these two parameters to the 7.4 values when running 7.2; however, it didn't change the results significantly.

So, I still don't know the reason, but @RussTedrake, it seems your hunch was right.

I'd be happy to hear what y'all think and run further experiments to investigate the root cause.

Here are the more detailed SNOPT summaries for each case: snopt_7_4_pendulum.txt snopt_7_2_acrobot.txt snopt_7_2_cartpole.txt snopt_7_2_little_dog.txt snopt_7_2_pendulum.txt snopt_7_4_acrobot.txt snopt_7_4_cartpole.txt snopt_7_4_little_dog.txt and the default parameter values per version: snopt_7_2_params.txt snopt_7_4_params.txt

I'm also attaching the callgrind profiling outputs for the iiwa relaxed position IK example in case you're interested.

RussTedrake commented 1 year ago

This is great! I think one of the best examples I have handy that pushes on the scale is the littledog notebook here. Is it ok to have it in python?

I'd be very interested to see the snopt performance for the various gaits. But I'd be even more interested in the robustness of the solver when I start trying to tune the objectives/constraints. The latter is admittedly hard to test.

aykut-tri commented 1 year ago

I think it is okay to work in Python when comparing only the SNOPT outputs and not taking the timing results too seriously.

Yeah, the little dog example is great. In fact, I already tried it and included it in the previous results, but only for the walking trot. I've just tried the other gaits with and without the following solver customizations in that notebook:

```python
prog.SetSolverOption(snopt, "Iterations Limits", 1e5)
prog.SetSolverOption(snopt, "Major Iterations Limit", 200)
prog.SetSolverOption(snopt, "Major Feasibility Tolerance", 5e-6)
prog.SetSolverOption(snopt, "Major Optimality Tolerance", 1e-4)
prog.SetSolverOption(snopt, "Superbasics limit", 2000)
prog.SetSolverOption(snopt, "Linesearch tolerance", 0.9)
```
and here are the results (the parentheses indicate that the convergence tolerance was satisfied):

| Problem | 7.2 Success | 7.4 Success | 7.2 Time [s] | 7.4 Time [s] | 7.2 Feasibility | 7.4 Feasibility | 7.2 Optimality | 7.4 Optimality | 7.2 Major Iterations | 7.4 Major Iterations | 7.2 Total Iterations | 7.4 Total Iterations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| walking trot (custom) | N | Y | 18.9 | 23.3 | (1.8E-06) | (2.8E-06) | 2.0E-03 | (5.2E-07) | 200 | 118 | 13558 | 24713 |
| walking trot | N | Y | 6.1 | 30.9 | 5.6E-02 | (1.1E-10) | 6.0E-01 | (3.2E-11) | 9 | 269 | 15550 | 7333 |
| running trot (custom) | N | Y | 18.9 | 22.6 | (1.8E-06) | (2.8E-06) | 2.0E-03 | (5.2E-07) | 200 | 118 | 13558 | 24713 |
| running trot | Y | N | 46.4 | 7.7 | (1.6E-09) | 6.2E-02 | (2.0E-06) | 5.6E-01 | 633 | 10 | 14663 | 15010 |
| rotary gallop (custom) | N | N | 105.2 | 37.5 | 3.7E-04 | 1.0E+00 | 3.9E-03 | 1.1E-04 | 200 | 8 | 56111 | 44182 |
| rotary gallop | N | N | 33.4 | 21.3 | 1.7E-01 | 9.7E-01 | 3.7E-02 | 1.7E-01 | 16 | 4 | 29480 | 29480 |
| bound (custom) | N | Y | 93.9 | 126.9 | 3.0E-05 | (3.7E-07) | 8.5E-04 | (1.6E-07) | 2 | 150 | 65922 | 73764 |
| bound | N | N | 18.8 | 22.9 | 4.6E-01 | 2.8E-01 | 3.5E-01 | 1.1E-01 | 61 | 6 | 29450 | 29450 |

It's hard to draw a firm conclusion from these results, but overall 7.4 has the higher success rate: 4/8 vs. 1/8. However, it's noteworthy that both versions have the same 1/4 success rate with the default solver settings: 7.2 solves the running trot and 7.4 the walking trot. I feel like tuning the solver settings with 7.2 in mind might turn the scales.

Regarding robustness, @hongkai-dai and I thought that sensitivity to the initial guess could be a good indicator. So, I compared performance on the direct collocation examples with and without the default initial guess, which is a linear interpolation from the start state to the goal state.
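A linear-interpolation seed like the default one described above can be sketched with NumPy; the state dimension and knot count below are illustrative placeholders, not the exact values used in each example:

```python
import numpy as np

def linear_interpolation_seed(x0, xf, num_knots):
    """Straight-line state trajectory from start state x0 to goal state xf,
    one row per knot point: (1 - a) * x0 + a * xf for a in [0, 1]."""
    x0 = np.asarray(x0, dtype=float)
    xf = np.asarray(xf, dtype=float)
    alphas = np.linspace(0.0, 1.0, num_knots)
    return np.array([(1.0 - a) * x0 + a * xf for a in alphas])

# Example: a swing-up-style seed over 21 knots for a 4-dimensional state,
# from the downward state to the upright state.
seed = linear_interpolation_seed([0, 0, 0, 0], [np.pi, 0, 0, 0], 21)
```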

| Problem | 7.2 Success | 7.4 Success | 7.2 Time [s] | 7.4 Time [s] | 7.2 Feasibility | 7.4 Feasibility | 7.2 Optimality | 7.4 Optimality | 7.2 Major Iterations | 7.4 Major Iterations | 7.2 Total Iterations | 7.4 Total Iterations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| pendulum | Y | Y | 0.025 | 0.052 | (9.2E-12) | (6.8E-09) | (1.1E-06) | (1.5E-06) | 68 | 34 | 176 | 94 |
| pendulum (no seed) | Y | Y | 0.027 | 0.059 | (3.8E-11) | (9.4E-07) | (1.3E-11) | (1.2E-06) | 68 | 59 | 147 | 137 |
| acrobot | Y | Y | 0.060 | 0.065 | (5.7E-09) | (2.3E-07) | (6.9E-07) | (1.2E-06) | 43 | 28 | 229 | 204 |
| acrobot (no seed) | Y | N | 0.127 | 0.102 | (2.5E-07) | 2.9E-01 | (8.9E-07) | (3.2E-07) | 89 | 46 | 567 | 896 |
| cart-pole | Y | Y | 0.309 | 0.707 | (2.7E-12) | (8.4E-13) | (1.6E-06) | (1.6E-06) | 27 | 64 | 81 | 129 |
| cart-pole (no seed) | Y | Y | 0.602 | 1.595 | (1.0E-11) | (9.1E-11) | (7.2E-07) | (1.0E-06) | 49 | 160 | 90 | 194 |

Based on these rough timings, 7.2 is generally faster than 7.4; the exception is the unseeded acrobot case, where 7.2 takes longer but succeeds while 7.4 fails.

Since the acrobot example seems to be sensitive to the initial guess, I further experimented with it by:

  1. scaling the initial trajectory duration (nominally 4 s) by 1/3 and by 3,
  2. using an initial guess whose joint-1 position is a sinusoid oscillating within [-pi, pi] rad at a frequency of 1 Hz,
  3. using a random position trajectory sampled uniformly for both joints from [-pi, pi] rad, and
  4. using different numbers of knots (11 and 31 vs. the nominal 21).
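The sinusoidal and random acrobot seeds described above can be sketched as follows; the function names, the 21-knot default, and the zero-velocity assumption are illustrative, not taken from the actual experiment code:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def sine_seed(num_knots=21, duration=4.0, freq_hz=1.0):
    """Joint-1 position oscillating within [-pi, pi] rad at freq_hz;
    joint 2 and both velocities left at zero."""
    t = np.linspace(0.0, duration, num_knots)
    q1 = np.pi * np.sin(2.0 * np.pi * freq_hz * t)
    zeros = np.zeros(num_knots)
    return np.column_stack([q1, zeros, zeros, zeros])

def random_seed(num_knots=21):
    """Positions for both joints sampled uniformly from [-pi, pi] rad;
    velocities left at zero."""
    q = rng.uniform(-np.pi, np.pi, size=(num_knots, 2))
    return np.hstack([q, np.zeros((num_knots, 2))])
```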

| Problem | 7.2 Success | 7.4 Success | 7.2 Time [s] | 7.4 Time [s] | 7.2 Feasibility | 7.4 Feasibility | 7.2 Optimality | 7.4 Optimality | 7.2 Major Iterations | 7.4 Major Iterations | 7.2 Total Iterations | 7.4 Total Iterations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| default | Y | Y | 0.060 | 0.065 | (5.7E-09) | (2.3E-07) | (6.9E-07) | (1.2E-06) | 43 | 28 | 229 | 204 |
| no seed | Y | N | 0.127 | 0.102 | (2.5E-07) | 2.9E-01 | (8.9E-07) | (3.2E-07) | 89 | 46 | 567 | 896 |
| tf = 4/3 s | Y | Y | 0.104 | 0.077 | (6.2E-09) | (1.0E-07) | (1.2E-06) | (1.2E-06) | 80 | 33 | 642 | 205 |
| tf = 4*3 s | Y | N | 0.064 | 0.071 | (4.9E-09) | 9.7E-03 | (1.1E-06) | (8.8E-07) | 62 | 34 | 291 | 208 |
| sine wave | N | N | 0.086 | 0.074 | 6.5E-02 | 1.7E-01 | (5.1E-07) | (6.9E-07) | 1 | 33 | 477 | 400 |
| random | Y | N | 0.090 | 0.078 | (2.0E-08) | 1.1E-01 | (8.5E-07) | (1.9E-06) | 66 | 43 | 322 | 288 |
| N=11 | N | N | 0.024 | 0.028 | 1.9E-01 | 4.1E-01 | (7.5E-16) | (3.1E-07) | 2 | 15 | 141 | 119 |
| N=31 | Y | Y | 0.090 | 0.144 | (7.7E-09) | (6.3E-08) | (1.2E-06) | (1.9E-06) | 53 | 58 | 262 | 263 |

Based on these data, 7.2 looks more robust, with a success rate of 6/8 versus 3/8 for 7.4. In particular, 7.2 succeeds with the zero initial guess, the tripled initial trajectory duration, and the random initial guess, all of which 7.4 fails.

So, after this round of analysis, 7.2 still looks promising. Please let me know your thoughts and further experiment ideas.

Here are more detailed SNOPT outputs and initial trajectories if you're interested.

aykut-tri commented 1 year ago

@jwnimmer-tri, what would be your thoughts on changing the default version to 7.2 and making 7.4 optional?

jwnimmer-tri commented 1 year ago

My main concern would be that approximately zero people have access to the 7.2 source code (not even TRI has it), so switching to 7.2 would effectively render SnoptSolver unusable for anyone who builds from source.

aykut-tri commented 1 year ago

Got it, that's a fair concern.

jwnimmer-tri commented 1 year ago

I suppose the way around that would be to support all of 7.2, 7.4, and 7.6 in our snopt_solver.cc wrapper, along the lines of how we currently support either 7.4 or 7.6 with some wrapper functions.

Also, FYI, my secondary concern is that I seem to recall 7.2 not being thread-safe. There is a test case, SnoptSolverTest.MultiThreadTest, that helps check for that. We'd want to get that test case passing in Drake CI before agreeing to switch to 7.2.
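The essence of that kind of multithreaded check, independent of Drake, is to run the same solve concurrently and compare against serial results; the toy `solve` below is a stand-in for an actual solver call, not the Drake test itself:

```python
import concurrent.futures

def solve(seed):
    # Stand-in for a deterministic solver call. A thread-safe solver must
    # return the same answer whether invoked serially or concurrently;
    # unprotected global state (the thread-safety worry with SNOPT 7.2)
    # would make the concurrent results diverge.
    return sum(i * i for i in range(seed))

seeds = list(range(1, 9))
serial = [solve(s) for s in seeds]
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    concurrent_results = list(pool.map(solve, seeds))
assert concurrent_results == serial
```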

aykut-tri commented 1 year ago

Yeah, I think we should support them all in the adapter.

I guess you're right about thread safety as well, because SnoptSolverTest.MultiThreadTest fails for 7.2.

RussTedrake commented 12 months ago

Based on a long discussion with @jwnimmer-tri about the maintenance cost of continued support for an old version, we've decided not to do the downgrade.