RoboticExplorationLab / Altro.jl


Sluggish ALTROSolver function #30

Open MAminSFV opened 2 years ago

MAminSFV commented 2 years ago

It seems odd that the ALTROSolver command takes such a long time to execute fully and build the solver object. I was wondering if there is a way to make it execute faster. @bjack205 What are the critical factors determining the speed of this command?

Many thanks,

MAminSFV commented 2 years ago

In my case, it took about 1 hour and 15 minutes to run ALTROSolver(prob). I really appreciate the help.

bjack205 commented 2 years ago

Can you provide some more details on your problem setup? What are the dimensions of your problem? I've never had a solver take that long to initialize.

MAminSFV commented 2 years ago

@bjack205 Sure, it is a high-dimensional problem, with a 65-dimensional state vector and a 24-dimensional control vector.

I am actually trying to write an updated version of your RA-L code and extend it with a rigid-body load.

The original implementation was in an early version of TrajectoryOptimization.jl and ran quite fast, but I had to use the newer version of RobotDynamics.jl for time-varying models.

Now, I believe my nested dynamics model (quadrotors and load inside the batch model) makes a lot of allocations (about 35,285 allocations for JacobianCache(model), which took a long time). In the original implementation of the paper, the dynamics used to be evaluated in place with dynamics!(), which might be a notable difference.

I have also used SVectors for states and controls.

Do you think maybe I can reduce these allocations using other data types for my dynamics model?
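
For reference, a minimal sketch of the out-of-place vs. in-place pattern I mean (the dynamics below are placeholders for illustration only, not my actual model or the RobotDynamics.jl API):

```julia
# Placeholder batch dynamics, just to contrast the two call styles.

# Out-of-place: returns a new vector on every call. Convenient with small
# SVectors, but allocation and compile cost grow quickly at a 65-dimensional state.
function batch_dynamics(x::AbstractVector, u::AbstractVector)
    s = sum(u)
    return -x .+ s
end

# In-place: writes the derivative into a preallocated buffer, so repeated
# evaluations (e.g. inside finite-difference Jacobians) allocate nothing.
function batch_dynamics!(xdot::AbstractVector, x::AbstractVector, u::AbstractVector)
    s = sum(u)
    @. xdot = -x + s
    return nothing
end

x, u = randn(65), randn(24)
xdot = similar(x)
batch_dynamics!(xdot, x, u)   # reuses xdot on every call
```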

ALTROSolver() took an hour and 15 minutes to finish and made 35.4 M allocations :)

Thank you for your fast reply.

bjack205 commented 2 years ago

I pushed changes to both TrajectoryOptimization/master and Altro/fast-init branches that should help substantially with the time to initialize the solver the first time around. If you are consistently using the same problem dimensions (or have a small, finite set of problem variations) you might want to try creating a sysimage with PackageCompiler.jl to further decrease initial compilation time. I still need to finish testing the changes before tagging new patch releases, but using the branches I mentioned should be enough for you for now.
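
For reference, the sysimage workflow looks something like this (the package list and file names are just examples):

```julia
# Build a custom sysimage so the Altro/TrajectoryOptimization compilation
# cost is paid once, at image-build time.
using PackageCompiler

create_sysimage(
    [:Altro, :TrajectoryOptimization, :RobotDynamics];
    sysimage_path = "altro_sysimage.so",
    # Script that builds and solves one representative problem so those
    # methods get baked into the image.
    precompile_execution_file = "precompile_altro.jl",
)
```

Then launch Julia with `julia --sysimage altro_sysimage.so` and subsequent sessions skip most of that first-call compilation.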

Btw, with 100 trajectory segments, it only took about 2 minutes to initialize before, and now it takes about 8 seconds. I'm not sure how it was taking over an hour for you unless you're running on a computer with very limited resources or have an extremely high number of segments.

MAminSFV commented 2 years ago

Thank you for the updates. I will try the newer versions and let you know here and share the @btime results.

Yeah, I am also confused about this. My computer has a Core i7 (10th gen), an SSD, and a GTX 2060, and during initialization only 3-4% of the CPU was used.

By trajectory segments, do you mean the number of knot points? In my initial tests, I used only 51 knot points. My best guess is that the issue is the dynamics model and the fact that it's a nested model: my main model calls RigidBody dynamics models inside it, the quadrotors and a rigid-body load. I suppose passing FiniteDiff through it and initializing the matrices takes a long time.

But I guess I am scratching the surface of this :)

bjack205 commented 2 years ago

Just a quick note so we're on the same page, your issue is with initializing the solver? Or the solve in general? Solver initialization should be independent of the dynamics implementation, since it's only allocating memory based on the state dimension, control dimension, and horizon (number of knotpoints).

If first-time initialization is the issue, @btime won't capture this, since it runs the code multiple times and reports the minimum time.
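
For example (assuming `prob` is your already-constructed problem):

```julia
# @btime (from BenchmarkTools.jl) runs the expression many times and reports
# the minimum, so first-call compilation disappears from the measurement.
# To see that cost, time the very first call in a fresh session:
using BenchmarkTools

@time ALTROSolver(prob)      # first call: includes compilation/initialization
@btime ALTROSolver($prob)    # steady-state cost after everything is compiled
```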

Also, as a general comment, DDP-based algorithms like ALTRO are ill-suited to systems with lots of sparsity in the dynamics. The team-lift paper we wrote a while back avoided this issue using a relatively naive approach of splitting the problem into smaller, dense problems over each of the individual dynamics.

Also, it might be worth special-casing your dynamics Jacobian using the chain rule, with ForwardDiff (usually much faster than FiniteDiff.jl) for the individual models. For large problems, the computation time is often dominated by the time to calculate the dynamics Jacobians, so make sure that is dialed in.
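
As a rough sketch of what I mean, with placeholder per-agent dynamics and ignoring the coupling terms (the coupling would add off-diagonal blocks via the chain rule):

```julia
using ForwardDiff

# Placeholder 13-state, 4-control quadrotor dynamics.
quad_dynamics(x, u) = -x .+ sum(u)

# Assemble the batch Jacobians from small per-agent Jacobians instead of
# differentiating the whole 65-state / 24-control batch function at once.
function batch_jacobians!(A, B, xs, us)
    ix, iu = 1, 1
    for (x, u) in zip(xs, us)
        n, m = length(x), length(u)
        A[ix:ix+n-1, ix:ix+n-1] .= ForwardDiff.jacobian(x_ -> quad_dynamics(x_, u), x)
        B[ix:ix+n-1, iu:iu+m-1] .= ForwardDiff.jacobian(u_ -> quad_dynamics(x, u_), u)
        ix += n
        iu += m
    end
    return A, B
end

# Example with three agents (sizes are placeholders):
xs = [randn(13) for _ in 1:3]
us = [randn(4) for _ in 1:3]
A = zeros(39, 39); B = zeros(39, 12)
batch_jacobians!(A, B, xs, us)
```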

MAminSFV commented 2 years ago

I just tested the updates, and ALTROSolver() got 25x faster, which is fantastic! (from 1 hour and 15min to 3 min). Thank you for those improvements!

At first, my problem was with ALTROSolver(); now, however, I am stuck at the Altro.solve!(solver) command. :) I think the same issue continues in this step as well: the solver gets stuck and slowly eats up RAM to the point that I don't have enough (after 2 hours it had filled 10 GB).

I have set verbose=2 to see whether the solver iterations have started, and there is no output from the logger, so it seems it is stuck in the early steps of initializing solve!().

I also tried testing with 2 knot points to see if I could get through the solve without overloading my RAM, but the number of knot points doesn't seem to affect the issue.

I am now trying to reproduce the batch problem of that paper with the newest versions of the packages, and interestingly, the batch problem was fast to compute with TrajectoryOptimization.jl version 0.1.0.

Regarding special-casing with ForwardDiff, I suppose that's the default option, isn't it? I saw that FiniteDiff is used in RobotDynamics.jl and Altro to make JacobianCache(model), which was one of the steps I found to be time-consuming.

I hope we can fix this issue; otherwise I might have to revert to version 0.1 and port the time-varying models from RobotDynamics back to the older version.

Please let me know what you think and what the issue could be. I can also share my files in a repo or zip if you think that can help with understanding the issue better.

bjack205 commented 2 years ago

The biggest difference with the new version is that it uses StaticArrays throughout (usually SizedArrays, which simply wrap a normal array in a fixed size). This gives massive wins for small to medium-sized problems but can lead to large precompilation times for larger sizes. Apparently I haven't managed to eliminate every instance where it tries to convert a SizedArray to an SArray; SArrays are extremely slow to precompile at these sizes.
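
To illustrate the difference:

```julia
using StaticArrays

x  = randn(65)
xs = SizedVector{65}(x)   # fixed-size wrapper around the existing Array: cheap
# xv = SVector{65}(x)     # fully static: every method compiled for this type is
                          # specialized on length 65, which is what makes
                          # precompilation blow up at these sizes
```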

A couple things to try:

  1. Turn off the static backward pass (I should make this the default for problems over a given size).
  2. Run all of your dynamics beforehand, i.e. evaluate your discrete dynamics and discrete dynamics Jacobian once before the solve (a rough sketch of both is below).
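
I'm assuming the option is called static_bp here; double-check it against the SolverOptions docstring for the version you're on:

```julia
using Altro

# 1. Disable the StaticArrays-based backward pass (assumed option name).
opts = SolverOptions(static_bp = false)
solver = ALTROSolver(prob, opts)

# 2. Before solving, evaluate your discrete dynamics and its Jacobian once on
#    dummy state/control values (with whichever RobotDynamics API your model
#    implements) so that compilation cost is paid before solve! starts.

solve!(solver)
```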

As for the dynamics Jacobians, the default is ForwardDiff, but it will diff through the batch dynamics, which is wasteful when you have lots of sparsity in your dynamics. Instead, it'll probably be faster to diff through each of the models and then use the chain rule to get the batch dynamics Jacobian.

I don't have a lot of free time to address this, but if you are able to find the bottlenecks for compilation time during the initial part of the solve, please submit a PR.

MAminSFV commented 2 years ago

Sure, I will try to apply the points you mentioned in the following days and let you know about the results.