Explanation of the DDP optimization

alexbuyval commented 5 years ago

Hi,

Is there any explanation (a paper, a post, or something else) of why DDP optimization is used after the MPPI control iteration?

I would greatly appreciate any comments about the advantages and disadvantages of using DDP.

gradyrw commented 5 years ago

The MPPI optimization runs at 50 Hz, but the control publisher is designed to cue off of the state estimator, so it runs at the same rate as the state estimator (200 Hz). To publish controls at 200 Hz when a solution is only found at 50 Hz, by default we interpolate between the first and second control inputs in the computed control sequence.

However, since we're cueing off the state estimate and have both a desired control sequence and a desired state sequence, we can do slightly better. We compare the interpolated desired state with the actual state that was just received, which gives us an error term. Then, instead of just applying the feedforward control found by interpolating the control sequence, we apply the feedforward term plus the feedback gains times the error term. To find the feedback gains, we linearize around the computed state and control sequences and then use LQR (we compute the LQR gains with the DDP code, since LQR is a special case of DDP).

This seems to improve performance slightly compared with running only the feedforward controls, but we haven't done any systematic study. A possible disadvantage is increased chatter in the controls if there's a lot of noise in the IMU (this can be an issue in Gazebo especially).
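
For readers who want to see the idea in code, here is a rough sketch of the feedforward-plus-feedback scheme described above. It is not the AutoRally implementation: the function names (`tv_lqr_gains`, `interpolate`, `feedback_control`), the cost weights, and the double-integrator model are all assumptions chosen for illustration, and the linearization matrices `A` and `B` are taken as given rather than computed from a vehicle model. The gains come from a plain backward Riccati recursion, which is what the LQR special case reduces to.

```python
import numpy as np

def tv_lqr_gains(A_list, B_list, Q, R, Qf):
    """Backward Riccati recursion; returns one gain matrix per time step."""
    P = Qf
    gains = []
    for A, B in zip(reversed(A_list), reversed(B_list)):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

def interpolate(seq, t, dt):
    """Linearly interpolate between seq[0] and seq[1] for t in [0, dt]."""
    alpha = np.clip(t / dt, 0.0, 1.0)
    return (1.0 - alpha) * seq[0] + alpha * seq[1]

def feedback_control(x_actual, t, dt, x_des, u_des, K):
    """Feedforward control plus feedback gains times the state error."""
    x_ref = interpolate(x_des, t, dt)   # interpolated desired state
    u_ff = interpolate(u_des, t, dt)    # interpolated feedforward control
    return u_ff + K @ (x_ref - x_actual)

# Toy example (assumed): double integrator, planner step of 0.02 s (50 Hz).
dt = 0.02
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q, R, Qf = np.eye(2), np.eye(1), np.eye(2)
K = tv_lqr_gains([A] * 50, [B] * 50, Q, R, Qf)

# First two entries of the planned state and control sequences (made up).
x_des = np.array([[0.0, 1.0], [0.02, 1.0]])
u_des = np.array([[0.0], [0.2]])

# A state estimate arrives 5 ms into the step (200 Hz), slightly behind plan.
x_actual = np.array([0.004, 1.0])
u = feedback_control(x_actual, 0.005, dt, x_des, u_des, K[0])
```

When the measured state matches the interpolated desired state, the error term is zero and the command reduces to the interpolated feedforward control. Noise in the state estimate passes through `K` directly into the command, which is where the chatter mentioned above comes from.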

alexbuyval commented 5 years ago

@gradyrw Thank you very much for such a fast and comprehensive answer!

AutoRally / autorally

Explanation of the DDP optimization #80