google-deepmind / mujoco_mpc

Real-time behaviour synthesis with MuJoCo, using Predictive Control
https://github.com/deepmind/mujoco_mpc
Apache License 2.0
941 stars 142 forks source link

Confusion regarding how cost derivatives are computed #333

Closed DMackRus closed 1 month ago

DMackRus commented 1 month ago

Hello, I have a question regarding how cost derivatives are computed in this repository, I have read the paper as well as the overview document and I understand the mathematical theory about how the residuals are used with a normal function and then a scalar weight to compute costs.

You detail in the paper how the cost derivatives with respect to the state and control vector are computed, the first and second order derivative of the norm functions are done analytically, and the first order residual derivatives are computed via finite-differencing. Second order derivatives of the residuals are computed using the Gauss-newton approximation, the theory behind all this makes sense.

I have two questions;

The first question is regarding the use of the $C$ and $D$ matrices as the $r_x$ and $r_u$ matrices.

This line in the iLQG planner makes a function call to compute the cost derivatives, and it passes in the $C$ and $D$ matrices in which then become the variables $r_x$ and $r_u$, this implies to me that the $C$ and $D$ matrices are already the residual derivatives, however from my understanding, the $C$ and $D$ matrices are how the sensor outputs change with respect to the state and control vector.

I know the sensors are used to compute the residuals, but I didnt think they are exactly the same, i.e they don't necessarily have the same dimension, as one residual can be a combination of multiple sensor values? So I would have though that to compute $r_x$ the chain rule would be needed to convert $C$ to $r_x$ by multiplying by how the residuals change with respect to the sensors. I.e:

$rx = r{sensor} * C$

Secondly, assuming I am missing something with regards to the first point, is there not an issue with how the $C$ and $D$ matrices is computed, when you compute the $A$ and $B$ matrices, it is required to integrate the simulator as we are evaluating the system dynamics, but the $C$ and $D$ matrices are only meant to tell us how the outputs of the system change with respect to the state and control vector( at least I thought so). i.e $C_t = \delta y_t / \delta x_t$ and not $Ct = \delta y{t+1} / \delta x_t$. When I look into mjd_transitionFD it appears that the system is integrated when computing the sensor outputs as well?

Any help understanding these points would be greatly appreciated!

thowell commented 1 month ago

Great questions!

The MuJoCo sensor array is populated by standard sensors (eg, framepos) and user sensors that are defined by the user via a callback. For MJPC, we leverage custom sensors to implement a residual vector that is utilized by the cost function.

A call to mj_step processes the current state $x_t$ and action $ut$ and returns the next state $x{t+1}$ and sensor values $y_t$:

$$x_{t+1}, y_t \leftarrow \text{step}(x_t, u_t)$$

The sensor values are computed prior to integration and are a function of the current state $x_t$ and action $u_t$, see engine_forward.c.

In the MJPC task xml we require that the custom sensors for the residual are specified first, before any measurement sensors (eg, cartpole). As a result, we consider the measurements:

$$y= (r, m) \in \mathbf{R}^{n_r + n_m}$$

comprising the residual $r$ and measurements $m$. As a result of this specification, the residual Jacobians ($r_x$, $r_u$) are the first rows of the sensor Jacobians:

$$C_t = \partial y_t / \partial x_t = [r_x; m_x]\in \mathbf{R}^{(n_r + n_m) \times n_x}$$

$$D_t = \partial y_t / \partial u_t = [r_u; m_u] \in \mathbf{R}^{(n_r + n_m) \times n_u}$$

DMackRus commented 1 month ago

Hi Taylor,

Thank you so much for your reply, that has really helped clear things up to me and I now understand how that part of the code works!

So to summarize and make sure I understand, user sensors must be specified first in the xml, these user sensors need to be specified by the user as to how to be calculated, which is done via the residuals function? This residuals function is then installed as the callback for mjcb_sensor()?

I do have one quick followup which I wonder if you have any thoughts about, its related to the derivative interpolation that we discussed and you implemented a while back. When I was experimenting with derivative interpolation a while back I only interpolated the $A$ and $B$ matrices, I then specified the cost derivatives analytically for tasks and computed the cost derivatives exactly over the entire trajectory. Would you think a similar setup would be advisable when using this more flexible method of computing cost derivatives, i.e calculate all the $C$ and $D$ matrices over the entire trajectory horizon, but only some of the $A$ and $B$ matrices, since in theory the $C$ and $D$ matrices do not require expensive system dynamics integration? However I am not sure how this would work exactly with MuJoCo's computation pipeline? I am not necessarily suggesting this for MJPC, more a question for my own investigation?

thowell commented 1 month ago

So to summarize and make sure I understand, user sensors must be specified first in the xml, these user sensors need to be specified by the user as to how to be calculated, which is done via the residuals function? This residuals function is then installed as the callback for mjcb_sensor()?

Correct, the user sensors must be specified first and contiguously in the task xml file. The user sensors values are computed by the residual function. MJPC then calls the residual function in the mjcb_sensor callback here.

i.e calculate all the C and D matrices over the entire trajectory horizon, but only some of the and matrices,

For many setups computing C and D should be less expensive than computing A and B so it seems reasonable to compute C and D for the entire planning horizon. Unlike computing A and B which requires integration, computing C and D only requires calling mj_forward to compute the sensor values. One possible solution would be to create a new function that modifies mjd_stepFD to call mj_forwardSkip instead of mj_stepSkip and then optionally perform integration if A and B need to be computed.

DMackRus commented 1 month ago

Hi Taylor,

thanks for the replies! You can mark this issue as complete now!