Formulate NOCP for deterministic trajectory optimization of centroidal momentum dynamics using SCP

System dynamics:

Centroidal momentum dynamics of legged robots with a flat rectangular are related to the external contact forces using Newton-Euler dynamics as follows:

\begin{align}\tag{1} \begin{bmatrix} \dot{\boldsymbol{l}} \ \newline \dot{\boldsymbol{h}} \ \end{bmatrix} = \begin{bmatrix} m \boldsymbol{g} + \sum_{e \in e_{cnt}} \boldsymbol{f}_{e,k} \newline \sum_{e \in e_{cnt}} (\boldsymbol{p}_{e,k} - \boldsymbol{r} + \boldsymbol{R}^{x,y}_{e,k} \boldsymbol{\zeta} _{e,k}) \times \boldsymbol{f}_{e,k} + \boldsymbol{R}^z_{e,k} \tau_{e,k} \end{bmatrix} \end{align} where $\mathbf{l}$, $\mathbf{h} \in \mathbb{R}^3$ denotes the linear and angular momentum rate of change respectively. $\boldsymbol{p}_e \in \mathbb{R}^3$ represents the contact positions, $\boldsymbol{r} \in \mathbb{R}^3$ is the center of mass (CoM) position, and $\boldsymbol{R}_e = \begin{bmatrix} \boldsymbol{R}_x & \boldsymbol{R}_y & \boldsymbol{R}_z \end{bmatrix}\in \mathbb{R}^{3 \times 3} $ is a matrix that rotates quantities from the end-effector frame to the inertial frame. The total mass of the robot is $m$, concentrated at the CoM along the gravity vector $\boldsymbol{g}$. $\zeta_e$ represents the local center of pressure (CoP) in the end-effector frame Finally, $\boldsymbol{f}_e \in \mathbb{R}^3$ represents the contact forces, and $\tau_e \in \mathbb{R}$ denotes the contact moment around the $z$-direction both in the world frame.

Problem formulation

Consider the following discrete-time OCP of the centroidal momentum dynamics: \begin{align} \&\min_{\substack{\boldsymbol{r}_k, \boldsymbol{l}_k, \boldsymbol{h}_k \ \boldsymbol{\zeta}_{e,k}, \boldsymbol{f}_{e,k}\boldsymbol{\tau}_{e,k}}} lf(N) + \sum^{N-1}\{i=0}l(\boldsymbol{x}_k, \boldsymbol{u}k) \tag{2a}\newline \& \quad\quad\text{s.t.} \nonumber \newline \& \quad \quad \begin{bmatrix} \boldsymbol{r}_{k+1} \ \newline \boldsymbol{l}_{k+1} \ \newline \boldsymbol{h}_{k+1} \end{bmatrix} = \begin{bmatrix} \boldsymbol{r}\{k} + \frac{1}{m}\boldsymbol{l}_k\Deltak \ \newline \boldsymbol{l}_{k} + m \boldsymbol{g}\Delta_{k} + \sum_{e \in e_{cnt}} \boldsymbol{f}_{e,k} \Delta_k\ \newline \boldsymbol{h}_{k} + \sum_{e \in e_{cnt}} \boldsymbol{\kappa} \Delta_k
\end{bmatrix}, \tag{2b} \newline \& \quad\quad \boldsymbol{\kappa} = (\boldsymbol{p}_{e,k} - \boldsymbol{r} + \boldsymbol{R}^{x,y}_{e,k} \boldsymbol{\zeta}_{e,k}) \times \boldsymbol{f}_{e,k} + \boldsymbol{R}^z_{e,k} \tau_{e,k}, \tag{2c}\newline \& \quad \quad \boldsymbol{\zeta}_{e,k} \in \begin{bmatrix} \boldsymbol{\zeta}_{x,y}^{\min}, & \boldsymbol{\zeta}_{x,y}^{\max} \end{bmatrix}, \tag{2d} \newline \& \quad\quad ||\mathfrak{f}^{x,y}_{e,k}||_{2} \leq \mu \mathfrak{f}^z_{e,k}, \quad \mathfrak{f}^z_{e,k} \geq 0, \tag{2e}\newline \& \quad\quad ||\boldsymbol{p}_{e,k} - \boldsymbol{r}\{e,k}||_2 \leq \mathcal{L}^{\max}[e], \tag{2f}\newline \&\quad\quad \boldsymbol{x}_0 = \boldsymbol{x}(0), \tag{2g}\newline \&\quad\quad \boldsymbol{x}f = \boldsymbol{x}(N), \tag{2h}\newline \&\quad\quad \forall k = 0,1,..,N-1. \tag{2i} \end{align} constraints (2b), represents the discrete-time nonlinear dynamics of (1), discretized using a simple Euler integration scheme. The local CoP $\zeta\{e,k}$ of each contact is constrained inside the contact support polygon with the environment. Moreover, the contact forces $\mathfrak{f}$ expressed in the end-effector frame are constrained inside the friction cone to avoid sliding motions, as well as a unilateral constraint on the z-component of the contact force to ensure positivity (2e). Constraint (2f) ensures that the CoM is kept within a reachable cone upper bounded by maximum length of the arm of contact with the environment. Finally, constraints (2g)-(2h), constitute the initial and terminal states. For now, it is assumed the contact plan is fixed, although in principle this can be optimized.

A Sequential Convex Programming (SCP) approach with $l_1$-penalty cost on the trust region constraints:

Sequential Convex Programming (SCP) solves the non-convex optimization problem by repeatedly constructing a convex subproblem—an approximation to the problem around the current iterate j. The nonlinear dynamics is linearized with a first-order Taylor expansion of around the previous state and control trajectories computed in the $j$th succession as follows: \begin{align}\tag{3} \boldsymbol{x}_{k+1} \approx f(\boldsymbol{x}^j_k, \boldsymbol{u}^j_k) + \nabla_x f(\boldsymbol{x}_k, uk)|\{\boldsymbol{x}^j_k, \boldsymbol{u}^j_k}.(\boldsymbol{x}_k-\boldsymbol{x}^j_k) + \nabla_u f(\boldsymbol{x}_k, \boldsymbol{u}k)|\{\boldsymbol{x}^j_k, \boldsymbol{u}^j_k}.(\boldsymbol{u}_k-\boldsymbol{u}^j_k). \end{align}

The problem in taking a naiive first-order linear approximation is that it can introduce two problems;

Artificial infeasibility: the problem becomes infeasible despite the original nonlinear problem is feasible. The most evident example of this arises when the problem is linearized about an unrealistically short time horizon, so that there is no feasible control input that can satisfy the prescribed dynamics and constraints
Artificial unboundedness: the solution takes steps far away from the validity of the linear model

To mitigate artificial unboudedness 2), one way is to impose a trust region box constraint around the current iterate. To avoid artificial infeasibility 1), $l_1$ penalty method is used, where each inequality constraint $g_i(\boldsymbol{x}) \leq 0$ becomes the penalty $|g_i(\boldsymbol{x})|^+ $, with $|x|^+ = \max (x, 0)$. The penalty is then multiplied by a weight $\omega$. Although, $l_1$ penalties are non-differentiable, but they are convex and efficient to solve. Moreover, it's an exact method: meaning that multiplying them by infinitely large $\omega$ during the optimization leads to numerically-stable algorithms that derive the constraint violations to zero. This is in contrast to $l_2$ penalty methods that penalizes the square error, i.e., $g_i(\boldsymbol{x}) \leq 0 \rightarrow (|g_i(\boldsymbol{x})|^+)^2$.

Trust region constraints formulation with $l_1$ penalty method:

Given the above, trust region constraint violation is formulated by appending an $l_1$ penalty cost to the cost function as

\begin{align}\tag{4}
\min. { \omega * \max.(\boldsymbol{x}, 0)}, \boldsymbol{x} = |\boldsymbol{x}-\boldsymbol{x}^j| - \boldsymbol{\Delta^j}. \end{align}

As mentioned before, the problem with the above cost is that it's non-differentiable. A simple workaround is to solve the dual problem by introducing a slack variable $t$, which mitigates the problem of artificial infeasibility. The above optimization problem read as the following LP: \begin{align} \min. \& \quad \boldsymbol{t} \tag{5a} \newline \text{s.t.} \nonumber \newline \&\quad \omega |\boldsymbol{x}-\boldsymbol{x}^j| - \boldsymbol{\Delta^j} \leq \boldsymbol{t}, \tag{5b}\newline \&\quad - \boldsymbol{t} \leq 0. \tag{5c} \end{align}

back to the original problem, the main source of non-convexity comes from the bilinear terms present in the angular momentum dynamics. SCP aims at solving a convex sub-problem iteratively by linearizing the non-convex dynamics constraints subject to trust region constraints with $l1$ penalty cost as in (). A classic outer linear approximation of the friction cone constraints is used for simplicity. Notice that the trust-region constraints are imposed only on the non-convex part of the dynamics—in this case the angular momentum dynamics. The convex sub-problem can then be formulated as the following QP: \begin{align} \&\min_{\substack{\boldsymbol{r}_k, \boldsymbol{l}_k, \boldsymbol{h}_k \ \boldsymbol{\zeta}_{e,k}, \boldsymbol{f}_{e,k}\boldsymbol{\tau}{e,k}}} lf(N) + \sum^{N-1}\{i=0}l(\boldsymbol{x}_k, \boldsymbol{u}k) + \sum^{N}_{i=0} \boldsymbol{\gamma} \tag{6a}\newline \& \quad\quad\text{s.t.} \nonumber \newline \& \quad \quad \boldsymbol{x}\{k+1} = \boldsymbol{f}(\boldsymbol{x}_k^j, \boldsymbol{u}_k^j) + \boldsymbol{A}_k(\boldsymbol{x}_k - \boldsymbol{x}_k^j) + \boldsymbol{B}_k (\boldsymbol{u}_k - \boldsymbol{u}^jk), \tag{6b} \newline \& \quad\quad \text{linear friction pyramid and CoM reachability constraints}, \tag{6c} \newline \& \quad \quad \boldsymbol{\zeta}_{e,k} \in \begin{bmatrix}\boldsymbol{\zeta}_{x,y}^{\min}, & \boldsymbol{\zeta}\{x,y}^{\max} \end{bmatrix}, \tag{6d}\newline \& \quad \quad \omega_j |\boldsymbol{x}_k-\boldsymbol{x}^j_k| - \boldsymbol{\Delta}_j \leq \boldsymbol{\gamma} \quad \gamma \geq 0, \tag{6e}\newline \&\quad\quad \boldsymbol{x}_0 = \boldsymbol{x}(0), \tag{6f}\newline \&\quad\quad \boldsymbol{x}_f = \boldsymbol{x}(N), \tag{6g}\newline \&\quad\quad \forall k = 0,1,..,N-1,\tag{6h} \end{align}

where $\boldsymbol{A}k \overset{\Delta}{=} \nabla_{\boldsymbol{x}} \boldsymbol{f}(\boldsymbol{x}, \boldsymbol{u})|\{\boldsymbol{x}^j_k, \boldsymbol{u}^j_k}$, and $\boldsymbol{B}k \overset{\Delta}{=} \nabla_{\boldsymbol{u}} \boldsymbol{f}(\boldsymbol{x}, \boldsymbol{u})|\{\boldsymbol{x}^j_k, \boldsymbol{u}^j_k}$, with $f$ being the discretrized centroidal dynamics in (2a). Thanks to the convex formulation of the $l_1$-penalty on the trust region box constraint in (), the subproblem is always convex and feasible, and with sufficiently large weight $\omega_j$ , the trust region constraint violations are derived to zero. Notice that in this variant of SCP (different from the standard robotics SCP solvers like TrajOpt), only the trust-region constraints are set as soft constraints, while the dynamics constraints are imposed as hard constraints, which I believe leads to more accurate and stable solutions if my intuition is correct. The above approach was adopted in GuSTO, and for the stochastic version.

Trust region update mechanism:

The trust region update mechanism is relatively straightforward, and is summarized in the following algorithm. Given the current state $\boldsymbol{x}^j$ and control $\boldsymbol{u}^j$ trajectories, solve the OCP in (). Check whether the error norm between the current solution and the previous solution lies inside the trust region $\Delta_j$ (). If not, then increase the penalty weight on the trust region to push the solver to satisfy the trust region soft constraint. Otherwise, if the solution lies inside the trust region, another condition is checked to evaluate the accuracy of the linearized model at the current iterate $\rho_j$. If the linearized model accuracy is too coarse (i.e. $\rho_j \geq \rho_1$), then decrease the trust region to push the solution to satisfy the trust region hard constraint. Otherwise, accept the solution and update the penalty weight and trust region radius ()-(). The stopping condition is achieved when the current solution $\boldsymbol{x}^{j+1}$ converges to the previous solution $\boldsymbol{x}^{j}$ within a small tolerance.

ahmadgazar / centroidal-MPC