Implement the augmented Lagrangian method (ALM; aka "method of multipliers").
Motivation
The ALM (approximately) solves a sequence of unconstrained minimization problems on the primal variables, and updates the multipliers at the end of each minimization. Explicit Lagrange multiplier estimates help avoid the ill-conditioning that is inherent in the quadratic penalty function.
Consider a CMP (constrained minimization problem) with objective function $f$ and equality constraints $h(x)=0$.
Given a positive penalty parameter sequence $\{c^k\}$, the ALM solves
$$ x^k = \arg \min_x L_{c^k}(x, \lambda^k) \triangleq f(x) + {\lambda^k}^{\top} h(x) + \frac{c^k}{2} \|h(x)\|^2$$
with respect to $x$, and updates the Lagrange multipliers as:
$$ \lambda^{k+1} = \lambda^{k} + c^k h(x^k).$$
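The two steps above can be sketched in plain Python. This is a minimal illustration, not a proposed implementation: the toy problem (minimize $x_1^2 + x_2^2$ subject to $x_1 + x_2 = 1$), the gradient-descent inner solver, and all step sizes are assumptions chosen for demonstration.

```python
# ALM sketch for a single equality constraint, with plain gradient
# descent as the approximate inner minimizer. Toy problem and
# hyperparameters are illustrative assumptions.

def f(x):  # objective: x1^2 + x2^2
    return x[0] ** 2 + x[1] ** 2

def grad_f(x):
    return [2 * x[0], 2 * x[1]]

def h(x):  # equality constraint, h(x) = 0 at feasibility
    return x[0] + x[1] - 1.0

def grad_h(x):
    return [1.0, 1.0]

def alm(x, lam=0.0, c=1.0, outer_iters=20, inner_iters=200, lr=0.05):
    for _ in range(outer_iters):
        # Approximately minimize L_c(x, lam) = f + lam*h + (c/2)*h^2
        for _ in range(inner_iters):
            hx = h(x)
            gf, gh = grad_f(x), grad_h(x)
            x = [xi - lr * (gfi + (lam + c * hx) * ghi)
                 for xi, gfi, ghi in zip(x, gf, gh)]
        # Multiplier update: lam <- lam + c * h(x^k)
        lam = lam + c * h(x)
    return x, lam

x_star, lam_star = alm([0.0, 0.0])
# Analytically, the toy problem's solution is x = (0.5, 0.5), lam = -1.
```

Note that the penalty parameter `c` is held fixed here; in practice one typically increases it only when the constraint violation fails to shrink sufficiently between outer iterations.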
The main advantage of the ALM compared to the quadratic penalty method is that (under some reasonable assumptions), the algorithm can be successful without increasing the penalty parameter $c^k \rightarrow \infty$.
Bertsekas (1999) Section 4.2.1 suggests a way to handle inequality constraints.
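In that scheme, each inequality constraint $g(x) \le 0$ contributes the penalty term $\frac{1}{2c}\left(\max(0, \lambda + c\,g(x))^2 - \lambda^2\right)$, and the multiplier update is clipped at zero so multipliers stay nonnegative. A minimal sketch of just these two pieces (function names are hypothetical):

```python
def ineq_penalty(g_val, lam, c):
    # Augmented Lagrangian term for an inequality g(x) <= 0:
    # (1/(2c)) * (max(0, lam + c*g)^2 - lam^2).
    # Vanishes when the constraint is inactive and lam = 0.
    return (max(0.0, lam + c * g_val) ** 2 - lam ** 2) / (2.0 * c)

def ineq_multiplier_update(g_val, lam, c):
    # Projected update: lam <- max(0, lam + c * g(x^k))
    return max(0.0, lam + c * g_val)
```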
Alternatives
Tseng and Bertsekas (1993) propose the "exponential multiplier method" (EMM), where an exponential, rather than a quadratic, penalty is used. The resulting Lagrangian in EMM is twice differentiable whenever the objective and constraint functions are.
However, second-order methods are challenging to scale to modern ML problems. Implementing EMM could be considered an addition to the "standard" ALM, rather than a replacement.
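A distinctive feature of EMM is its multiplicative multiplier update, $\lambda^{k+1}_j = \lambda^k_j \exp(c^k g_j(x^k))$, which keeps multipliers strictly positive without any projection. A sketch of just the update (function name is hypothetical; details should be checked against Tseng and Bertsekas):

```python
import math

def emm_multiplier_update(g_val, lam, c):
    # Multiplicative update: a positive multiplier stays positive,
    # shrinking when the constraint is strictly satisfied (g < 0)
    # and growing when it is violated (g > 0).
    return lam * math.exp(c * g_val)
```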
References
Nocedal and Wright - Numerical Optimization - Section 17.3