benchopt / benchmark_tv_1d

TV Denoising in 1D

List of algorithms for analysis formulation #10

Open svaiter opened 2 years ago

svaiter commented 2 years ago

We consider TV-like problems of the form

$$\min_x \; f(Ax - y) + g(Dx) \label{eq:tv-gen}$$

where $f, g$ are convex, lsc, closed, $A$ is a linear operator and $D$ is a finite difference operator, either in 1D or 2D. We let $\ell(x) = f(Ax - y)$.
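For concreteness, here is a minimal sketch (not the benchopt API) of the 1D finite-difference operator $D$, its adjoint, and the objective in the Lasso-like case $f = \frac{1}{2}\|\cdot\|_2^2$, $g = \lambda\|\cdot\|_1$; function names are illustrative:

```python
import numpy as np

def D(x):
    """1D finite differences: (D x)_i = x_{i+1} - x_i."""
    return np.diff(x)

def Dt(u):
    """Adjoint of D (a negative divergence in 1D)."""
    return -np.diff(u, prepend=0, append=0)

def objective(x, A, y, lam):
    """f(Ax - y) + g(Dx) with f = 0.5 * ||.||_2^2 and g = lam * ||.||_1."""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.abs(D(x)).sum()
```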

Depending on $f, g, A$, several algorithms can be implemented; below is a subset of them.

Data fidelity loss and metric on $Dx$

Huber:

L1:

L1-L2:

MSE:

Note: Huber and L1 can be used either as a data fidelity metric, or as a way to measure the gradient ($|\cdot|^{2}$ can also be used for $g$, but it then $\Gamma$-converges towards the Sobolev energy).
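For reference, here is a minimal sketch of how some of these metrics typically enter proximal solvers, through their proximal operators (standard formulas; names and the Huber parameter `delta` are illustrative, not benchopt's API):

```python
import numpy as np

def prox_l1(u, tau):
    """Soft-thresholding: prox of tau * ||.||_1."""
    return np.sign(u) * np.maximum(np.abs(u) - tau, 0.0)

def prox_mse(u, y, tau):
    """Prox of tau * 0.5 * ||. - y||_2^2, available in closed form."""
    return (u + tau * y) / (1.0 + tau)

def huber(u, delta):
    """Huber function: quadratic for |u| <= delta, linear beyond."""
    abs_u = np.abs(u)
    return np.where(abs_u <= delta,
                    0.5 * u ** 2 / delta,
                    abs_u - 0.5 * delta).sum()
```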

Solvers

Direct methods

Forward-Backward on the dual

The starting point is the dual of the primal problem, which by Fenchel–Rockafellar duality reads

$$\min_u \; \ell^*(-D^\top u) + g^*(u).$$

Variants:
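As a rough illustration, here is a minimal sketch of forward-backward on the dual for the denoising case ($A = I$, MSE fit, $g = \lambda\|\cdot\|_1$); the function name and iteration count are placeholders:

```python
import numpy as np

def tv1d_fb_dual(y, lam, n_iter=500):
    """Projected gradient (forward-backward) on the dual of
    0.5 * ||x - y||_2^2 + lam * ||Dx||_1, D = 1D finite differences.
    The primal solution is recovered as x = y - D^T u."""
    n = y.shape[0]
    u = np.zeros(n - 1)                            # dual variable
    step = 0.25                                    # 1 / ||D||^2 (||D||^2 <= 4 in 1D)
    for _ in range(n_iter):
        Dt_u = -np.diff(u, prepend=0, append=0)    # D^T u
        grad = np.diff(Dt_u - y)                   # gradient: D (D^T u - y)
        u = np.clip(u - step * grad, -lam, lam)    # projection on {||u||_inf <= lam}
    return y + np.diff(u, prepend=0, append=0)     # x = y - D^T u
```

An accelerated variant would add FISTA-style momentum on the dual variable $u$.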

Primal-dual hybrid gradient

The starting point is one of two saddle-point problems equivalent to \eqref{eq:tv-gen}:

$$\min_x \max_u \; \ell(x) + \langle Dx, u \rangle - g^*(u)$$

if $\ell$ is prox-easy, or

$$\min_x \max_{u, v} \; \langle Ax - y, v \rangle - f^*(v) + \langle Dx, u \rangle - g^*(u)$$

if $g$ is prox-easy but $\ell$ is not.

Variants:
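Under the same denoising assumptions ($A = I$, $\ell(x) = \frac{1}{2}\|x - y\|_2^2$, $g = \lambda\|\cdot\|_1$), a minimal Chambolle–Pock sketch of the first saddle point could look like this (step sizes and names are illustrative):

```python
import numpy as np

def tv1d_chambolle_pock(y, lam, n_iter=500):
    """Chambolle-Pock (PDHG) on the first saddle point for denoising:
    ell(x) = 0.5 * ||x - y||_2^2, g = lam * ||.||_1, D = 1D differences."""
    x = y.copy()
    x_bar = x.copy()
    u = np.zeros(y.shape[0] - 1)
    sigma = tau = 0.49                 # sigma * tau * ||D||^2 < 1 with ||D||^2 <= 4
    for _ in range(n_iter):
        # dual step: prox of sigma * g*, i.e. projection on {||u||_inf <= lam}
        u = np.clip(u + sigma * np.diff(x_bar), -lam, lam)
        # primal step: prox of tau * ell at x - tau * D^T u (closed form)
        x_new = (x + tau * (np.diff(u, prepend=0, append=0) + y)) / (1.0 + tau)
        # over-relaxation of the primal variable
        x_bar = 2 * x_new - x
        x = x_new
    return x
```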

Beck, A., and M. Teboulle. 2009. “A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems.” SIAM J. Imaging Sci. 2 (1): 183–202. https://doi.org/10.1137/080716542.
Ben-Tal, Aharon, and Arkadi Nemirovski. 2001. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. Philadelphia, PA: Society for Industrial and Applied Mathematics.
Boykov, Y., O. Veksler, and R. Zabih. 2001. “Fast Approximate Energy Minimization via Graph Cuts.” IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (11): 1222–39. https://doi.org/10.1109/34.969114.
Chambolle, Antonin, and Thomas Pock. 2011. “A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging.” J Math Imaging Vis 40 (1): 120–45. https://doi.org/10.1007/s10851-010-0251-1.
Condat, Laurent. 2013. “A Primal–Dual Splitting Method for Convex Optimization Involving Lipschitzian, Proximable and Linear Composite Terms.” J Optim Theory Appl 158 (2): 460–79. https://doi.org/10.1007/s10957-012-0245-9.
Eckstein, Jonathan. 1989. “Splitting Methods for Monotone Operators with Applications to Parallel Optimization.” Thesis, Massachusetts Institute of Technology. https://dspace.mit.edu/handle/1721.1/14356.
Esser, E. 2009. “Applications of Lagrangian-based Alternating Direction Methods and Connections to Split Bregman.” University of California, Irvine.
Gabay, Daniel, and Bertrand Mercier. 1976. “A Dual Algorithm for the Solution of Nonlinear Variational Problems via Finite Element Approximation.” Computers & Mathematics with Applications 2 (1): 17–40. https://doi.org/10.1016/0898-1221(76)90003-1.
Kolmogorov, V., and R. Zabih. 2004. “What Energy Functions Can Be Minimized via Graph Cuts?” IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (2): 147–59. https://doi.org/10.1109/TPAMI.2004.1262177.
Lions, P., and B. Mercier. 1979. “Splitting Algorithms for the Sum of Two Nonlinear Operators.” SIAM J. Numer. Anal. 16 (6): 964–79. https://doi.org/10.1137/0716071.
Massias, Mathurin, Alexandre Gramfort, and Joseph Salmon. 2018. “Celer: A Fast Solver for the Lasso with Dual Extrapolation.” In Proceedings of the 35th International Conference on Machine Learning, 3315–24. PMLR. https://proceedings.mlr.press/v80/massias18a.html.
Pock, Thomas, Daniel Cremers, Horst Bischof, and Antonin Chambolle. 2010. “Global Solutions of Variational Models with Convex Regularization.” SIAM J. Imaging Sci. 3 (4): 1122–45. https://doi.org/10.1137/090757617.
Polyak, B. T. 1964. “Some Methods of Speeding up the Convergence of Iteration Methods.” USSR Computational Mathematics and Mathematical Physics 4 (5): 1–17. https://doi.org/10.1016/0041-5553(64)90137-5.
Vũ, Bằng Công. 2013. “A Splitting Algorithm for Dual Monotone Inclusions Involving Cocoercive Operators.” Adv Comput Math 38 (3): 667–81. https://doi.org/10.1007/s10444-011-9254-8.
agramfort commented 2 years ago

thanks @svaiter !

would you have some code to share to start adding some of these solvers?


svaiter commented 2 years ago

No. I always use (*) either FB on the dual for denoising, or CP w/o preconditioning for inverse problems/regression (I believe both are implemented). But I will try to give a hand asap to implement some of them.

(*) with no better reason than the simplicity-to-implement / performance ratio being in their favour.