Add a structured list of solvers for different problem types

mateuszbaran commented 7 months ago

We have quite a lot of solvers now so I think it would be helpful to indicate which solvers can be used for which problem. I've started working on a list that could be included in docs. Please edit as you see fit.

The following list may help you select the right solver for your problem. Note that all solvers support Riemannian constraints.

Generic nonlinear unconstrained:
- Local derivative-free: Nelder-Mead
- Local first order: Quasi-Newton, conjugate gradient descent, gradient descent
- Local second order: adaptive regularization with cubics, trust regions
- Global: particle swarm optimization
Generic nonlinear constrained:
- Local first order: augmented Lagrangian method, exact penalty method
Convex unconstrained:
- First order: subgradient method, convex bundle method
Convex constrained:
- First order: Frank-Wolfe method
Special problems:
- Nonlinear least squares: Levenberg-Marquardt (local first order)
- Proximal: cyclic proximal point, Douglas-Rachford, Chambolle-Pock, primal-dual Riemannian semismooth Newton
- Difference of convex (local first order)

kellertuer commented 7 months ago

I think the term “nonlinear” does not make much sense on manifolds. I know where this comes from – operations research for example distinguishes a lot linear and nonlinear problems (not solvers), but the optimization I come from always thinks of just a generic function f to minimise.

I am also not so much a fan of local/global distinction. This sounds like PSO is (due to global) much superior to local methods? QN on the other hand has convergence properties, PSO not (I mean it is basically a randomly moving swarm of points).

Instead of “special problems” I would prefer to put LM to first order methods, CPPA, DR, CP, PDRSN are all non-smooth solvers, that would be a nice group as well. And they are actually not far away from RCBM, just that they assume some nice proxies to exist (being efficient to compute).

Difference of Convex are actually 2 solvers (DCA, DCPPA), I would open a category for non-convex unconstrained for them.

Finally I do not like the phrase “Riemannian constraints” – mainly because I consider all these functions to work intrinsic on the manifold (without thinking of the manifold as a constraint). Also it is conjunction when you write “Riemannian constrained unconstrained solvers”?!

kellertuer commented 7 months ago

Or as an alternative classification: Maybe along the lines of “What does the user have to provide?” and secondary by “What does the objective has to fulfill?”. We could for example start with

Smooth, convex objectives

second order methods (require f, gradient and Hessian): TR, ARC
first order methods (require f, grad f)
- unconstrained: GD, CG, QN,...
- constrained (EPM, ALM)
zeroth order (Nelder Mead? Should work more like generic -> see below)

Nonsmooth, convex objectives

first order
- sub gradient based: subgrad RCBM
- proximal map based: CPPA, DR, PBM
- constrained: FW
duality based: CP, PDRN

Nonconvex

DCA, DCPPA

Generic

(just require f): PSO, NM

mateuszbaran commented 7 months ago

I think the term “nonlinear” does not make much sense on manifolds. I know where this comes from – operations research for example distinguishes a lot linear and nonlinear problems (not solvers), but the optimization I come from always thinks of just a generic function f to minimise.

Good point, we don't have any linear problems on manifolds.

I am also not so much a fan of local/global distinction. This sounds like PSO is (due to global) much superior to local methods? QN on the other hand has convergence properties, PSO not (I mean it is basically a randomly moving swarm of points).

Local/global distinction is important for multimodal (so nonconvex) objectives. I don't consider global methods to be superior to local ones, they tend to converge much slower. PSO and CMA-ES are specifically designed to be able to handle reasonably well multimodal objectives, QN just finds a local minimum, or a stationary point if we are unlucky. There is also DIRECT which performs a very thorough search for a global minimizer.

Finally I do not like the phrase “Riemannian constraints” – mainly because I consider all these functions to work intrinsic on the manifold (without thinking of the manifold as a constraint). Also it is conjunction when you write “Riemannian constrained unconstrained solvers”?!

OK, let's not use the phrase "Riemannian constraints".

Smooth, convex objectives

AFAIK most if not all of these methods can be applied to non-convex objectives?

kellertuer commented 7 months ago

AFAIK most if not all of these methods can be applied to non-convex objectives?

Let's say you can start them with that, sure, but convergence? Not so sure. So that maybe also refers to convergence criteria. Sure you can put anything into gradient descent – but what it then does is often not so clear

Local/global distinction is important for multimodal (so nonconvex) objectives. I don't consider global methods to be superior to local ones, they tend to converge much slower. PSO and CMA-ES are specifically designed to be able to handle reasonably well multimodal objectives, QN just finds a local minimum, or a stationary point if we are unlucky.

Sure. That is why I would prefer to put QN (and GD and CG) in the convex area. That is where they work (even very very) well.

mateuszbaran commented 7 months ago

Let's say you can start them with that, sure, but convergence? Not so sure. So that maybe also refers to convergence criteria. Sure you can put anything into gradient descent – but what it then does is often not so clear

Sure. That is why I would prefer to put QN (and GD and CG) in the convex area. That is where they work (even very very) well.

Local convexity is often enough for convergence to a local minimum but it's indeed unclear. I just wouldn't like to imply that those solvers can't be applied to non-convex functions, see https://link.springer.com/book/10.1007/978-3-540-85634-4 .

kellertuer commented 7 months ago

No, sure they can, just that you have less guarantees then.

kellertuer commented 7 months ago

I would like to start this but I am still unsure about the local/global thing. For me the main grouping would be by “what information is necessary” so something like

Derivative Free
Gradient
subgradient
Hessian
Splitting / Duality-based
Constrained

But this would put global/local somewhere inside these or maybe also just as a note for some solvers. Does that still sound resonable?

mateuszbaran commented 7 months ago

Sure, that sounds reasonable. Local/global distinction isn't exactly clear (for convex everything convergent is global, for nonconvex DIRECT takes a very different approach than PSO or CMA-ES. So not using this distinction on a list of solvers is fine.

kellertuer commented 7 months ago

I think local/gobal might still be worth a comment (so for the non convex global case to say this still works).

But ok, then I follow this route here further.

JuliaManifolds / Manopt.jl