SciML / Optimization.jl

Mathematical Optimization in Julia. Local, global, gradient-based and derivative-free. Linear, Quadratic, Convex, Mixed-Integer, and Nonlinear Optimization in one simple, fast, and differentiable interface.
https://docs.sciml.ai/Optimization/stable/

A plan for a more automated Optimization.jl #397

Open ChrisRackauckas opened 1 year ago

ChrisRackauckas commented 1 year ago

Optimization.jl is okay, but it's not quite conforming to the standards of the SciML interface. It conflates the "solver package" with the "interface package". In some sense, the current Optimization.jl is closer to DiffEqBase.jl than it is to DifferentialEquations.jl. The issue with that is apparent when looking at the higher-level interface. Acting as an interface base, Optimization.jl does not take on the dependencies of any automatic differentiation package, nor does it take on the dependencies of any solver package. Because of that, every time a user wants to solve, they need to:

  1. Pick an automatic differentiation choice.
  2. Choose a solver.

But why? This isn't the point of the SciML solver conglomerate packages. This is the point of the deeper interface part, for the hardcore users. We're missing the solve(prob) for Optimization.jl, the "do it at least okay for me" that users know and love with our other packages. This is very much needed since I am seeing users on Discourse getting confused by the glut of optimization methods to choose from, with so little time to learn why a choice should be made. Good error messages now tell people "hey, this solver doesn't support constraints" and "this solver needs differentiation, which you didn't define". But they don't go in the positive direction, i.e. "here is a good enough solver for a problem that is differentiable with box constraints". For Optimization.jl, most users would prefer that kind of hand-holding.
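
To make the gap concrete, here is roughly what the workflow looks like today versus the proposed entry point; the commented-out final line is the hypothetical API this issue proposes, which does not exist yet:

```julia
using Optimization, OptimizationOptimJL, ForwardDiff

rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2
u0, p = [0.0, 0.0], [1.0, 100.0]

# Today the user must make both choices explicitly:
optf = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())  # 1. pick AD
prob = OptimizationProblem(optf, u0, p)
sol = solve(prob, LBFGS())                                               # 2. pick solver

# Proposed "do it at least okay for me" entry point (hypothetical, not yet implemented):
# sol = solve(prob)
```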

The bigger question is, how do we get there? There's a few things to note:

  1. The PolyOpt() method derived from sciml_train is a decent default for differentiable cases.
  2. sciml_train's automated AD choice mechanism can be revived here, via try/catch. Since the objective function is called hundreds of times, sparing a few evaluations to test AD backends is a perfectly fine trade-off, and it can always be bypassed by choosing an AD backend yourself. (A sketch of this selection logic follows the list.)
  3. If the problem isn't AD-friendly, we should default to a derivative-free local optimizer. Something from NLopt has always done me well: I'd go with NLopt.LN_COBYLA.
  4. We should have an alg_hint for :global (change to enums first!!!) that would then switch to global methods. Default to multistart optimization if differentiable, and otherwise default to BBO.
  5. How to do the swap is interesting. I think the simplest way to do this is to actually move the current code to a subpackage OptimizationAD.jl and Optimization.
  6. Those enums should be in SciMLBase. Why are they here?
  7. Should the AD supports be split into subpackages of their own to remove Requires?
  8. We need Enzyme support.
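
A rough sketch of how items 2-4 could fit together. All names here are illustrative: `pick_ad` and `default_algorithm` are hypothetical, and LBFGS stands in for the PolyOpt-style default; the NLopt and BBO methods are the real wrapped solvers:

```julia
using Optimization, OptimizationOptimJL, OptimizationNLopt, OptimizationBBO
using ForwardDiff, Zygote

# Probe AD backends with one gradient call each. The objective is evaluated
# hundreds of times during a solve, so a few probe calls are negligible.
function pick_ad(f, u0, p)
    try
        ForwardDiff.gradient(x -> f(x, p), u0)
        return Optimization.AutoForwardDiff()
    catch
    end
    try
        Zygote.gradient(x -> f(x, p), u0)
        return Optimization.AutoZygote()
    catch
    end
    return nothing  # not AD-friendly
end

function default_algorithm(prob::OptimizationProblem; alg_hint = :local)
    # Assumes the raw objective is reachable at prob.f.f; a real implementation
    # would also rebuild prob.f with the chosen AD backend.
    adtype = pick_ad(prob.f.f, prob.u0, prob.p)
    if alg_hint === :global
        # Item 4: multistart local optimization if differentiable (LBFGS stands
        # in for the local stage here), otherwise BBO.
        return adtype === nothing ? BBO_adaptive_de_rand_1_bin_radiuslimited() :
                                    LBFGS()
    else
        # Items 1-3: a PolyOpt-style default if differentiable (LBFGS stands in
        # for it), COBYLA as the derivative-free local fallback.
        return adtype === nothing ? NLopt.LN_COBYLA() : LBFGS()
    end
end
```
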
ValentinKaisermayer commented 1 year ago

I think this needs some kind of problem traits, like islinear, isquadratic, isnonlinear, ismixedinteger, and isadable, in order to choose an appropriate solver.

According to this table, https://docs.sciml.ai/Optimization/stable/#Overview-of-the-Optimizers, MOI would cover everything, but https://jump.dev/JuMP.jl/stable/installation/#Supported-solvers shows that it is not that simple.
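
A minimal sketch of what such traits could look like; none of these functions exist in Optimization.jl today:

```julia
# Hypothetical trait queries, with conservative defaults: claim no structure
# unless it has actually been detected.
islinear(prob)       = false
isquadratic(prob)    = false
ismixedinteger(prob) = false
isadable(prob)       = true

# A dispatcher could then route structured problems to MOI-wrapped solvers
# and keep general nonlinear problems on the AD + NLP path:
function route(prob)
    ismixedinteger(prob) && return :mixed_integer_via_MOI
    islinear(prob)       && return :LP_via_MOI
    isquadratic(prob)    && return :QP_via_MOI
    return isadable(prob) ? :NLP_with_AD : :derivative_free
end
```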

odow commented 1 year ago

> I think this needs some kind of problem traits, like islinear, isquadratic, isnonlinear, ismixedinteger, isadable in order to choose an appropriate solver.

One problem is that these need to be quite refined. Even for something conceptually simple like isquadratic, you need is_convex_quadratic and is_nonconvex_quadratic, and you also need them separately for the objective and the constraints (because some solvers might support only a convex quadratic objective but not quadratic constraints).

This is the main reason JuMP changed to declaring support for constraints and objectives on a function/set basis.
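
For example, with the real MOI API (the true/false answers here assume HiGHS's current capabilities, so treat them as illustrative):

```julia
using MathOptInterface, HiGHS
const MOI = MathOptInterface

opt = HiGHS.Optimizer()

# Affine constraints: supported.
MOI.supports_constraint(opt, MOI.ScalarAffineFunction{Float64},
                        MOI.LessThan{Float64})                        # true

# Quadratic constraints: not supported by this solver...
MOI.supports_constraint(opt, MOI.ScalarQuadraticFunction{Float64},
                        MOI.LessThan{Float64})                        # false

# ...even though a (convex) quadratic objective is.
MOI.supports(opt,
    MOI.ObjectiveFunction{MOI.ScalarQuadraticFunction{Float64}}())   # true
```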

ChrisRackauckas commented 1 year ago

Not necessarily. They need to be quite refined for JuMP because JuMP is an interface for those solvers. Optimization.jl, by contrast, does nonlinear optimization; it's just a solver specialization to notice that a problem has more structure and use that to solve the system a bit more efficiently. If some structure isn't found, that's okay, because it just uses a different solver. This would already do better than, say, just slapping an equation into SciPy's optimization tooling, with the smarts to do a lot better in some cases; it's okay if that's not possible in other cases, and it needs to keep improving.

That's where I think JuMP and Optimization.jl hit really different niches. JuMP is a high-level interface which allows for very efficient use of different solvers with different specializations. Optimization.jl does not have plans to hit that kind of niche: JuMP already does it well. What JuMP doesn't have is something that allows one to just give it Julia code and say "solve it please", similar to what a lot of users expect with fmincon or SciPy's optimization module. Optim.jl does kind of fit that bill, but it doesn't make use of all of the extra toys we have available in the Julia ecosystem to bring it to its full potential. I think someone should be able to just hand over Julia code and have the system recognize at compile time that it's quadratic and then slam it over to JuMP as necessary. If no specialization is recognized, no big deal: we fall back to doing very good autodiff and very good nonlinear optimization, better than the alternatives and better than the user would likely have done by themselves. The community that is deep into optimization will likely want more controls and to go directly to the QP form, etc., but that's JuMP's community, with different goals. And these goals are very complementary, since this kind of infrastructure needs something like JuMP to exist in order to be possible.
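
As a sketch of that "recognize the structure" idea, assuming the objective is generic enough to trace with Symbolics.jl (`detect_structure` is a hypothetical name; `Symbolics.degree` is used for the classification):

```julia
using Symbolics

# Trace the objective symbolically and classify it by polynomial degree.
function detect_structure(f, n)
    x = Symbolics.variables(:x, 1:n)
    expr = f(x, nothing)          # symbolically trace the objective
    d = Symbolics.degree(expr)
    d <= 1 && return :linear      # hand off to an LP solver through MOI
    d == 2 && return :quadratic   # hand off to a QP solver through MOI
    return :nonlinear             # fall back to AD + general NLP
end

detect_structure((x, p) -> 3x[1] - x[2], 2)          # :linear
detect_structure((x, p) -> x[1]^2 + x[1] * x[2], 2)  # :quadratic
```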

Where the two somewhat overlap is in symbolic NLP support, though with the new tearing stuff giving a completely different performance profile from before, I think it's starting to show how pooling the symbolic resources of multiple domains together is a good way to go in the long run. However, even with all of that, I would caution anyone working on Optimization.jl and its ModelingToolkit.jl-related components that we should make use of everything JuMP and MOI have to offer, at every step of the way, since that stuff is already done pretty well and there's no need to recreate anything for which we don't have a very clear plan for long-term, substantial improvement. That's why, for example, we just leave the linear programming and quadratic programming stuff alone, and should unless something very drastic changes. But weird Julia NLP things that auto-discover that the problem is sometimes simpler? That seems like a nice nugget to work on.

ValentinKaisermayer commented 1 year ago

My opinion on this:

I think one has to distinguish between JuMP and MOI. MOI is the nice interface that allows for easy exchange of solvers. It is a must-have interface in order to get access to those solvers.

JuMP, on the other hand, is an algebraic modelling language enabling you to write Julia code that looks like your optimization problem on paper. In that sense it is similar to MTK.
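
For instance, a small constrained problem reads almost like the math (standard JuMP API, with Ipopt as an example solver):

```julia
using JuMP, Ipopt

model = Model(Ipopt.Optimizer)
@variable(model, x >= 0)
@variable(model, y >= 0)
@objective(model, Min, (1 - x)^2 + 100 * (y - x^2)^2)  # Rosenbrock objective
@constraint(model, x + y <= 1)
optimize!(model)
value(x), value(y)
```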

However, MTK offers some nice things that JuMP does not have, e.g. component-based modelling with subsystems, namespaces, alias elimination, and simplification for nonlinear systems. For linear problems, alias elimination and simplification are done by any good solver during presolve, but for nonlinear ones? I think that is one of MTK's USPs. If during that process the problem turns out to be linear or quadratic, it should be possible to leverage that.
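
The MTK route, following the documented OptimizationSystem example (the structural simplification pass discussed later in this thread would slot in before problem construction):

```julia
using ModelingToolkit, Optimization, OptimizationOptimJL

@variables x y
@parameters a b
loss = (a - x)^2 + b * (y - x^2)^2
@named sys = OptimizationSystem(loss, [x, y], [a, b])
# e.g. `sys = structural_simplify(sys)` once that pass applies here.

prob = OptimizationProblem(sys, [x => 0.0, y => 0.0], [a => 1.0, b => 100.0];
                           grad = true, hess = true)  # symbolic derivatives
sol = solve(prob, Newton())
```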

JuMP, on the other hand, has support for very special constructs, e.g. second-order cones, SOS1/SOS2, indicator constraints, etc. Being able to have that much control over the resulting optimization problem, and in turn leverage specialized solvers for it, is JuMP's USP.
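
A sketch of those constructs in JuMP syntax (model building only, no solver attached):

```julia
using JuMP

model = Model()
@variable(model, x[1:2])
@variable(model, t)
@variable(model, z, Bin)

@constraint(model, [t; x] in SecondOrderCone())    # second-order cone
@constraint(model, x in SOS1([1.0, 2.0]))          # SOS1 with explicit weights
@constraint(model, z => {x[1] + x[2] <= 1.0})      # indicator constraint
```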

Also, JuMP seems to be far more mature, as this thread suggests: https://discourse.julialang.org/t/ac-optimal-power-flow-in-various-nonlinear-optimization-frameworks/78486

ChrisRackauckas commented 1 year ago

> Also JuMP seems to be far more mature, as this suggests: https://discourse.julialang.org/t/ac-optimal-power-flow-in-various-nonlinear-optimization-frameworks/78486

Now that nonlinear tearing and alias elimination apply to OptimizationSystem (I think it was finished last week?), it might be good to revisit some of the benchmarks that are dominated by constraints.

odow commented 1 year ago

> Now that nonlinear tearing and alias elimination applies to OptimizationSystem

Is there a link? Where can I find more info?

ChrisRackauckas commented 1 year ago

No good link yet.

ejmeitz commented 1 year ago

Another thought for default optimization methods is the default optimizer in scipy's least_squares. It is extremely robust in my experience. There are more knobs exposed for tuning the optimizer than Optimization.jl provides at the moment, but Julia has way more algorithms. With those trade-offs in mind, I have not been able to find a better-performing algorithm for my problem in Julia despite many hours of trying (including packages designed specifically for it).

ChrisRackauckas commented 1 year ago

But that's only for least squares.

ejmeitz commented 1 year ago

The optimizer is not, though? The loss function is just LSQ by default.

ChrisRackauckas commented 1 year ago

IIRC that uses Levenberg-Marquardt, which is specialized to least squares only?

ejmeitz commented 1 year ago

They have that one, but the default is some kind of trust-region algorithm. The details are a bit beyond my understanding, so that could just be a special way of solving LM-type problems. There's a big paragraph in the scipy docs (below). Also, the loss function just has to be a function of the residuals, so it's not strictly least squares despite the name.

> Method ‘trf’ (Trust Region Reflective) is motivated by the process of solving a system of equations, which constitute the first-order optimality condition for a bound-constrained minimization problem as formulated in [STIR]. The algorithm iteratively solves trust-region subproblems augmented by a special diagonal quadratic term and with trust-region shape determined by the distance from the bounds and the direction of the gradient. These enhancements help to avoid making steps directly into bounds and to efficiently explore the whole space of variables. To further improve convergence, the algorithm considers search directions reflected from the bounds. To obey theoretical requirements, the algorithm keeps iterates strictly feasible. With dense Jacobians trust-region subproblems are solved by an exact method very similar to the one described in [JJMore] (and implemented in MINPACK). The difference from the MINPACK implementation is that a singular value decomposition of a Jacobian matrix is done once per iteration, instead of a QR decomposition and series of Givens rotation eliminations. For large sparse Jacobians a 2-D subspace approach of solving trust-region subproblems is used [STIR], [Byrd]. The subspace is spanned by a scaled gradient and an approximate Gauss-Newton solution delivered by scipy.sparse.linalg.lsmr. When no constraints are imposed the algorithm is very similar to MINPACK and has generally comparable performance. The algorithm works quite robustly on unbounded and bounded problems, thus it is chosen as the default algorithm.