JinraeKim opened this issue 2 years ago
Hi, developers!

My question is: is there any plan to support Convex.jl here with AD, like cvxpylayers? For AD of the solution to (convex) optimization problems, I moved to Python and have used cvxpylayers for a while.
We have some stuff brewing for convex optimization in a nice way, but I wouldn't expect it to be here any time soon.
For AD of solutions to optimisation problems, check https://github.com/gdalle/ImplicitDifferentiation.jl. There is an example in the docs using Convex.jl.
Yeah, for now use that kind of stuff. We do have plans for some DCP-style tooling to automatically detect Julia programs that can be made conic and transform them into their convex form, in which case we'd specialize on that, but way before that we need to worry about the simple things like detecting quadratic objectives 😅 see https://github.com/SciML/Optimization.jl/issues/397.
Very ambitious, good luck!
I think with enough tutorials, ImplicitDifferentiation can cover almost every use case. Then you can maybe use it here by just having a thin layer over it for Optimization.jl stuff.
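For reference, the user-facing piece of that thin layer might look something like the sketch below. The details are assumptions: the exact `ImplicitFunction` constructor and call signature have changed across ImplicitDifferentiation.jl versions, Optim.jl is only used here as a stand-in black-box solver, and the stationarity condition `∇_y f(y, x) = 0` is just one possible choice of optimality conditions.

```julia
# Sketch only: differentiate y(x) = argmin_y f(y, x) by wrapping a black-box solver
# in an implicit function, so AD never has to trace through the solver itself.
using ImplicitDifferentiation, Optim, Zygote

f(y, x) = sum(abs2, y .- x) + 0.1 * sum(y .^ 4)   # toy smooth objective

# forward map: any optimizer works here (Optim.jl is a placeholder choice)
function forward(x)
    res = Optim.optimize(y -> f(y, x), zero(x), LBFGS())
    return Optim.minimizer(res)
end

# optimality conditions whose root characterizes the minimizer: ∇_y f(y, x) = 0
conditions(x, y) = 2 .* (y .- x) .+ 0.4 .* y .^ 3

implicit = ImplicitFunction(forward, conditions)

x = [1.0, 2.0, 3.0]
y = implicit(x)                       # same value as forward(x)
J = Zygote.jacobian(implicit, x)[1]   # dy/dx via the implicit function theorem
```

The Optimization.jl layer would then own the choices hidden in there: which solver runs the forward pass, which optimality conditions are used, and how the resulting linear system is solved.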
Ambition is never something that we lack. But most of this is clear through MTK tracing.
> Then you can maybe use it here by just having a thin layer over it for Optimization.jl stuff.
I think for a version 1 we can just do that to get autodiff of Optimization.jl solves off the ground. We'll want to expand it with the whole sensealg handling, but that can come later. Right now Optimization.jl is behind on its interface conformity (i.e. automatically using an optimized adjoint when used within a loss function, which everything else in SciML does except LinearSolve.jl), which causes other issues, so correct conformity with no options is better than having it not act like the other pieces.
The tricky part is customisation since every optimisation formulation is amenable to a number of "differentiable optimality conditions" and linear system solvers. Any API over ID.jl will need to implement lots of different code paths with kwargs.
> amenable to a number of "differentiable optimality conditions"
e.g. KKT residual = 0, projected gradient = 0, objective + barrier gradient = 0, etc.
There is a natural choice of optimality conditions for every algorithm though, so we can probably start with that. But ImplicitDifferentiation allows mixing and matching of problem-solver-conditions-linsolver.
Also it supports both rrule and frule and with https://github.com/ThummeTo/ForwardDiffChainRules.jl you can now use it with ForwardDiff as well.
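Concretely, whatever conditions are chosen, the machinery is the same: if the optimality conditions $c(x, y) = 0$ implicitly define the solution $y(x)$, differentiating gives

$$
\frac{\partial c}{\partial x} + \frac{\partial c}{\partial y}\,\frac{\partial y}{\partial x} = 0
\quad\Longrightarrow\quad
\frac{\partial y}{\partial x} = -\left(\frac{\partial c}{\partial y}\right)^{-1}\frac{\partial c}{\partial x},
$$

so the frule is a JVP requiring one linear solve with $\partial c/\partial y$ and the rrule is a VJP requiring one linear solve with its adjoint, which is exactly where the choice of linear system solver enters.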
Yeah, that's why the solver stack is a whole tab now in https://docs.sciml.ai/Overview/stable/. An optimization might have a choice of internal nonlinear solver via NonlinearSolve.jl and then a choice of linear solver via LinearSolve.jl, so then you can give types for how to handle each aspect, offload some things to GPUs, etc. in an extendable way. DifferentialEquations.jl has already gotten most of the way there, so it shows that the stack is about ready to do it, but for Optimization.jl it needs a better matrix-free operator API, hence https://github.com/SciML/SciMLOperators.jl, but that still needs downstream integration in order to be fully functional.
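As a rough sketch of that kind of composition (treating the keyword names as assumptions that may differ between versions), the pattern NonlinearSolve.jl already exposes is what an Optimization.jl solver would mirror:

```julia
# Sketch: a nonlinear solve whose inner linear solves are delegated to a
# LinearSolve.jl algorithm; an optimization solver could expose the same knobs.
using NonlinearSolve, LinearSolve

f(u, p) = u .^ 2 .- p                         # solve u^2 = p elementwise
prob = NonlinearProblem(f, [1.0, 1.0], [2.0, 4.0])

# Newton's method with a Krylov (matrix-free capable) inner linear solver
sol = solve(prob, NewtonRaphson(linsolve = KrylovJL_GMRES()))
```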
I think the best way to say it is like this: I believe that any solution that requires a user to know what ChainRules is, or how to choose between forward and reverse mode, is not a solution for the audience we want to be targeting. That, plus symbolic-numeric tooling, sparsity detection, tearing, etc., gives quite a large amount of potential algorithmic advantages. However, most of those advantages go away as one goes further and further from the fully nonlinear space, which is why we focus on nonlinear.
Convex is interesting though since there are ways to potentially improve DCP using e-graphs.
> I believe that any solution that requires a user to know what ChainRules is, or how to choose between forward and reverse mode, is not a solution for the audience we want to be targeting.
I mean you don't need to know about these to use ID.jl except with ForwardDiff. We can depend on ForwardDiffChainRules and make ForwardDiff work by default but that requires a discussion with @gdalle. ForwardDiffing through an optimisation solution using the implicit function theorem is a bit niche so I am not sure it's worth taking on such a dependency, as light as it may be. @gdalle is way more dependency-conscious than I am though.
Good question, I guess it depends on the use cases that people bring to our attention
Thank you guys for the detailed explanations! @ChrisRackauckas Your plan sounds pretty ambitious, but I support it. The reason why I loved Julia was that there are such nice ecosystems across so many fields in Julia!
And also @mohamed82008, I'll take a look at ID.jl as well. Thank you!