JinraeKim opened this issue 2 years ago
Hi, developers!

My question is: is there any plan to support Convex.jl here with AD, like cvxpylayers? For AD of the solution to (convex) optimization problems, I moved to Python and have used cvxpylayers for a while.
We have some stuff brewing for convex optimization in a nice way, but I wouldn't expect it to be here any time soon.
For AD of solutions to optimisation problems, check https://github.com/gdalle/ImplicitDifferentiation.jl. There is an example in the docs using Convex.jl.
Yeah, for now use that kind of stuff. We do have plans for some DCP-style tooling to automatically detect Julia programs that can be made conic and transform them into their convex form, in which case we'd specialize on that, but way before that we need to worry about the simple things like detecting quadratic objectives 😅 see https://github.com/SciML/Optimization.jl/issues/397.
Very ambitious, good luck!
I think with enough tutorials, ImplicitDifferentiation can cover almost every use case. Then you can maybe use it here by just having a thin layer over it for Optimization.jl stuff.
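For reference, the user-facing piece of that thin layer might look something like the sketch below. The details are assumptions: the exact `ImplicitFunction` constructor and call signature have changed across ImplicitDifferentiation.jl versions, Optim.jl is only used here as a stand-in black-box solver, and the stationarity condition `∇_y f(y, x) = 0` is just one possible choice of optimality conditions.

```julia
# Sketch only: differentiate y(x) = argmin_y f(y, x) by wrapping a black-box solver
# in an implicit function, so AD never has to trace through the solver itself.
using ImplicitDifferentiation, Optim, Zygote

f(y, x) = sum(abs2, y .- x) + 0.1 * sum(y .^ 4)   # toy smooth objective

# forward map: any optimizer works here (Optim.jl is a placeholder choice)
function forward(x)
    res = Optim.optimize(y -> f(y, x), zero(x), LBFGS())
    return Optim.minimizer(res)
end

# optimality conditions whose root characterizes the minimizer: ∇_y f(y, x) = 0
conditions(x, y) = 2 .* (y .- x) .+ 0.4 .* y .^ 3

implicit = ImplicitFunction(forward, conditions)

x = [1.0, 2.0, 3.0]
y = implicit(x)                       # same value as forward(x)
J = Zygote.jacobian(implicit, x)[1]   # dy/dx via the implicit function theorem
```

The Optimization.jl layer would then own the choices hidden in there: which solver runs the forward pass, which optimality conditions are used, and how the resulting linear system is solved.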
Ambition is never something that we lack. But most of this is clear through MTK tracing.
> Then you can maybe use it here by just having a thin layer over it for Optimization.jl stuff.
I think for a version 1 we can just do that to get autodiff of Optimization.jl solves off the ground. We'll want to expand it with the whole sensealg handling, but that can come later. Right now Optimization.jl is behind on its interface conformity (i.e. automatically using an optimized adjoint when used within a loss function, which everything else in SciML does except LinearSolve.jl), which causes other issues, so correct conformity with no options is better than having it not act like the other pieces.
The tricky part is customisation since every optimisation formulation is amenable to a number of "differentiable optimality conditions" and linear system solvers. Any API over ID.jl will need to implement lots of different code paths with kwargs.
> amenable to a number of "differentiable optimality conditions"
e.g. KKT residual = 0, projected gradient = 0, objective + barrier gradient = 0, etc.
There is a natural choice of optimality conditions for every algorithm though, so we can probably start with that. But ImplicitDifferentiation allows mixing and matching of problem-solver-conditions-linsolver.
Also it supports both rrule and frule and with https://github.com/ThummeTo/ForwardDiffChainRules.jl you can now use it with ForwardDiff as well.
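Concretely, whatever conditions are chosen, the machinery is the same: if the optimality conditions $c(x, y) = 0$ implicitly define the solution $y(x)$, differentiating gives

$$
\frac{\partial c}{\partial x} + \frac{\partial c}{\partial y}\,\frac{\partial y}{\partial x} = 0
\quad\Longrightarrow\quad
\frac{\partial y}{\partial x} = -\left(\frac{\partial c}{\partial y}\right)^{-1}\frac{\partial c}{\partial x},
$$

so the frule is a JVP requiring one linear solve with $\partial c/\partial y$ and the rrule is a VJP requiring one linear solve with its adjoint, which is exactly where the choice of linear system solver enters.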
Yeah, that's why the solver stack is a whole tab now in https://docs.sciml.ai/Overview/stable/. An optimization might have a choice of internal nonlinear solver via NonlinearSolve.jl and then a choice of linear solver via LinearSolve.jl, so then you can give types for how to handle each aspect, offload some things to GPUs, etc. in an extendable way. DifferentialEquations.jl has already gotten most of the way there, so it shows that the stack is about ready to do it, but for Optimization.jl it needs a better matrix-free operator API, hence https://github.com/SciML/SciMLOperators.jl, but that still needs downstream integration in order to be fully functional.
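As a rough sketch of that kind of composition (treating the keyword names as assumptions that may differ between versions), the pattern NonlinearSolve.jl already exposes is what an Optimization.jl solver would mirror:

```julia
# Sketch: a nonlinear solve whose inner linear solves are delegated to a
# LinearSolve.jl algorithm; an optimization solver could expose the same knobs.
using NonlinearSolve, LinearSolve

f(u, p) = u .^ 2 .- p                         # solve u^2 = p elementwise
prob = NonlinearProblem(f, [1.0, 1.0], [2.0, 4.0])

# Newton's method with a Krylov (matrix-free capable) inner linear solver
sol = solve(prob, NewtonRaphson(linsolve = KrylovJL_GMRES()))
```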
I think the best way to say it is like this: I believe that any solution that requires a user to know what ChainRules is, or how to choose between forward and reverse mode, is not a solution for the audience we want to be targeting. That, plus symbolic-numeric tooling, sparsity detection, tearing, etc., gives quite a large amount of potential algorithmic advantages. However, most of those advantages go away as one goes further and further from the fully nonlinear space, which is why we focus on nonlinear.
Convex is interesting though since there are ways to potentially improve DCP using e-graphs.
> I believe that any solution that requires a user to know what ChainRules is, or how to choose between forward and reverse mode, is not a solution for the audience we want to be targeting.
I mean you don't need to know about these to use ID.jl except with ForwardDiff. We can depend on ForwardDiffChainRules and make ForwardDiff work by default but that requires a discussion with @gdalle. ForwardDiffing through an optimisation solution using the implicit function theorem is a bit niche so I am not sure it's worth taking on such a dependency, as light as it may be. @gdalle is way more dependency-conscious than I am though.
Good question, I guess it depends on the use cases that people bring to our attention
Thank you guys for the detailed explanations! @ChrisRackauckas Your plan sounds pretty ambitious, but I support it. The reason why I loved Julia was that there are such nice ecosystems across so many fields in Julia!
And also @mohamed82008, I'll take a look at ID.jl as well. Thank you!