SciML / Optimization.jl

Mathematical Optimization in Julia. Local, global, gradient-based and derivative-free. Linear, Quadratic, Convex, Mixed-Integer, and Nonlinear Optimization in one simple, fast, and differentiable interface.
https://docs.sciml.ai/Optimization/stable/
MIT License

Add an abstraction for chaining optimizers #78

Closed: dangirsh closed this issue 1 month ago

dangirsh commented 3 years ago

It seems like we often have code like:

result_neuralode = DiffEqFlux.sciml_train(loss_neuralode, prob_neuralode.p,
                                          ADAM(0.05), cb = callback,
                                          maxiters = 300)

result_neuralode2 = DiffEqFlux.sciml_train(loss_neuralode,
                                           result_neuralode.minimizer,
                                           LBFGS(),
                                           cb = callback,
                                           allow_f_increases = false)

Here, the result of optimizing with ADAM is further optimized via LBFGS.

As a first pass, I imagine a new OptimizerChain type and a corresponding sciml_train entry point:

chained_optimizer = OptimizerChain([
    ADAM(0.05) => (cb=adam_callback, maxiters=300),
    LBFGS() => (cb=lbfgs_callback, allow_f_increases=false)
])
result_neuralode = DiffEqFlux.sciml_train(loss_neuralode, prob_neuralode.p, chained_optimizer)

That sciml_train method could then perform (something like) a left fold over the list of optimizer/kwargs pairs, feeding each stage's minimizer into the next.
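
For concreteness, here is a minimal sketch of what that fold could look like. The OptimizerChain type, its steps field, and the assumption that each stage's result exposes a minimizer field (as in the example above) are all hypothetical:

struct OptimizerChain
    # Each entry pairs an optimizer with the keyword arguments for that stage,
    # e.g. ADAM(0.05) => (cb = adam_callback, maxiters = 300).
    steps::Vector{Pair}
end

# Sketch of the new method; it would live inside DiffEqFlux next to the
# existing sciml_train methods.
function sciml_train(loss, p0, chain::OptimizerChain; kwargs...)
    # Left fold: each stage restarts from the previous stage's minimizer.
    foldl(chain.steps; init = (minimizer = p0,)) do prev, (opt, stage_kwargs)
        # Shared kwargs first, so per-stage kwargs take precedence.
        sciml_train(loss, prev.minimizer, opt; kwargs..., stage_kwargs...)
    end
end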

Does this interface seem reasonable? Is any information missing?

ChrisRackauckas commented 3 years ago

That sounds reasonable.

Vaibhavdixit02 commented 1 month ago

With remake this workflow is more flexible and is now documented. I don't plan on adding explicit support for this, but we can improve the examples/docs for it if needed.
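
For reference, a minimal sketch of that remake-based workflow with the current Optimization.jl interface, assuming OptimizationOptimisers and OptimizationOptimJL for the two stages and using a toy Rosenbrock objective in place of the original neural ODE loss:

using Optimization, OptimizationOptimisers, OptimizationOptimJL, ForwardDiff

# Illustrative two-argument objective f(u, p), as Optimization.jl expects.
rosenbrock(u, p) = (p[1] - u[1])^2 + p[2] * (u[2] - u[1]^2)^2

optf = OptimizationFunction(rosenbrock, Optimization.AutoForwardDiff())
prob = OptimizationProblem(optf, zeros(2), [1.0, 100.0])

# Stage 1: a fixed number of Adam iterations.
sol1 = solve(prob, OptimizationOptimisers.Adam(0.05); maxiters = 300)

# Stage 2: rebuild the problem at Adam's solution and refine with LBFGS.
prob2 = remake(prob; u0 = sol1.u)
sol2 = solve(prob2, LBFGS())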