JuliaNLSolvers / Optim.jl

Optimization functions for Julia

SPSA and MGD (model gradient descent) methods #918

Open jlbosse opened 3 years ago

jlbosse commented 3 years ago

Hey everyone, I recently implemented the simultaneous perturbation stochastic approximation (SPSA) algorithm and model gradient descent (MGD, first developed in https://arxiv.org/abs/2005.11011) in a roughly Optim.jl compatible way. Is there interest in adding these methods to Optim.jl? If yes, I would make my code fully compatible with the Optim.jl interface and open a PR.
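For context, this is the calling convention such methods would presumably plug into: Optim.jl's standard `optimize` entry point, shown here with the already-available derivative-free NelderMead solver and an illustrative noisy objective.

```julia
using Optim

# A noisy objective of the kind SPSA/MGD target
f_noisy(x) = sum(abs2, x) + 0.01 * randn()

# Existing Optim.jl calling convention that the proposed methods would follow
res = optimize(f_noisy, [1.0, -2.0], NelderMead(), Optim.Options(iterations = 500))
Optim.minimizer(res)
```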

pkofod commented 3 years ago

Can you describe the models and optimizers briefly?

jlbosse commented 3 years ago

Both are gradient-free methods for optimizing noisy objective functions. At a high level, both work by estimating the gradient from noisy function evaluations and then doing gradient descent with that gradient estimate.

SPSA works by randomly choosing a perturbation vector Δx, evaluating y_+ = f(x + Δx) and y_- = f(x - Δx), and then estimating the gradient at x elementwise as grad_i ≈ (y_+ - y_-) / (2 Δx_i). SPSA is already implemented in BlackBoxOptim.jl, but the implementation there doesn't follow Optim.jl's interface.
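A minimal sketch of that idea, assuming the standard elementwise SPSA estimator and fixed step sizes `a` and `c` (the function name and parameters are illustrative; real SPSA uses decaying gain sequences, and this is not the proposed implementation):

```julia
# Minimal SPSA-style sketch: two noisy evaluations per iteration give a
# stochastic gradient estimate that drives a plain gradient-descent step.
function spsa_sketch(f, x; iterations = 1_000, a = 0.1, c = 0.1)
    x = copy(x)
    for _ in 1:iterations
        Δx = c .* rand([-1.0, 1.0], length(x))   # random ±c perturbation vector
        y_plus  = f(x .+ Δx)
        y_minus = f(x .- Δx)
        ghat = (y_plus - y_minus) ./ (2 .* Δx)   # elementwise SPSA gradient estimate
        x .-= a .* ghat                          # gradient-descent step with the estimate
    end
    return x
end

# Example on a noisy quadratic
f_noisy(x) = sum(abs2, x) + 0.01 * randn()
spsa_sketch(f_noisy, [1.0, -2.0])
```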

MGD estimates the gradient by randomly choosing points x_i in a ball around x and then getting (noisy) function evaluations y_i = f(x_i) at these points. A model function (typically a quadratic) is fitted to this data, together with all previous data in this ball, and the gradient is then estimated from this surrogate model.
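A rough sketch of that idea, assuming a separable (diagonal) quadratic surrogate fitted by least squares; the sampling radius, sample count, and step size are illustrative, and unlike the method in the paper this sketch does not reuse previous samples or shrink the sampling region:

```julia
using LinearAlgebra

# Minimal MGD-style sketch: sample in a ball around x, fit a quadratic surrogate
# by least squares, and step along the surrogate's gradient at x.
function mgd_sketch(f, x; iterations = 100, radius = 0.5, nsamples = 20, a = 0.1)
    x = copy(x)
    n = length(x)
    for _ in 1:iterations
        # Random points in a ball of the given radius around the current iterate
        pts = [x .+ radius .* normalize(randn(n)) .* rand()^(1 / n) for _ in 1:nsamples]
        ys  = [f(p) for p in pts]

        # Design matrix for m(z) = c0 + g'*(z - x) + h'*(z - x).^2 (diagonal quadratic)
        A = hcat(ones(nsamples),
                 reduce(vcat, [(p .- x)' for p in pts]),
                 reduce(vcat, [((p .- x) .^ 2)' for p in pts]))
        β = A \ ys                  # least-squares fit of the surrogate coefficients
        ghat = β[2:n+1]             # surrogate gradient at x is the linear part

        x .-= a .* ghat             # gradient-descent step on the surrogate gradient
    end
    return x
end

# Example on the same kind of noisy quadratic as above
f_noisy(x) = sum(abs2, x) + 0.01 * randn()
mgd_sketch(f_noisy, [1.0, -2.0])
```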

ChrisRackauckas commented 3 years ago

MGD estimates the gradient by randomly choosing points x_i in a ball around x and then getting (noisy) function evaluations y_i = f(x_i) at these points. A model function (typically a quadratic) is fitted to this data, together with all previous data in this ball, and the gradient is then estimated from this surrogate model.

https://github.com/SciML/Surrogates.jl
https://github.com/MrUrq/SurrogateModelOptim.jl