JuliaML / StochasticOptimization.jl

Implementations of stochastic optimization algorithms and solvers

Adam optimizer #13

Open CorySimon opened 7 years ago

CorySimon commented 7 years ago

This package is really useful for its learning rate updaters. I'm using a variant of the Adam scheme from here for SGD.

I think it is unnecessary to store the ρᵢᵗ terms as vectors. Shouldn't these be Float64s? Also, a pedantic point: I'm not sure why they are called ρ instead of β as in the paper. https://github.com/JuliaML/StochasticOptimization.jl/blob/master/src/paramupdaters.jl#L123-L124
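For what it's worth, here is a minimal sketch (a hypothetical struct, not the package's actual type) of the state layout this would imply: the decay rates and their running powers are shared by all parameters, so they can be plain Float64 scalars, while only the moment estimates need to be vectors.

```julia
# Hypothetical Adam state, for illustration only.
# The decay rates and their running powers β₁^t, β₂^t are identical for
# every parameter, so they can be stored as scalars; only the per-parameter
# moment estimates m and v need to be vectors.
mutable struct AdamState
    β₁::Float64          # first-moment decay rate (ρ₁ in paramupdaters.jl)
    β₂::Float64          # second-moment decay rate (ρ₂ in paramupdaters.jl)
    β₁ᵗ::Float64         # running product β₁^t (scalar, not a vector)
    β₂ᵗ::Float64         # running product β₂^t (scalar, not a vector)
    m::Vector{Float64}   # per-parameter first-moment estimate
    v::Vector{Float64}   # per-parameter second-moment estimate
end

AdamState(n::Int; β₁=0.9, β₂=0.999) =
    AdamState(β₁, β₂, 1.0, 1.0, zeros(n), zeros(n))
```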

CorySimon commented 7 years ago

Also, comparing with the paper (https://arxiv.org/pdf/1412.6980.pdf), the update of θ does not look correct for the Adam optimizer. Shouldn't it be:

θ[i] -= α * m[i] / (1.0 - β₁ᵗ) * sqrt(1.0 - β₂ᵗ) / (sqrt(v[i]) + ϵ * sqrt(1.0 - β₂ᵗ))

Please confirm that I am correct, and I will make a pull request. Thanks.
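For reference, here is a minimal sketch of one Adam step written directly from Algorithm 1 of the paper, using the explicit bias-corrected estimates m̂ = m/(1 − β₁ᵗ) and v̂ = v/(1 − β₂ᵗ); the function name and signature are illustrative, not the package's API. Multiplying the numerator and denominator of m̂ / (sqrt(v̂) + ϵ) by sqrt(1 − β₂ᵗ) recovers exactly the one-line expression proposed above, so the two forms agree.

```julia
# A sketch of one Adam step following Algorithm 1 of Kingma & Ba (2014).
# Names (adam_step!, g, ...) are illustrative, not the package's API.
function adam_step!(θ, m, v, g, t; α=0.001, β₁=0.9, β₂=0.999, ϵ=1e-8)
    β₁ᵗ = β₁^t
    β₂ᵗ = β₂^t
    @inbounds for i in eachindex(θ)
        m[i] = β₁ * m[i] + (1 - β₁) * g[i]      # first-moment estimate
        v[i] = β₂ * v[i] + (1 - β₂) * g[i]^2    # second-moment estimate
        m̂ = m[i] / (1 - β₁ᵗ)                    # bias-corrected first moment
        v̂ = v[i] / (1 - β₂ᵗ)                    # bias-corrected second moment
        θ[i] -= α * m̂ / (sqrt(v̂) + ϵ)           # update as in the paper
    end
    return θ
end
```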