CorySimon opened this issue 7 years ago
Also, comparing against the paper (https://arxiv.org/pdf/1412.6980.pdf), the update of θ does not appear correct for the Adam optimizer. Shouldn't it be:
θ[i] -= α * m[i] / (1.0 - β₁ᵗ) * sqrt(1.0 - β₂ᵗ) / (sqrt(v[i]) + ϵ * sqrt(1.0 - β₂ᵗ))
Please confirm that I am correct, and I will make a pull request. Thanks.
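For what it's worth, here is a quick numerical check (my own scratch code, not taken from the package) that the one-line form above matches the paper's m̂_t / (√v̂_t + ϵ) update once you multiply numerator and denominator by √(1 − β₂ᵗ):

```julia
# Minimal numerical check (not the package's code): the paper applies the
# bias-corrected update θ -= α * m̂ / (√v̂ + ϵ) with m̂ = m/(1-β₁ᵗ), v̂ = v/(1-β₂ᵗ).
# Multiplying numerator and denominator by √(1-β₂ᵗ) gives the one-line form above.
α, β₁, β₂, ϵ = 0.001, 0.9, 0.999, 1e-8
t = 7
m, v = 0.3, 0.04            # example first/second moment estimates at step t
β₁ᵗ, β₂ᵗ = β₁^t, β₂^t

# Paper's form (Algorithm 1 in Kingma & Ba):
m̂ = m / (1.0 - β₁ᵗ)
v̂ = v / (1.0 - β₂ᵗ)
Δ_paper = α * m̂ / (sqrt(v̂) + ϵ)

# Rearranged one-line form proposed above:
Δ_oneline = α * m / (1.0 - β₁ᵗ) * sqrt(1.0 - β₂ᵗ) / (sqrt(v) + ϵ * sqrt(1.0 - β₂ᵗ))

@assert isapprox(Δ_paper, Δ_oneline)   # the two forms agree
```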
This package's learning rate updaters are really useful. I'm using a variant of the Adam scheme here for SGD.
I think it is unnecessary to store the ρᵢᵗ as vectors. Shouldn't these be Float64s? Also, being pedantic, I'm not sure why they are called ρ instead of β: https://github.com/JuliaML/StochasticOptimization.jl/blob/master/src/paramupdaters.jl#L123-L124
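To illustrate what I mean, here is a rough sketch (hypothetical struct and field names, not the actual paramupdaters.jl code) where the running powers of β are plain Float64s updated once per step, since they are the same scalar for every parameter:

```julia
# Sketch only (struct/field names are my own, not the package's): β₁ᵗ and β₂ᵗ
# are shared by all coordinates, so they can be Float64 accumulators rather
# than per-coordinate vectors.
mutable struct AdamState
    m::Vector{Float64}    # first-moment estimates, one per parameter
    v::Vector{Float64}    # second-moment estimates, one per parameter
    β₁ᵗ::Float64          # running product β₁^t
    β₂ᵗ::Float64          # running product β₂^t
end
AdamState(n::Int) = AdamState(zeros(n), zeros(n), 1.0, 1.0)

function step!(s::AdamState, θ, g; α=0.001, β₁=0.9, β₂=0.999, ϵ=1e-8)
    s.β₁ᵗ *= β₁           # scalar updates, once per call
    s.β₂ᵗ *= β₂
    @inbounds for i in eachindex(θ)
        s.m[i] = β₁ * s.m[i] + (1.0 - β₁) * g[i]
        s.v[i] = β₂ * s.v[i] + (1.0 - β₂) * g[i]^2
        θ[i] -= α * s.m[i] / (1.0 - s.β₁ᵗ) * sqrt(1.0 - s.β₂ᵗ) /
                (sqrt(s.v[i]) + ϵ * sqrt(1.0 - s.β₂ᵗ))
    end
    return θ
end
```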