Closed accosmin closed 7 years ago
Implement an ASGD-like stochastic optimizer where, instead of uniformly averaging the iterates, we use a tuned momentum over the past states.
We should use very small momentum factors (e.g. in the range [1e-6, 1e-2]) so that the moving average effectively "averages" a large number of iterations and thus stays close to ASGD. A sketch of the idea follows.
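For concreteness, here is a minimal sketch in Python of what such an optimizer could look like; the names (`sgd_ema`, `beta`, `lr`) and the API are illustrative assumptions, not the project's actual interface. It runs plain SGD while tracking an exponential moving average of the iterates, whose small momentum factor plays the role of ASGD's uniform average:

```python
# Sketch only: SGD whose returned solution is an exponential moving
# average (EMA) of the iterates, with a small tunable momentum factor
# `beta` standing in for ASGD's uniform average of past states.
import numpy as np

def sgd_ema(grad, x0, lr=0.01, beta=1e-3, steps=10000):
    """Plain SGD that also tracks an EMA of its iterates.

    grad : callable returning a stochastic gradient at x
    beta : momentum factor in ~[1e-6, 1e-2]; the EMA spans roughly the
           last 1/beta iterates, so a small beta mimics ASGD's long-run
           average of past states.
    """
    x = x0.copy()
    x_avg = x0.copy()
    for _ in range(steps):
        x -= lr * grad(x)              # standard SGD update
        x_avg += beta * (x - x_avg)    # momentum-style average of states
    return x_avg                       # the averaged solution

# Usage: noisy quadratic f(x) = 0.5 * ||x||^2, optimum at 0
rng = np.random.default_rng(0)
noisy_grad = lambda x: x + 0.1 * rng.standard_normal(x.shape)
print(sgd_ema(noisy_grad, x0=np.ones(5)))  # close to 0
```

The design point is that `beta` replaces ASGD's implicit 1/T weighting with a fixed, tunable decay, so the averaging horizon can be set independently of the total iteration count.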