Closed: accosmin closed this issue 8 years ago
Use an exponential running average of the parameters (as described in the SGA/SIA or ADAM papers).
Goal: smooth out the SG or AG updates.
All stochastic optimizers now use an exponential running average of their updates. This results in a significantly less noisy evolution over the iterations.
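For illustration, here is a minimal sketch of the idea, not the code used in this project. It assumes plain SGD with a fixed learning rate and keeps an exponential running average of the parameter vector alongside the raw iterate. The names `ema_update` and `sgd_with_averaging`, and the default `momentum=0.9`, are hypothetical choices made for this example.

```python
def ema_update(average, value, momentum=0.9):
    """Blend `value` into the running `average`, element by element.

    A `momentum` close to 1 gives a smoother but slower-moving average.
    (The default of 0.9 is an assumption, not a value from the issue.)
    """
    return [momentum * a + (1.0 - momentum) * v for a, v in zip(average, value)]


def sgd_with_averaging(x0, stochastic_grad, steps=1000, lr=0.1, momentum=0.9):
    """Run plain SGD and track an exponential running average of the iterates.

    `stochastic_grad(x)` is a caller-supplied (hypothetical) function that
    returns a possibly noisy gradient estimate at `x`.
    Returns both the raw final iterate and the averaged one.
    """
    x = list(x0)
    x_avg = list(x0)
    for _ in range(steps):
        g = stochastic_grad(x)
        # Raw SG step: this is the noisy sequence.
        x = [xi - lr * gi for xi, gi in zip(x, g)]
        # Smoothed sequence: running average of the raw iterates.
        x_avg = ema_update(x_avg, x, momentum)
    return x, x_avg
```

The averaged vector `x_avg` is what would be reported or evaluated. The raw iterate `x` still drives the gradient steps, so the averaging changes only how noisy the reported parameters are, not the path taken by the optimizer.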