leviome closed this issue 2 years ago
EMA is sometimes useful for alleviating overfitting, especially when training large models on ImageNet-1K only. It uses the exponential moving average of the weights instead of the current snapshot of the weights. You can check the cited paper or the code for what it does.
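To make the idea concrete, here is a minimal pure-Python sketch of the update rule. It is not the code from this repository: the function name `ema_update`, the toy weight sequence, and the decay value of 0.9 are all assumptions chosen for illustration (real training runs typically use a decay much closer to 1).

```python
def ema_update(ema_weights, weights, decay=0.9):
    """Blend the running average with the current weights.

    new_ema = decay * old_ema + (1 - decay) * current
    (hypothetical helper, not this repository's API)
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, weights)]


# Toy "training run": a single weight that oscillates around 1.0,
# standing in for the noisy weights produced by successive optimizer steps.
weights_per_step = [[1.5 if step % 2 == 0 else 0.5] for step in range(200)]

# Initialise the average from the first snapshot, then update it every step.
ema = list(weights_per_step[0])
for current in weights_per_step[1:]:
    ema = ema_update(ema, current, decay=0.9)

last_snapshot = weights_per_step[-1]
print("last snapshot:", last_snapshot[0])
print("EMA weight:   ", ema[0])
```

In this toy run the last snapshot is still 0.5 away from the centre of the oscillation, while the averaged weight ends up much closer to it. That smoothing is the reason the EMA copy of the weights, rather than the latest snapshot, is sometimes used for evaluation.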
"Exponential Moving Average", cool! Thanks a lot!
Why do we need EMA? What is EMA?