CoinCheung / pytorch-loss

label-smooth, amsoftmax, partial-fc, focal-loss, triplet-loss, lovasz-softmax. Maybe useful
MIT License

do buffered params in EMA need to be updated? #8

Closed DietDietDiet closed 3 years ago

DietDietDiet commented 4 years ago

Hi, it seems the buffered parameters are not affected by the optimizer, i.e., they remain unchanged by gradient updates. So I am wondering: do these buffers also need to be updated by EMA? Thanks!

CoinCheung commented 4 years ago

Hi,

From my observation, there are two ways to deal with buffers: one is to update them with EMA along with the parameters, and the other is to copy them directly from the model you are training. In my experience, there is little difference between the two methods. There can be some performance difference, but I did not observe a stable trend: sometimes applying EMA to the buffers works better and sometimes direct copying does, and the gap in test results between them is not large.
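The two strategies above can be sketched as follows. This is a minimal, hypothetical illustration (plain Python dicts standing in for a model's parameter and buffer tensors, not the repo's actual EMA code): parameters always get the EMA update, while buffers (e.g. BatchNorm running stats) are either EMA'd with the same rule or copied verbatim from the live model.

```python
def ema_update(shadow, live, decay=0.999):
    """EMA rule for trainable parameters:
    shadow <- decay * shadow + (1 - decay) * live."""
    return {k: decay * shadow[k] + (1 - decay) * live[k] for k in shadow}

def copy_buffers(shadow_buffers, live_buffers):
    """Strategy 1: take buffers directly from the model being trained."""
    return dict(live_buffers)

def ema_buffers(shadow_buffers, live_buffers, decay=0.999):
    """Strategy 2: apply the same EMA rule to the buffers as well."""
    return ema_update(shadow_buffers, live_buffers, decay)

# One EMA step: parameters are always averaged; pick either
# copy_buffers or ema_buffers for the buffers.
shadow_p = {"weight": 0.0}
live_p = {"weight": 1.0}
shadow_p = ema_update(shadow_p, live_p, decay=0.9)  # -> {"weight": ~0.1}
```

With a real `torch.nn.Module`, the same split corresponds to iterating `model.named_parameters()` for the EMA rule and `model.named_buffers()` for whichever buffer strategy you choose; as noted above, either choice tends to give similar test results.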