DietDietDiet closed this 3 years ago
Hi,
From what I have seen, there are two ways to handle buffers: one is to apply EMA to them along with the parameters, and the other is to copy them directly from the model you are training. In my experience, there is little difference between the two. There can be some performance difference, but I did not observe a stable trend. Sometimes applying EMA to the buffers works better and sometimes copying works better, and the gap between them at test time is not big.
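To make the two options concrete, here is a minimal, framework-free sketch. Plain dicts of floats stand in for the model's parameters and buffers; the function name `update_ema`, the `ema_on_buffers` flag, and the toy values are all made up for illustration and are not taken from this repository's code.

```python
def update_ema(ema_params, ema_buffers, params, buffers,
               decay=0.999, ema_on_buffers=False):
    """Update the EMA copies in place (hypothetical helper).

    params / buffers are dicts of name -> float, standing in for a model's
    learnable parameters and its non-learnable buffers
    (for example, normalization running statistics).
    """
    # Parameters always get the exponential moving average.
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value

    for name, value in buffers.items():
        if ema_on_buffers:
            # Option 1: smooth buffers the same way as parameters.
            ema_buffers[name] = decay * ema_buffers[name] + (1.0 - decay) * value
        else:
            # Option 2: copy buffers directly from the model being trained.
            ema_buffers[name] = value


# Toy state of the model being trained (illustrative values only).
params = {"weight": 2.0}
buffers = {"running_mean": 4.0}

# Two separate EMA states, one per option, both starting from zero.
ema_a = ({"weight": 0.0}, {"running_mean": 0.0})
ema_b = ({"weight": 0.0}, {"running_mean": 0.0})

update_ema(*ema_a, params, buffers, decay=0.5, ema_on_buffers=True)
update_ema(*ema_b, params, buffers, decay=0.5, ema_on_buffers=False)
```

The parameters are treated identically in both cases; only the buffers differ. With a real model you would loop over tensors instead of floats, but the choice between the two branches is the same.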
Hi, it seems the buffered parameters (buffers) are not affected by the optimizer, that is, the optimizer leaves them unchanged. So I am wondering: do these buffers need to be computed with EMA as well? Thanks!