Looking at the implementation of ModelEmaV2, it seems that, unlike the timm version, it only works on fp32 parameters (see this line).
Does that mean it will not work if I use AMP?
Another difference from timm is that the ema_model does not appear to be copied (in timm, the copy is made here). I am probably missing where the model is copied; could you point me to it? (If the model is not copied, then the EMA simply corresponds to momentum.)
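To make the copy concern concrete, here is a minimal, framework-free sketch of what I mean. It is not the code from either library: `TinyEma`, `run`, and the `make_copy` flag are hypothetical names I made up, and a plain dict of floats stands in for the model parameters.

```python
import copy


class TinyEma:
    """Toy EMA tracker over a dict of float 'parameters' (hypothetical, not a library class)."""

    def __init__(self, params, decay=0.9, make_copy=True):
        # With make_copy=False the shadow dict is the very same object as the live dict.
        self.shadow = copy.deepcopy(params) if make_copy else params
        self.decay = decay

    def update(self, params):
        for name, value in params.items():
            # Cast to float first, loosely mimicking a full-precision shadow of the weights.
            self.shadow[name] = self.decay * self.shadow[name] + (1.0 - self.decay) * float(value)


def run(make_copy):
    """Take three fake optimizer steps and return (live weight, EMA weight)."""
    params = {"w": 0.0}
    ema = TinyEma(params, decay=0.9, make_copy=make_copy)
    for step in range(1, 4):
        params["w"] = float(step)  # stand-in for an optimizer step
        ema.update(params)
    return params["w"], ema.shadow["w"]


if __name__ == "__main__":
    print("with copy:   ", run(make_copy=True))
    print("without copy:", run(make_copy=False))
```

With the copy, the EMA weight lags behind the live weight (about 0.561 versus 3.0 after three steps). Without it, both names point at the same storage, so each update blends a value with itself and no separate averaged set of weights is kept. That is the behavior I am worried about, unless the copy happens somewhere I have not found.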