XiangLi1999 / Diffusion-LM

Diffusion-LM
Apache License 2.0
1.02k stars 133 forks source link

Training Cost due to the EMA mechanism #50

Open Lancelot39 opened 1 year ago

Lancelot39 commented 1 year ago

Thanks for such nice work and your kind released code! I have just tried it and found that the EMA mechanism has been used in your optimization of the Diffusion-LM code, which limits the update of the model parameters a lot. Indeed, such a way may stabilize the training process but also increase the training cost. I suppose once it has been removed, would the performance of Diffusion-LM degrade a lot? Or maybe the training cost could be further reduced a lot?

I am very expected to know the motivation of using EMA in your approach. Looking forward to your response.