Shark-NLP / DiffuSeq

[ICLR'23] DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models
MIT License

Gradient accumulation #49

Closed · mainpyp closed this 1 year ago

mainpyp commented 1 year ago

Hi, I have yet another question, this time about the training process. Why is the backward call made inside the for loop rather than after all microbatches have been processed? Is this gradient accumulation, or do the microbatches serve some other purpose?

https://github.com/Shark-NLP/DiffuSeq/blame/9cddf4eaee82ec5930a68de377953a7b9981acc1/train_util.py#L237-273

mainpyp commented 1 year ago

Oh, I just figured it out: it's to save memory on the GPU!
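For context, here is a minimal sketch of the microbatching pattern being discussed, not the actual DiffuSeq code; the names `model`, `optimizer`, `batch`, and `micro_batch_size` are placeholders. Calling `backward()` inside the loop accumulates gradients in `.grad` across microbatches while only one microbatch's activation graph is held in GPU memory at a time.

```python
import torch

def train_step(model, optimizer, batch, micro_batch_size):
    # Hypothetical illustration of per-microbatch backward passes.
    optimizer.zero_grad()
    microbatches = batch.split(micro_batch_size)
    for micro in microbatches:
        loss = model(micro).mean()
        # backward() here frees this microbatch's activations right away,
        # so peak memory stays at roughly one microbatch's worth.
        # Gradients accumulate in .grad, so the optimizer step below
        # effectively uses the average loss over the full batch.
        (loss / len(microbatches)).backward()
    optimizer.step()
```

The end result is numerically equivalent (up to the loss scaling) to a single backward pass over the full batch, but with a much smaller peak memory footprint.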