Hi,
I have yet another question, this time regarding the training process.
Why is the backward call inside the for loop rather than after all microbatches have been processed?
This looks like gradient accumulation; if not, what is the purpose of the microbatches?
https://github.com/Shark-NLP/DiffuSeq/blame/9cddf4eaee82ec5930a68de377953a7b9981acc1/train_util.py#L237-273
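For reference, here is a minimal sketch (not the DiffuSeq code itself, just a generic PyTorch example with a made-up model and data) of the gradient-accumulation pattern I mean, where backward is called once per microbatch and the optimizer steps once per full batch:

```python
import torch

# Hypothetical model, optimizer, and data, only to illustrate the pattern.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

batch = torch.randn(32, 10)
targets = torch.randn(32, 1)
microbatch_size = 8

optimizer.zero_grad()
for i in range(0, batch.shape[0], microbatch_size):
    micro = batch[i:i + microbatch_size]
    micro_targets = targets[i:i + microbatch_size]
    loss = torch.nn.functional.mse_loss(model(micro), micro_targets)
    # backward() inside the loop: gradients from each microbatch are
    # accumulated into .grad, so the single optimizer step below
    # effectively sees the whole batch.
    (loss * micro.shape[0] / batch.shape[0]).backward()
optimizer.step()
```

Is that what the microbatch loop in train_util.py is doing, or does it serve another purpose?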