shuheng-liu closed this issue 3 years ago
@matinmoezzi Hi Matin, I made some minor changes such that:

- For optimizers where a `closure` is required, the gradients will be zeroed in every batch, and an optimization step is performed after every batch.
- For other optimizers (`Adam`, `SGD`, etc.), the gradients will be zeroed (only once) before the first batch. During training, gradients from different batches will be accumulated sequentially. After the last batch, a single optimizer step is performed. (This is the current behavior of neurodiffeq, which enables arbitrarily large training sets to be trained on a limited-memory GPU.)

If you don't have any problems, I'm about to merge it into master.
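For reference, here is a minimal sketch of the two stepping strategies in plain PyTorch. This is not the actual neurodiffeq implementation; the names `train_epoch` and `compute_loss` are hypothetical placeholders for illustration.

```python
import torch

def train_epoch(model, batches, optimizer, compute_loss):
    # Sketch only: `compute_loss` is a hypothetical callable that returns
    # the (differentiable) loss tensor for a single batch.
    if isinstance(optimizer, torch.optim.LBFGS):
        # Closure-based optimizers: gradients are zeroed inside the closure,
        # and an optimization step is taken after every batch.
        for batch in batches:
            def closure():
                optimizer.zero_grad()
                loss = compute_loss(model, batch)
                loss.backward()
                return loss
            optimizer.step(closure)
    else:
        # Adam, SGD, etc.: zero gradients once before the first batch,
        # accumulate gradients across all batches, then take a single step.
        optimizer.zero_grad()
        for batch in batches:
            loss = compute_loss(model, batch)
            loss.backward()  # gradients accumulate across batches
        optimizer.step()
```

The second branch is what allows a large training set to be split into small batches that fit on a limited-memory GPU while still behaving like one big full-batch update.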
Hi Shuheng, yes, you are right. I agree with these changes. Thank you for your careful consideration.
To summarize: an optimization step is performed after every batch only when a `closure` is required. For most optimizers, the optimizer step will be performed only once after all batches are run.