shuheng-liu closed this issue 3 years ago
@matinmoezzi Hi Matin, I made some minor changes such that:

- For optimizers where a `closure` is required, the gradients will be zeroed in every batch, and an optimization step is performed after every batch.
- For other optimizers (`Adam`, `SGD`, etc.), the gradients will be zeroed (only once) before the first batch. During training, gradients from different batches will be accumulated sequentially. After the last batch, a single optimizer step is performed. (This is the current behavior of neurodiffeq, which enables arbitrarily large training sets to be trained on a limited-memory GPU.)

If you don't have any problems, I'm about to merge it into master.
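For reference, here is a minimal sketch of the two stepping strategies in plain PyTorch. This is not the actual neurodiffeq implementation; the names `train_epoch` and `compute_loss` are hypothetical placeholders for illustration.

```python
import torch

def train_epoch(model, batches, optimizer, compute_loss):
    # Sketch only: `compute_loss` is a hypothetical callable that returns
    # the (differentiable) loss tensor for a single batch.
    if isinstance(optimizer, torch.optim.LBFGS):
        # Closure-based optimizers: gradients are zeroed inside the closure,
        # and an optimization step is taken after every batch.
        for batch in batches:
            def closure():
                optimizer.zero_grad()
                loss = compute_loss(model, batch)
                loss.backward()
                return loss
            optimizer.step(closure)
    else:
        # Adam, SGD, etc.: zero gradients once before the first batch,
        # accumulate gradients across all batches, then take a single step.
        optimizer.zero_grad()
        for batch in batches:
            loss = compute_loss(model, batch)
            loss.backward()  # gradients accumulate across batches
        optimizer.step()
```

The second branch is what allows a large training set to be split into small batches that fit on a limited-memory GPU while still behaving like one big full-batch update.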
Hi Shuheng, yes, you are right. I agree with these changes. Thank you for your careful consideration.
To summarize: an optimization step is performed after every batch only when a `closure` is required. For most optimizers, the optimizer step will be performed only once after all batches are run.