ngimel opened 3 years ago
The OSS torch.optim optimizers take the flag as an argument to `zero_grad()` (`set_to_none`), while apex's optimizers take `set_grad_none` in `__init__()`.
Fixing this in particular would allow better interoperability between CPU, CUDA, and CUDA+apex code. I'd like to use this functionality, but the added complexity is a problem: with many different update sites, I would need to check whether I'm using apex in order to handle `set_grad_none=True` or not.
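The mismatch can be shown with a minimal pure-Python sketch (no torch or apex imports; the classes below only mirror the two calling conventions, they are not the real implementations):

```python
# Sketch of the two conventions described above. These classes are
# stand-ins for illustration, not the actual torch/apex optimizers.

class UpstreamStyleOptimizer:
    """torch.optim style: the flag is a per-call argument to zero_grad()."""
    def __init__(self, params):
        self.params = params

    def zero_grad(self, set_to_none=True):
        for p in self.params:
            p.grad = None if set_to_none else 0.0


class ApexStyleOptimizer:
    """apex style: the flag is fixed once, in __init__()."""
    def __init__(self, params, set_grad_none=True):
        self.params = params
        self.set_grad_none = set_grad_none

    def zero_grad(self):  # no per-call flag; signature differs
        for p in self.params:
            p.grad = None if self.set_grad_none else 0.0


class Param:
    def __init__(self):
        self.grad = 1.0


ps = [Param()]
UpstreamStyleOptimizer(ps).zero_grad(set_to_none=True)
assert ps[0].grad is None

ps2 = [Param()]
opt = ApexStyleOptimizer(ps2, set_grad_none=True)
# opt.zero_grad(set_to_none=True)  # would raise TypeError here
opt.zero_grad()
assert ps2[0].grad is None
```

Generic training code cannot call `zero_grad(set_to_none=...)` uniformly across the two, which is the interoperability problem described above.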
I also encountered this issue. I don't think apex optimizers need to handle `set_grad_none` themselves, since the PyTorch optimizer base class already handles it; changing the function signature breaks the protocol.
This makes it harder to use optimizers interchangeably in generic training frameworks: for example, checkpointing/loading from a checkpoint has to differ depending on the optimizer, and zeroing/setting grads to None works differently.
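As things stand, a generic framework has to paper over the difference with a shim along these lines (`zero_grad_compat` is a hypothetical helper, not part of torch or apex; the two classes are stand-ins for the real optimizers):

```python
import inspect

def zero_grad_compat(optimizer, set_to_none=True):
    """Hypothetical shim: call zero_grad() uniformly whether the
    optimizer takes the flag per call (torch.optim style) or fixed
    it at construction time (apex style)."""
    params = inspect.signature(optimizer.zero_grad).parameters
    if "set_to_none" in params:
        optimizer.zero_grad(set_to_none=set_to_none)
    else:
        optimizer.zero_grad()  # flag (if any) was set in __init__


# Stand-ins for the two optimizer flavors, for demonstration only.
class TorchStyle:
    def __init__(self):
        self.called_with = None

    def zero_grad(self, set_to_none=True):
        self.called_with = set_to_none


class ApexStyle:
    def __init__(self, set_grad_none=True):
        self.set_grad_none = set_grad_none
        self.called = False

    def zero_grad(self):
        self.called = True


t, a = TorchStyle(), ApexStyle(set_grad_none=True)
zero_grad_compat(t, set_to_none=True)
zero_grad_compat(a)
assert t.called_with is True and a.called
```

Having to carry a shim like this in every framework is exactly the kind of per-optimizer special-casing the comment above is objecting to.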
Is it possible to make the apex and core optimizer interfaces more consistent? cc @ptrblck cc @xw285cornell