NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Apex optimizer interface differs from torch.optim.Optimizer #1014

Open ngimel opened 3 years ago

ngimel commented 3 years ago

This makes it harder to use optimizers interchangeably in generic training frameworks: e.g., checkpointing and loading from a checkpoint have to be done differently depending on the optimizer, zeroing gradients or setting them to None works differently, etc. Some examples:

Is it possible to make the apex and core optimizer interfaces more consistent? cc @ptrblck cc @xw285cornell
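
A minimal sketch of the divergence being described, assuming the `apex.optimizers.FusedAdam` signature current at the time of this issue (upstream PyTorch calls the equivalent flag `set_to_none`):

```python
import torch
from apex.optimizers import FusedAdam  # assumes apex is built with CUDA extensions

model = torch.nn.Linear(16, 16).cuda()

# Stock PyTorch: the flag is an argument to zero_grad() itself.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
opt.zero_grad(set_to_none=True)

# Apex: the equivalent flag is fixed at construction time via set_grad_none,
# and zero_grad() takes no argument.
fused_opt = FusedAdam(model.parameters(), lr=1e-3, set_grad_none=True)
fused_opt.zero_grad()
```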

Kaixhin commented 3 years ago

The OSS torch.optim optimizers take set_to_none as an argument to zero_grad(), while apex's fused optimizers take set_grad_none in __init__().

Fixing this in particular would allow better interoperability between CPU, CUDA, and CUDA+Apex code paths. I'd like to use this functionality, but the added complexity is a problem: when there are many different update sites, I would need to check whether I'm running with Apex just to handle set_grad_none=True correctly.
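
A hedged sketch of the kind of branching this implies in generic training code; `clear_grads` is a hypothetical helper, not part of either library:

```python
def clear_grads(optimizer, set_to_none=True):
    """Hypothetical helper: clear gradients uniformly across optimizer flavors."""
    try:
        # torch.optim.Optimizer accepts the flag on the call itself.
        optimizer.zero_grad(set_to_none=set_to_none)
    except TypeError:
        # Apex fused optimizers fixed the flag at construction time
        # (set_grad_none in __init__), so zero_grad() takes no argument here.
        optimizer.zero_grad()
```

Relying on a TypeError probe like this is exactly the kind of per-optimizer special casing the issue asks to avoid.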

cyugao commented 1 year ago

I also encountered this issue. I don't think the Apex optimizers need to handle set_grad_none themselves, since the PyTorch optimizer base class already handles setting gradients to None. Changing the function signature breaks the common protocol.
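
For reference, a toy subclass (not apex code) that simply does not override `zero_grad()` inherits the standard `set_to_none` handling from `torch.optim.Optimizer`:

```python
import torch
from torch.optim import Optimizer

class ToySGD(Optimizer):
    """Toy optimizer: only step() is custom; zero_grad() is inherited as-is."""
    def __init__(self, params, lr=0.1):
        super().__init__(params, defaults=dict(lr=lr))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group["lr"])

model = torch.nn.Linear(4, 4)
opt = ToySGD(model.parameters())
model(torch.randn(2, 4)).sum().backward()
opt.step()
opt.zero_grad(set_to_none=True)  # no extra code needed in ToySGD
assert all(p.grad is None for p in model.parameters())
```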