NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Apex optimizer interface differs from torch.optim.Optimizer #1014

Open ngimel opened 3 years ago

ngimel commented 3 years ago

This makes it harder to use optimizers interchangeably in generic training frameworks: e.g., checkpointing and loading from a checkpoint have to be done differently depending on the optimizer, zeroing gradients or setting them to None works differently, etc. Some examples:

Is it possible to make the apex and core optimizer interfaces more consistent? cc @ptrblck cc @xw285cornell
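
A minimal sketch of the divergence being described, assuming the `apex.optimizers.FusedAdam` signature current at the time of this issue (upstream PyTorch calls the equivalent flag `set_to_none`):

```python
import torch
from apex.optimizers import FusedAdam  # assumes apex is built with CUDA extensions

model = torch.nn.Linear(16, 16).cuda()

# Stock PyTorch: the flag is an argument to zero_grad() itself.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
opt.zero_grad(set_to_none=True)

# Apex: the equivalent flag is fixed at construction time via set_grad_none,
# and zero_grad() takes no argument.
fused_opt = FusedAdam(model.parameters(), lr=1e-3, set_grad_none=True)
fused_opt.zero_grad()
```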

Kaixhin commented 3 years ago

The OSS torch.optim optimizers take set_to_none as an argument to zero_grad(), while apex's fused optimizers take set_grad_none in __init__().

Fixing this in particular would allow better interoperability between CPU, CUDA, and CUDA+Apex code paths. I'd like to use this functionality, but the added complexity is a problem: when there are many different update sites, I would need to check whether I'm running with Apex just to handle set_grad_none=True correctly.
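
A hedged sketch of the kind of branching this implies in generic training code; `clear_grads` is a hypothetical helper, not part of either library:

```python
def clear_grads(optimizer, set_to_none=True):
    """Hypothetical helper: clear gradients uniformly across optimizer flavors."""
    try:
        # torch.optim.Optimizer accepts the flag on the call itself.
        optimizer.zero_grad(set_to_none=set_to_none)
    except TypeError:
        # Apex fused optimizers fixed the flag at construction time
        # (set_grad_none in __init__), so zero_grad() takes no argument here.
        optimizer.zero_grad()
```

Relying on a TypeError probe like this is exactly the kind of per-optimizer special casing the issue asks to avoid.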

cyugao commented 1 year ago

I also encountered this issue. I don't think the Apex optimizers need to handle set_grad_none themselves, since the PyTorch optimizer base class already handles setting gradients to None. Changing the function signature breaks the common protocol.
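
For reference, a toy subclass (not apex code) that simply does not override `zero_grad()` inherits the standard `set_to_none` handling from `torch.optim.Optimizer`:

```python
import torch
from torch.optim import Optimizer

class ToySGD(Optimizer):
    """Toy optimizer: only step() is custom; zero_grad() is inherited as-is."""
    def __init__(self, params, lr=0.1):
        super().__init__(params, defaults=dict(lr=lr))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is not None:
                    p.add_(p.grad, alpha=-group["lr"])

model = torch.nn.Linear(4, 4)
opt = ToySGD(model.parameters())
model(torch.randn(2, 4)).sum().backward()
opt.step()
opt.zero_grad(set_to_none=True)  # no extra code needed in ToySGD
assert all(p.grad is None for p in model.parameters())
```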