Why?
The FusedAdamSWA interface was loosely typed and error-prone
The training critical path of FusedAdamSWA (i.e., its step function) could incur an unnecessary GPU-host sync when grad_clip_scale was set to anything other than a CUDA tensor
FusedAdamSWA had no unit tests
What?
Encapsulated the FusedAdamSWA math types and internal numerical type in Python enumerations to improve type robustness and readability (see the first sketch below)
Accepted grad_clip_scale as either a tensor or a number; for the latter we move it to the GPU in a non-blocking manner to eliminate a GPU-host sync (see the second sketch below)
Added a unit test to guarantee numerical correctness and demonstrate usage (see the test sketch below)
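
A minimal sketch of the enumeration approach, assuming two enums (one for the optimizer math variant, one for the internal compute precision); the class names, member names, and values here are illustrative, not the actual FusedAdamSWA definitions:

```python
from enum import Enum


class MathType(Enum):
    # Hypothetical members: which optimizer math the fused kernel runs.
    ADAM = 0
    ADAMW = 1


class InternalDType(Enum):
    # Hypothetical members: the precision used for internal optimizer state.
    FP32 = 0
    FP16 = 1
    BF16 = 2
```

Passing an enum member instead of a bare string or int lets wrong values fail loudly at construction time rather than deep inside the fused kernel.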
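A sketch of how the grad_clip_scale normalization might look; the helper name normalize_grad_clip_scale and its signature are assumptions. Note that a host-to-device copy only proceeds without blocking the host when the source tensor is pinned, hence the pin_memory() call:

```python
import torch
from numbers import Number


def normalize_grad_clip_scale(grad_clip_scale, device):
    # Hypothetical helper: accept either a tensor or a plain number.
    if isinstance(grad_clip_scale, torch.Tensor):
        return grad_clip_scale  # assumed to already live on the target device
    assert isinstance(grad_clip_scale, Number)
    # Wrap the scalar in a pinned CPU tensor and copy it to the GPU
    # asynchronously, so the optimizer step never waits on the transfer.
    cpu_scale = torch.tensor(float(grad_clip_scale)).pin_memory()
    return cpu_scale.to(device, non_blocking=True)
```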
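A sketch of the kind of numerical-correctness test the PR adds; the FusedAdamSWA import path, constructor, and step signature are assumptions, and the SWA-averaging check is omitted for brevity. The fused step is compared against a reference torch.optim.Adam step on identical inputs:

```python
import torch

# Hypothetical import path for the optimizer under test.
from fused_adam_swa import FusedAdamSWA


def test_fused_adam_swa_matches_reference():
    torch.manual_seed(0)
    param = torch.randn(64, device="cuda", requires_grad=True)
    ref_param = param.detach().clone().requires_grad_(True)

    fused = FusedAdamSWA([param], lr=1e-3)  # assumed constructor signature
    ref = torch.optim.Adam([ref_param], lr=1e-3)

    grad = torch.randn_like(param)
    param.grad = grad.clone()
    ref_param.grad = grad.clone()

    fused.step()
    ref.step()

    # Fused and reference updates should agree to floating-point tolerance.
    torch.testing.assert_close(param, ref_param, rtol=1e-6, atol=1e-6)
```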