NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

[Distributed optimizer] Do not monkey-patch class methods #1820

Closed timmoon10 closed 4 months ago

timmoon10 commented 4 months ago

The distributed optimizer monkey-patches its parameters' `__torch_function__` so that a callback hook runs whenever a parameter is used in the model, e.g. to perform parameter all-gathers. However, `__torch_function__` has been a class method since PyTorch 1.12 (see https://github.com/pytorch/pytorch/issues/63767), so it no longer makes sense as a place to launch callbacks for individual parameters. Monkey-patching `__torch_function__` also produces deprecation warnings, which can be annoying.
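
For context, here is a minimal sketch (not apex's actual implementation) of why class-level dispatch changes the picture: because `__torch_function__` is now resolved on the class, a per-parameter hook has to be stored as instance state and looked up inside the class-level handler, rather than patched onto each parameter object. The names `HookedTensor` and `_pre_use_callback` are illustrative, not apex APIs.

```python
# Minimal sketch, assuming a recent PyTorch (>= 1.12) where
# __torch_function__ is a classmethod. Per-parameter hooks are kept as
# instance attributes and fired from the class-level handler.
import torch


class HookedTensor(torch.Tensor):
    """Tensor subclass standing in for a distributed-optimizer parameter."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # Fire the per-parameter callback (e.g. launch an all-gather) for any
        # hooked tensor participating in this op.
        for arg in args:
            callback = getattr(arg, "_pre_use_callback", None)
            if callback is not None:
                callback()
        # Defer to torch.Tensor's default classmethod implementation.
        return super().__torch_function__(func, types, args, kwargs)


if __name__ == "__main__":
    param = torch.ones(4).as_subclass(HookedTensor)
    param._pre_use_callback = lambda: print("parameter used: would all-gather here")
    out = param * 2  # dispatches through HookedTensor.__torch_function__
```

This only illustrates the class-level dispatch constraint; it does not reproduce how apex actually registers or removes its hooks.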