NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

[Distributed optimizer] Do not monkey-patch class methods #1820

Closed timmoon10 closed 4 months ago

timmoon10 commented 4 months ago

The distributed optimizer monkey-patches its parameters' `__torch_function__` so that a callback hook runs whenever a parameter is used in the model, e.g. to perform parameter all-gathers. However, `__torch_function__` has been a class method since PyTorch 1.12 (see https://github.com/pytorch/pytorch/issues/63767), so it no longer makes sense as a place to launch callbacks for individual parameters. Monkey-patching `__torch_function__` also produces deprecation warnings, which can be annoying.
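
For context, here is a minimal sketch (not apex's actual implementation) of why class-level dispatch changes the picture: because `__torch_function__` is now resolved on the class, a per-parameter hook has to be stored as instance state and looked up inside the class-level handler, rather than patched onto each parameter object. The names `HookedTensor` and `_pre_use_callback` are illustrative, not apex APIs.

```python
# Minimal sketch, assuming a recent PyTorch (>= 1.12) where
# __torch_function__ is a classmethod. Per-parameter hooks are kept as
# instance attributes and fired from the class-level handler.
import torch


class HookedTensor(torch.Tensor):
    """Tensor subclass standing in for a distributed-optimizer parameter."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        # Fire the per-parameter callback (e.g. launch an all-gather) for any
        # hooked tensor participating in this op.
        for arg in args:
            callback = getattr(arg, "_pre_use_callback", None)
            if callback is not None:
                callback()
        # Defer to torch.Tensor's default classmethod implementation.
        return super().__torch_function__(func, types, args, kwargs)


if __name__ == "__main__":
    param = torch.ones(4).as_subclass(HookedTensor)
    param._pre_use_callback = lambda: print("parameter used: would all-gather here")
    out = param * 2  # dispatches through HookedTensor.__torch_function__
```

This only illustrates the class-level dispatch constraint; it does not reproduce how apex actually registers or removes its hooks.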