The distributed optimizer monkey-patches its parameters so that we can register a callback hook whenever a parameter is used in the model, e.g. to perform parameter all-gathers. However, __torch_function__ has been a class method since PyTorch 1.12 (see https://github.com/pytorch/pytorch/issues/63767), so it does not make sense as a place to launch callbacks for individual parameters. Monkey-patching __torch_function__ also produces deprecation warnings, which can be annoying.
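For reference, a minimal sketch of the kind of per-parameter monkey-patching described above (hypothetical names and structure, not the actual distributed-optimizer code): a hook that fires whenever one particular parameter participates in a torch op, e.g. to launch its all-gather lazily. Because __torch_function__ is dispatched as a classmethod since PyTorch 1.12, per-parameter state has to be smuggled in via a closure and a dedicated subclass per parameter, which is exactly the awkwardness motivating this change.

```python
import torch

def patch_param(param: torch.nn.Parameter, callback) -> None:
    """Hypothetical sketch: make `callback(param)` run whenever `param` is used in an op."""

    class _HookedParameter(torch.nn.Parameter):
        # __torch_function__ is a classmethod since PyTorch 1.12, so the hook is
        # class-level: we need one subclass (and one closure over `param`) per
        # parameter.  Overriding it as a plain instance method instead is what
        # produces the DeprecationWarning mentioned above.
        @classmethod
        def __torch_function__(cls, func, types, args=(), kwargs=None):
            if kwargs is None:
                kwargs = {}
            callback(param)  # assumed idempotent and free of torch ops (to avoid re-entry)
            return super().__torch_function__(func, types, args, kwargs)

    # Swap the instance's class so only this parameter picks up the hook.
    param.__class__ = _HookedParameter
```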