NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch
BSD 3-Clause "New" or "Revised" License

NCCLAllocator: Fix build failure #1818

Open Aidyn-A opened 4 months ago

Aidyn-A commented 4 months ago

This PR adds a shareIpcHandle override to NCCLAllocator so that it matches its base class, CUDAAllocator, whose definition recently changed in https://github.com/pytorch/pytorch/pull/130888.

cc @xwang233

crcrpar commented 4 months ago

Q: What would happen if I built this against a PyTorch version that does not include the linked PR?

Aidyn-A commented 4 months ago

> Q: What would happen if I built this against a PyTorch version that does not include the linked PR?

The build will fail, because the ShareableHandle struct is defined only in that PR :unamused:

xwang233 commented 4 months ago

> Q: What would happen if I built this against a PyTorch version that does not include the linked PR?

I think we can guard this with TORCH_VERSION (or a similar macro) for 2.5. Torch 2.4 will be released soon and won't include this change.
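
A minimal sketch of the version guard suggested above, not the code from this PR. It assumes the TORCH_VERSION_MAJOR and TORCH_VERSION_MINOR macros from `<torch/version.h>` are visible at compile time, and defines fallback values only so the example compiles without PyTorch. `SKETCH_TORCH_AT_LEAST`, `BaseAllocator`, `SketchNcclAllocator`, and `has_share_ipc_handle` are hypothetical names used for illustration. The `ShareableHandle` here is a simplified stand-in for the struct introduced by the linked PyTorch PR.

```cpp
#include <cstddef>
#include <string>

// Assumption: a real build would get these macros from <torch/version.h>.
// The fallback values exist only so this sketch compiles without PyTorch.
#ifndef TORCH_VERSION_MAJOR
#define TORCH_VERSION_MAJOR 2
#endif
#ifndef TORCH_VERSION_MINOR
#define TORCH_VERSION_MINOR 5
#endif

// Hypothetical helper macro (not part of apex or PyTorch).
#define SKETCH_TORCH_AT_LEAST(major, minor) \
  ((TORCH_VERSION_MAJOR > (major)) ||       \
   (TORCH_VERSION_MAJOR == (major) && TORCH_VERSION_MINOR >= (minor)))

#if SKETCH_TORCH_AT_LEAST(2, 5)
// Simplified stand-in for the struct that only newer PyTorch defines.
struct ShareableHandle {
  std::ptrdiff_t offset;
  std::string handle;
};
#endif

// Stand-in for the CUDAAllocator base class.
struct BaseAllocator {
  virtual ~BaseAllocator() = default;
  virtual const char* name() const = 0;
#if SKETCH_TORCH_AT_LEAST(2, 5)
  // Pure virtual that exists only in newer versions of the base class.
  virtual ShareableHandle shareIpcHandle(void* ptr) = 0;
#endif
};

// Stand-in for NCCLAllocator: the override is compiled only when the
// base class declares it, so older versions still build.
struct SketchNcclAllocator : BaseAllocator {
  const char* name() const override { return "nccl"; }
#if SKETCH_TORCH_AT_LEAST(2, 5)
  ShareableHandle shareIpcHandle(void* /*ptr*/) override {
    // Placeholder body: returns an empty handle.
    return ShareableHandle{0, std::string()};
  }
#endif
};

// Reports whether the guarded override was compiled in.
constexpr bool has_share_ipc_handle() {
  return SKETCH_TORCH_AT_LEAST(2, 5);
}
```

One caveat with a pure version check: a 2.5 development build made before the linked PR was merged would report 2.5 but still lack ShareableHandle, so the guard is only reliable for releases.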