NVIDIA / apex

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
BSD 3-Clause "New" or "Revised" License

Add nccl_allocator for zero-copy user buffer #1796

Closed Aidyn-A closed 7 months ago

Aidyn-A commented 7 months ago

This PR adds a custom pluggable allocator that conditionally uses ncclMemAlloc and ncclMemFree for memory allocation. Buffers allocated through ncclMemAlloc can later be registered with the NCCL process group and used in zero-copy collective communication.
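To make the "conditionally" part concrete, here is a minimal pure-Python model of the routing logic. It is a sketch, not the code in this PR. The class `ConditionalAllocator`, the `use_nccl` flag, the `nccl_mem()` context manager, and the fake addresses are all hypothetical names invented for illustration. A real implementation would presumably be compiled code that calls ncclMemAlloc and ncclMemFree and is plugged into PyTorch through `torch.cuda.memory.CUDAPluggableAllocator`.

```python
from contextlib import contextmanager


class ConditionalAllocator:
    """Toy model: route each allocation to one of two backends based on a flag.

    "nccl" stands in for ncclMemAlloc/ncclMemFree and "default" for the
    regular CUDA allocation path. No GPU memory is touched here.
    """

    def __init__(self):
        self.use_nccl = False     # hypothetical switch, off by default
        self._next_addr = 0x1000  # fake address counter
        self.live = {}            # addr -> (size, backend that allocated it)

    def malloc(self, size):
        if size <= 0:
            raise ValueError("size must be positive")
        backend = "nccl" if self.use_nccl else "default"
        addr = self._next_addr
        self._next_addr += size
        self.live[addr] = (size, backend)
        return addr

    def free(self, addr):
        # Free through the backend that allocated the buffer,
        # regardless of what the flag says at free time.
        _size, backend = self.live.pop(addr)
        return backend

    @contextmanager
    def nccl_mem(self):
        # Allocations made inside this block take the "nccl" path.
        prev = self.use_nccl
        self.use_nccl = True
        try:
            yield self
        finally:
            self.use_nccl = prev

    def registrable(self):
        # Buffers that would be eligible for registration with the process group.
        return sorted(a for a, (_, b) in self.live.items() if b == "nccl")


alloc = ConditionalAllocator()
a = alloc.malloc(256)      # default path
with alloc.nccl_mem():
    b = alloc.malloc(512)  # "nccl" path
c = alloc.malloc(128)      # default path again
```

The point of recording the backend per buffer is that a buffer obtained from ncclMemAlloc has to be released with ncclMemFree even if the switch has been turned off by the time it is freed.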

cc @crcrpar @eqy