NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

Is there any option to use the copy engine in ncclSend and ncclRecv? #1386

Open umiswing opened 4 months ago

umiswing commented 4 months ago

Does NCCL provide any option to use the copy engine for ncclSend and ncclRecv, leaving all SM resources free for other concurrent computation? This would be similar to what Transformer Engine implements in its custom communication operations.