Open umiswing opened 4 months ago
Does NCCL provide any option to use the copy engine in `ncclSend` and `ncclRecv`, leaving the full SM resources available for other concurrent computation? This would be similar to what Transformer Engine implements in its custom communication operations.