Open cll24 opened 1 month ago
To the best of my knowledge, NCCL has no way to disable just NVLink. The granularity of control is "P2P" or "no P2P".
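To make the "P2P" versus "no P2P" switch concrete, here is a minimal Python sketch that builds the two command lines one might compare. `NCCL_P2P_DISABLE` and `NCCL_DEBUG` are documented NCCL environment variables, and `-b`, `-e`, `-f`, `-g` are nccl-tests options. The helper names, the binary path `./build/all_reduce_perf`, and the GPU count are assumptions for illustration, not something taken from this thread.

```python
import shlex


def nccl_overrides(p2p_enabled):
    """Return the NCCL environment overrides for one test run.

    NCCL_P2P_DISABLE=1 turns off all peer-to-peer transport (NVLink and
    PCIe alike); there is no finer-grained switch here.
    """
    overrides = {"NCCL_DEBUG": "INFO"}  # log which transport each channel uses
    if not p2p_enabled:
        overrides["NCCL_P2P_DISABLE"] = "1"
    return overrides


def command_line(p2p_enabled, binary="./build/all_reduce_perf", gpus=8):
    """Build a shell line for all_reduce_perf (binary path is an assumption)."""
    env_part = " ".join(
        f"{key}={shlex.quote(value)}"
        for key, value in sorted(nccl_overrides(p2p_enabled).items())
    )
    args = [binary, "-b", "8", "-e", "128M", "-f", "2", "-g", str(gpus)]
    return f"{env_part} {shlex.join(args)}"


if __name__ == "__main__":
    # Print both variants so their NCCL_DEBUG=INFO logs can be compared.
    print(command_line(p2p_enabled=True))
    print(command_line(p2p_enabled=False))
```

With `NCCL_DEBUG=INFO` set, the log of each run should show which transport NCCL picked per channel, which is the quickest way to confirm whether a run used P2P or fell back to SHM.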
What does `nvidia-smi topo -m` print after you use `RMNvLinkEnable`? Perhaps the GPUs are simply too far from each other on the PCIe bus? NCCL will typically not attempt P2P if devices are any further from each other than PXB.
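As a rough way to check the distance question, the sketch below parses the matrix printed by `nvidia-smi topo -m` and flags GPU pairs that are further apart than PXB. The sample matrix is invented for illustration (it is not output from an H20 system), the function names are hypothetical, and the "no further than PXB" rule is encoded only as the rule of thumb stated above, not as NCCL's exact logic.

```python
import re

# Link types from the nvidia-smi legend that are no further than PXB.
# NV# (NVLink) entries are handled separately in p2p_likely().
WITHIN_PXB = {"PIX", "PXB"}

# Hypothetical sample, shaped like `nvidia-smi topo -m` output with NVLink off.
SAMPLE = """\
        GPU0    GPU1    GPU2    GPU3    CPU Affinity
GPU0     X      PIX     NODE    SYS     0-47
GPU1    PIX      X      NODE    SYS     0-47
GPU2    NODE    NODE     X      SYS     0-47
GPU3    SYS     SYS     SYS      X      48-95
"""


def parse_topo(text):
    """Return {(gpu_a, gpu_b): link_type} for every distinct GPU pair."""
    lines = [line for line in text.splitlines() if line.strip()]
    gpus = [tok for tok in lines[0].split() if re.fullmatch(r"GPU\d+", tok)]
    links = {}
    for line in lines[1:]:
        cells = line.split()
        if cells[0] not in gpus:
            continue  # skip NIC rows and the legend
        for peer, link in zip(gpus, cells[1:1 + len(gpus)]):
            if link != "X":  # "X" marks the device itself
                links[(cells[0], peer)] = link
    return links


def p2p_likely(link):
    """Rule of thumb: P2P is attempted over NVLink or at most PXB distance."""
    return link.startswith("NV") or link in WITHIN_PXB


if __name__ == "__main__":
    for (a, b), link in sorted(parse_topo(SAMPLE).items()):
        if a < b:  # the matrix is symmetric, so report each pair once
            verdict = "P2P likely" if p2p_likely(link) else "P2P unlikely"
            print(f"{a} <-> {b}: {link:5s} {verdict}")
```

To use it on a real machine, replace `SAMPLE` with the captured output of `nvidia-smi topo -m`. Some driver versions decorate the header row with terminal escape codes, which this sketch does not strip.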
Hi, I want to run all_reduce_perf with P2P over PCIe on H20. However, the H20 is equipped with NVLink, so NCCL's all_reduce_perf always transfers data over NVLink. How can I disable NVLink and get P2P over PCIe in this test?
I tried to disable NVLink with `RMNvLinkEnable=0x0`. After that, NCCL's all_reduce_perf always uses SHM to communicate.