Closed MiyazonoKaori closed 1 month ago
NVLS (NVLink SHARP) is a feature introduced with H100. A100 GPUs do not support it.
Yes, NVLS won't be available on this platform, but NCCL should still be using regular NVLinks instead of SHM/direct... I see the following in the output:
NCCL_P2P_LEVEL set by environment to LOC
This forces P2P off. Please unset this variable and you should see a considerable speedup...
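A minimal sketch of the check-and-unset step, assuming a bash shell. The first export only reproduces the setting seen in the log so the example is self-contained, and report_nccl_p2p is a hypothetical helper name. The nccl-tests binary path and the GPU count in the final comment are assumptions about your setup.

```shell
#!/usr/bin/env bash
# Reproduce the setting reported in the log (illustration only; skip this
# line on a real system where the variable is already set somewhere).
export NCCL_P2P_LEVEL=LOC

# Hypothetical helper: list any NCCL P2P overrides exported in this shell.
report_nccl_p2p() {
    env | grep -E '^NCCL_P2P_(LEVEL|DISABLE)=' || echo "no NCCL P2P overrides set"
}

report_nccl_p2p                        # shows the override(s) currently in effect
unset NCCL_P2P_LEVEL NCCL_P2P_DISABLE  # let NCCL choose the transport itself
report_nccl_p2p                        # confirms nothing is left

# Then rerun the benchmark with debug logging and look at the reported
# transports. Path and GPU count below are assumptions:
# NCCL_DEBUG=INFO ./build/all_reduce_perf -b 8 -e 1G -f 2 -g 8
```

If the variable comes back in a new shell, it is probably exported from a profile script, a container image, or a job launcher configuration, and needs to be removed there as well.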
After setting export NCCL_P2P_DISABLE=0, bandwidth reached 220 GB/s. Thanks!
The host has NVLink and NVSwitch, but nccl-tests reports 0 NVLS channels and the bandwidth is only 10 GB/s. How should I troubleshoot and fix this?