NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

Why choose 20.6 as Hopper GPU’s NVLink bandwidth? #1397

Open polarstormx opened 1 month ago

polarstormx commented 1 month ago

Hi, I have some questions regarding the NVLink bandwidth for the Hopper GPU and would appreciate your insights. It seems that NCCL takes the bandwidth of a single NVLink link on a Hopper GPU to be 20.6. https://github.com/NVIDIA/nccl/blob/178b6b759074597777ce13438efb0e0ba625e429/src/graph/topo.h#L17

For the Hopper GPU, NCCL uses write operations to transfer data. From my analysis using Nsight, the bandwidth occupied by Request Protocol Data is 1/8 of that occupied by User Data, and the bandwidth for Response Protocol Data is negligible. User data therefore accounts for 8 of every 9 units transferred, so the effective bandwidth should theoretically be about 8/9 of the NVLink bandwidth. [image]

I used the command "nvidia-smi nvlink -s" to query the NVLink bandwidth. The result is 26.5 GB/s per link, not the 25 GB/s reported for the Ampere GPU. [image] Consequently, the theoretical effective bandwidth for the Hopper GPU should be around 26.5 × 8/9 ≈ 23.6 GB/s. However, in actual testing, the bandwidth is indeed approximately 20.6 GB/s. Which part of my understanding above is incorrect? Thank you very much for your time and assistance!
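For reference, the estimate above can be reproduced with a few lines of Python. This is only a sketch of my own arithmetic: the variable and function names are made up for illustration, the 1/8 overhead ratio is my Nsight reading rather than a documented figure, and the "implied efficiency" is just the ratio of the two numbers, not something NCCL defines.

```python
# Figures quoted in this post; all names here are illustrative, not NCCL identifiers.
RAW_LINK_BW = 26.5             # per-link value reported by `nvidia-smi nvlink -s`
NCCL_LINK_BW = 20.6            # per-link value NCCL assumes for Hopper (topo.h)
REQUEST_OVERHEAD_RATIO = 1 / 8 # request protocol data relative to user data (Nsight estimate)


def effective_bandwidth(raw_bw: float, overhead_ratio: float) -> float:
    """Payload bandwidth when each unit of user data carries
    `overhead_ratio` units of protocol data on the same link."""
    return raw_bw / (1 + overhead_ratio)


# 26.5 / (1 + 1/8) is the same as 26.5 * 8/9.
expected = effective_bandwidth(RAW_LINK_BW, REQUEST_OVERHEAD_RATIO)

# How far the constant in NCCL sits below that estimate.
gap = expected - NCCL_LINK_BW

# Fraction of the raw link rate that 20.6 corresponds to.
implied_efficiency = NCCL_LINK_BW / RAW_LINK_BW

print(f"expected effective bandwidth: {expected:.2f}")
print(f"gap to NCCL constant:         {gap:.2f}")
print(f"efficiency implied by 20.6:   {implied_efficiency:.3f}")
```

If the 8/9 model were the whole story, the implied efficiency would be about 0.889. The measured value is noticeably lower, which is the discrepancy I am asking about.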

polarstormx commented 1 month ago

If this part of the content is confidential, please let me know, and I will close the issue.