NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

Dual 4090 bandwidth slower with PCIe #1309

Closed: YZP17121579 closed this issue 6 months ago

YZP17121579 commented 6 months ago

[Image: 4090topo (GPU topology)]

[Images: nccl-03, nccl-23 (NCCL test results)]

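As the topology above shows, the connection between GPU0 and GPU3 is SYS, while the connection between GPU2 and GPU3 is PIX (in the usual topology notation, SYS means the path crosses the interconnect between CPU sockets, and PIX means it crosses at most a single PCIe bridge). Why is the bandwidth between GPU2 and GPU3 so much lower than between GPU0 and GPU3?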

sjeaugey commented 6 months ago

That's because GeForce cards don't support GPUDirect P2P (direct GPU-to-GPU communication over PCIe). Without P2P, the traffic cannot stay local to the PCIe switch: it has to go up to the CPU (host memory) and back down, so every byte crosses the switch's uplink to the CPU twice. That doubles the load on that link compared to the case where the two GPUs are on different sockets, where each GPU has its own link to the CPU.
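To make the 2x concrete, here is a back-of-the-envelope sketch in Python (not NCCL code) that counts how many bytes cross each PCIe link, per direction, when two GPUs exchange data through host memory instead of P2P. The link names and simplified topologies are hypothetical, the inter-socket interconnect is left out of the SYS case, and it assumes both GPUs send at the same time, as in a two-GPU ring collective.

```python
from collections import Counter

# Hypothetical link labels for illustration only; they are not NCCL or
# nvidia-smi identifiers. PCIe links are full duplex, so load is counted
# separately for each (link, direction) pair.


def staged_copy(src_path, dst_path):
    """Directed hops for a GPU-to-GPU copy staged through host memory (no P2P).

    The sender pushes data up its own path to the CPU, then the receiver
    pulls it down its own path from the CPU.
    """
    up = [(link, "up") for link in src_path]
    down = [(link, "down") for link in dst_path]
    return up + down


def bottleneck_bytes(copies, nbytes):
    """Largest number of bytes carried by any single (link, direction)."""
    load = Counter()
    for hops in copies:
        for hop in hops:
            load[hop] += nbytes
    return max(load.values())


# PIX case: GPU2 and GPU3 sit behind one PCIe switch sharing a single uplink.
gpu2 = ["gpu2-switch", "switch-cpu"]
gpu3_pix = ["gpu3-switch", "switch-cpu"]

# SYS case: GPU0 and GPU3 hang off different sockets, each with its own link.
# The inter-socket interconnect is omitted (assumed not to be the bottleneck).
gpu0 = ["gpu0-cpu0"]
gpu3_sys = ["gpu3-cpu1"]

N = 1 << 30  # bytes each GPU sends to the other (assumed simultaneous)

pix_load = bottleneck_bytes(
    [staged_copy(gpu2, gpu3_pix), staged_copy(gpu3_pix, gpu2)], N
)
sys_load = bottleneck_bytes(
    [staged_copy(gpu0, gpu3_sys), staged_copy(gpu3_sys, gpu0)], N
)

print(pix_load // N, sys_load // N)  # prints "2 1"
```

In the PIX case both copies go up and back down through `switch-cpu`, so each direction of that shared uplink carries twice the data, while in the SYS case every link carries each byte only once.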