NVIDIA / nccl-tests

NCCL Tests
BSD 3-Clause "New" or "Revised" License

Performance differences in single-server N-GPU NCCL tests on HGX A800 #210

Open cloveryyg opened 3 months ago

cloveryyg commented 3 months ago

Problem description: on an HGX A800, single-machine N-GPU NCCL tests show that the performance bottleneck is entirely in NVLink. However, performance differs significantly between the single-machine 2-GPU, 4-GPU, and 8-GPU runs. What is the reason for this?

With msgSize=4G, the single-machine HGX A800 nccl-tests results for 2, 4, and 8 GPUs are 143 GB/s, 156 GB/s, and 156 GB/s, respectively. With msgSize=256M, they are 130 GB/s, 145 GB/s, and 151 GB/s, respectively.
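To put the A800 gaps in proportion, here is a minimal sketch that expresses each result as a percentage shortfall relative to the 8-GPU figure. The numbers are copied from the report above; the helper name `gap_vs_8gpu` and the choice of the 8-GPU run as the baseline are assumptions made only for illustration.

```python
# Bandwidth figures (GB/s) copied from the HGX A800 results reported above.
# Keys are message sizes; inner keys are GPU counts.
a800 = {
    "4G": {2: 143, 4: 156, 8: 156},
    "256M": {2: 130, 4: 145, 8: 151},
}


def gap_vs_8gpu(results):
    """Percentage shortfall of each GPU count relative to the 8-GPU figure.

    Hypothetical helper: using the 8-GPU run as the baseline is an
    assumption for illustration, not something the report prescribes.
    """
    base = results[8]
    return {n: round(100.0 * (base - bw) / base, 1) for n, bw in results.items()}


for size, results in a800.items():
    print(size, gap_vs_8gpu(results))
```

On these figures the 2-GPU run is about 8.3% below the 8-GPU run at 4G and about 13.9% below at 256M, while the 4-GPU run matches at 4G and is about 4.0% below at 256M.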

For comparison, we ran the same single-machine multi-GPU nccl-tests on an HGX H800. With msgSize=4G, there is no performance difference between the 2-GPU, 4-GPU, and 8-GPU runs. With msgSize=256M, the results are 150 GB/s, 157 GB/s, and 160 GB/s, respectively.
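The report does not say which collective was run or whether the figures are algorithm bandwidth or bus bandwidth. As a sketch only, assuming they are all-reduce bus bandwidth (busbw) as printed by nccl-tests, busbw relates to algorithm bandwidth (algbw) by the factor 2(n-1)/n for n ranks. The code below applies that factor to the H800 256M figures to show the algbw each would imply; the function names are hypothetical.

```python
def allreduce_busbw_factor(n_ranks):
    """Ratio busbw / algbw for all-reduce across n_ranks: 2 * (n - 1) / n."""
    return 2.0 * (n_ranks - 1) / n_ranks


def implied_algbw(busbw_gbps, n_ranks):
    """Algorithm bandwidth implied by a bus bandwidth figure.

    Assumes the input is an all-reduce busbw value, which the report
    does not confirm.
    """
    return busbw_gbps / allreduce_busbw_factor(n_ranks)


# HGX H800 figures at msgSize=256M, copied from the report above (GB/s).
h800_256m = {2: 150, 4: 157, 8: 160}

for n, bw in h800_256m.items():
    print(n, allreduce_busbw_factor(n), round(implied_algbw(bw, n), 1))
```

Under that assumption the factor is 1.0, 1.5, and 1.75 for 2, 4, and 8 GPUs, so busbw is the figure intended to be comparable across GPU counts, while the implied algbw falls as the GPU count rises.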

For detailed test results, please refer to the attachment. Thanks a lot!

Attachment: Differences problems in performance data of HGX A800 single server N GPUs nccl testing.docx