Closed Yujaeseo closed 1 year ago
We usually consider 24 GB/s to be the peak performance per 200G NIC (12 GB/s for 100G NICs), so in your case, with 4 NICs per server, 96 GB/s is the target. You seem to reach it, so from my perspective it's perfect.
Thank you for your reply!
I ran the NCCL allreduce test to evaluate the performance of our GPU servers, and I want to know whether the result is appropriate given the hardware specs. I ran it on a cluster of 3 servers; each server has 2x Intel Xeon Platinum 8358, 8x NVIDIA A100 SXM4 40GB GPUs, and 4x Mellanox HDR InfiniBand cards.
I think the theoretical performance is 100 GB/s per server (200 Gb/s x 4 NICs / 8 bits per byte). The test result shows an average of about 84.8 GB/s and a peak of about 96 GB/s.
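To make the arithmetic explicit, here is a minimal sketch of the estimate. The function and constant names are hypothetical and are not part of NCCL or nccl-tests; the 24 GB/s per 200G NIC figure is the rule of thumb quoted in the reply above, not a measured value.

```python
# Back-of-the-envelope bandwidth estimate for one server.
# All names below are illustrative; nothing here calls NCCL.

NICS_PER_SERVER = 4            # 4x HDR InfiniBand cards per server
LINK_RATE_GBIT = 200           # HDR line rate per NIC, in Gb/s
PRACTICAL_GBYTE_PER_NIC = 24   # rule-of-thumb peak per 200G NIC (assumption from the reply)


def line_rate_gbytes(nics: int, gbit_per_nic: float) -> float:
    """Aggregate line rate in GB/s, converting bits to bytes (8 bits per byte)."""
    return nics * gbit_per_nic / 8


def practical_peak_gbytes(nics: int, gbyte_per_nic: float) -> float:
    """Aggregate achievable target in GB/s using a per-NIC rule of thumb."""
    return nics * gbyte_per_nic


def efficiency(measured_gbytes: float, reference_gbytes: float) -> float:
    """Fraction of the reference bandwidth that the measurement reaches."""
    return measured_gbytes / reference_gbytes


theoretical = line_rate_gbytes(NICS_PER_SERVER, LINK_RATE_GBIT)
target = practical_peak_gbytes(NICS_PER_SERVER, PRACTICAL_GBYTE_PER_NIC)

print(f"line rate:        {theoretical:.1f} GB/s")
print(f"practical target: {target:.1f} GB/s")
print(f"average vs line rate: {efficiency(84.8, theoretical):.1%}")
print(f"peak vs target:       {efficiency(96.0, target):.1%}")
```

Under these assumptions, the gap between the 100 GB/s line rate and the 96 GB/s target is protocol overhead rather than a tuning problem, so the measured peak is compared against the target and not against the line rate.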
Are the results reasonable considering the hardware specifications? Are there any additional optimization methods to apply?
The mpirun command and test result are as follows.
I look forward to your answer! Thank you.