Closed de1star closed 1 year ago
The alltoall busBw is simply computed by multiplying the algBw by (n-1)/n, where n is the number of ranks, since 1/n of the data is local and (n-1)/n is remote.
If you have one GPU per node the BusBw should be the NIC BW.
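To make the (n-1)/n correction concrete, here is a minimal sketch in Python. The function name `alltoall_bus_bw` and the sample numbers are illustrative assumptions, not part of nccl-tests; the only thing taken from the explanation above is the (n-1)/n factor.

```python
def alltoall_bus_bw(alg_bw: float, n_ranks: int) -> float:
    """Convert alltoall algorithm bandwidth to bus bandwidth.

    alg_bw  : algorithm bandwidth (any unit, e.g. GB/s)
    n_ranks : total number of ranks (GPUs) in the communicator

    Each rank keeps 1/n of its data locally and sends (n-1)/n to
    other ranks, so only that fraction counts toward busBw.
    """
    if n_ranks < 1:
        raise ValueError("n_ranks must be at least 1")
    return alg_bw * (n_ranks - 1) / n_ranks


# Hypothetical example: 80 GB/s algBw measured on 8 ranks.
print(alltoall_bus_bw(80.0, 8))
```

With a single rank the result is 0, which matches the intuition that nothing leaves the GPU.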
On a system with both NVLink and NICs, a portion of the traffic will be local (and should not be the bottleneck); the portion that goes through the network will determine the global time, hence the reported bandwidth.
On 2 nodes, 50% of the traffic is inter-node, so you should see busBw = 2x the network bandwidth per GPU. As the number of nodes increases, it will go down to 1x the network bandwidth per GPU (the general formula is N/(N-1) times the network bandwidth per GPU, N being the number of nodes).
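As a rough illustration of the N/(N-1) rule above, the sketch below tabulates the expected busBw for a few node counts. The function name `expected_alltoall_bus_bw` and the 25 GB/s per-GPU network bandwidth are assumptions chosen for the example; the model also assumes, as stated above, that intra-node traffic is never the bottleneck.

```python
def expected_alltoall_bus_bw(nic_bw_per_gpu: float, num_nodes: int) -> float:
    """Estimate the alltoall busBw on a multi-node system.

    nic_bw_per_gpu : network bandwidth available to each GPU (e.g. GB/s)
    num_nodes      : number of nodes, N

    Applies the N/(N-1) rule: the inter-node share of the traffic is
    assumed to set the total time, and intra-node traffic is assumed
    to be free.
    """
    if num_nodes < 2:
        # With one node there is no network traffic, so the rule does not apply.
        raise ValueError("num_nodes must be at least 2")
    return nic_bw_per_gpu * num_nodes / (num_nodes - 1)


# Assumed value for illustration: 25 GB/s of network bandwidth per GPU.
for nodes in (2, 4, 8, 16):
    print(nodes, round(expected_alltoall_bus_bw(25.0, nodes), 2))
```

The estimate starts at twice the per-GPU network bandwidth on 2 nodes and approaches it as the node count grows.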
Thanks for your reply! @sjeaugey
@sjeaugey
Thanks for your explanation.
If we have M GPUs per node, and each GPU is connected to one NIC, what should the theoretical/ideal busBw be for a single node and for multiple nodes?
Hi, thanks for your great help that solved my problem in another issue.
I'd like to calculate the algorithm bandwidth of alltoall on my cluster, but I found that https://github.com/NVIDIA/nccl-tests/blob/master/doc/PERFORMANCE.md does not mention it. May I ask how to calculate it given the bandwidth and number of IB NICs?