Open szhengac opened 3 years ago
The different types of trees differ in how we connect intra-node ranks to the inter-node tree; the inter-node tree is always the same. Intra-node, when we have 2 GPUs close to the NIC, we can choose which GPU sends and which receives, so we have different options to balance PCIe traffic and/or the reduction compute load.
The `NCCL_TOPO_MAX_NODE` constant is the maximum number of nodes of one type in the node topology graph. So we support up to 256 GPUs (per node), 256 NICs (per node), 256 PCI switches, 256 NUMA nodes, and so on.
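To illustrate the "per type" cap described above, here is a minimal sketch in Python. The struct and method names are hypothetical (NCCL's actual topology code is C, in `src/graph/`); only the idea of a fixed per-type capacity comes from the reply.

```python
# Hypothetical sketch: a per-type cap like NCCL_TOPO_MAX_NODE bounds how many
# nodes of each type (GPU, NIC, PCI switch, NUMA node, ...) the topology graph
# can hold. Names below are illustrative, not NCCL's actual structs.
NCCL_TOPO_MAX_NODE = 256

NODE_TYPES = ("GPU", "NIC", "PCI", "NUMA")

class TopoGraph:
    def __init__(self):
        # One bounded list per node type, mirroring a C array of
        # size NCCL_TOPO_MAX_NODE per type.
        self.nodes = {t: [] for t in NODE_TYPES}

    def add_node(self, node_type, node_id):
        assert node_type in NODE_TYPES
        if len(self.nodes[node_type]) >= NCCL_TOPO_MAX_NODE:
            raise RuntimeError(
                f"too many {node_type} nodes (max {NCCL_TOPO_MAX_NODE})")
        self.nodes[node_type].append(node_id)

g = TopoGraph()
for i in range(8):
    g.add_node("GPU", i)  # 8 GPUs in one node: well under the 256 cap
```

The point is that the cap applies per node type within one machine's topology, not to the total number of GPUs in the job, which is why 256 is not a practical limit.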
Thanks for responding. Is the double binary tree mentioned in https://developer.nvidia.com/blog/massively-scale-deep-learning-training-nccl-2-4/ equivalent to any of the tree types here?
The double binary tree concerns inter-node communication only; all three variants use it.
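The inter-node double binary tree from the blog post can be sketched as below. This is an illustrative construction (an in-order balanced tree plus a mirrored copy), not NCCL's actual code (`ncclGetDtree` uses bit tricks and chooses between shifting and mirroring); it only shows the general idea of two trees with different roots and different parent/child roles per rank.

```python
def build_tree(lo, hi):
    """In-order balanced binary tree over ranks [lo, hi).
    Returns (root, parent map: child rank -> parent rank)."""
    if lo >= hi:
        return None, {}
    mid = (lo + hi) // 2
    lroot, lpar = build_tree(lo, mid)
    rroot, rpar = build_tree(mid + 1, hi)
    par = {**lpar, **rpar}
    if lroot is not None:
        par[lroot] = mid
    if rroot is not None:
        par[rroot] = mid
    return mid, par

def double_binary_tree(n):
    """Two spanning trees over n ranks; the second mirrors ranks
    (r -> n-1-r) so the two trees have different roots and different
    parent/child relations, spreading send/recv load across them."""
    root1, par1 = build_tree(0, n)
    mroot, mpar = build_tree(0, n)
    root2 = n - 1 - mroot
    par2 = {n - 1 - c: n - 1 - p for c, p in mpar.items()}
    return (root1, par1), (root2, par2)
```

Each allreduce then pushes half the data through each tree, so no single rank is the bottleneck root for all of the traffic.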
Then how do we aggregate the data across the intra-node ranks? Another intra-node tree, or a ring allreduce?
Well, we also have an intra-node chain which converges to the NIC, adding a third branch to the tree (not shown in the double tree figure). How that chain connects to the inter-node double tree is what makes the three variants.
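A toy model of the three variants, based on my reading of the pattern names in graph.h (the exact link placements below are an assumption, not taken from NCCL's code): an inter-node tree position has up to one parent link and two child links, and the patterns differ in which of the two GPUs nearest the NIC terminates each link.

```python
# Toy model of NIC-traffic placement for one inter-node tree position with a
# parent link and two child links. GPU 0 and GPU 1 are the two GPUs closest
# to the NIC. The placements are illustrative (hedged from the pattern
# names), not lifted from NCCL's source.
LINKS = ("parent", "child0", "child1")

PATTERNS = {
    # All inter-node links terminate on GPU 0.
    "NCCL_TOPO_PATTERN_TREE":          {"parent": 0, "child0": 0, "child1": 0},
    # Parent link on GPU 0, both child links on GPU 1.
    "NCCL_TOPO_PATTERN_SPLIT_TREE":    {"parent": 0, "child0": 1, "child1": 1},
    # Parent plus one child on GPU 0, the other child on GPU 1.
    "NCCL_TOPO_PATTERN_BALANCED_TREE": {"parent": 0, "child0": 0, "child1": 1},
}

def gpu_link_counts(placement):
    """How many inter-node links each of the two GPUs handles."""
    counts = {0: 0, 1: 0}
    for link in LINKS:
        counts[placement[link]] += 1
    return counts
```

Under this model the plain tree puts all three links on one GPU (3/0), the split tree splits them 1/2, and the balanced tree 2/1, which matches the earlier point about choosing placements to balance PCIe traffic and reduction load.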
Thanks, this is much clearer to me. One last question: do we have an additional dummy root node (e.g., rank 0 in the first tree of https://developer.nvidia.com/blog/massively-scale-deep-learning-training-nccl-2-4/) in the double binary tree when the number of nodes is odd? If not, it seems to me that busbw = 1.5 * algbw.
Hi, can anyone elaborate more on the difference between the following three Tree structures?
https://github.com/NVIDIA/nccl/blob/399656269027c1818fc999ccf8ec4dd838cec50d/src/include/graph.h#L55-L57
Also, what does the following constant stand for?
https://github.com/NVIDIA/nccl/blob/399656269027c1818fc999ccf8ec4dd838cec50d/src/include/graph.h#L50
Based on the usage of `graph->intra` in other parts of the codebase, I thought it was the maximum number of GPUs. But 256 is too small, so I am confused. Thanks.