shanleo2024 closed this issue 3 months ago.
(1) Because P2P and SHM have been disabled, NCCL cannot use its intra-node transports to communicate between the two GPUs, so the other GPU is removed from the "intra-node" view. This makes NCCL use the network to communicate between the two GPUs. (2) Once the intra-node channels have been created, they are connected "inter-node", which produces the final channels.
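To make point (1) concrete, here is a minimal Python sketch loosely modeled on how ncclTopoTrimSystem groups GPUs into domains: GPUs linked by a path type below PATH_NET share a domain, and each rank keeps only the GPUs in its own domain. `PathType`, `compute_domains`, and `trimmed_view` are hypothetical simplifications, not NCCL's API; the only input taken from this thread is that disabling P2P and SHM makes the GPU-to-GPU path PATH_NET.

```python
from enum import IntEnum


class PathType(IntEnum):
    """Hypothetical, simplified subset of NCCL path types.
    Only the ordering matters: anything below NET is reachable inside the node."""
    LOC = 0
    PXB = 1
    SYS = 2
    NET = 3


def compute_domains(paths):
    """Single pass over lower-indexed GPUs: a GPU joins the domain of any
    earlier GPU it reaches with a path cheaper than NET, keeping the
    smallest domain id."""
    domains = []
    for g in range(len(paths)):
        d = g
        for p in range(g):
            if paths[g][p] < PathType.NET:
                d = min(d, domains[p])
        domains.append(d)
    return domains


def trimmed_view(paths, local):
    """Return the GPUs that stay in the local rank's topology after trimming."""
    domains = compute_domains(paths)
    return [g for g, d in enumerate(domains) if d == domains[local]]


# Assumption from this thread: with NCCL_P2P_DISABLE=1 and NCCL_SHM_DISABLE=1
# the GPU-to-GPU path type becomes NET.
no_p2p_no_shm = [
    [PathType.LOC, PathType.NET],
    [PathType.NET, PathType.LOC],
]
# Hypothetical default case for comparison: the GPUs reach each other via PCIe.
default = [
    [PathType.LOC, PathType.PXB],
    [PathType.PXB, PathType.LOC],
]

for local in range(2):
    print(f"rank {local}: disabled -> {trimmed_view(no_p2p_no_shm, local)}, "
          f"default -> {trimmed_view(default, local)}")
# rank 0: disabled -> [0], default -> [0, 1]
# rank 1: disabled -> [1], default -> [0, 1]
```

With P2P and SHM disabled, each rank's trimmed view contains only its own GPU, which is why the dumped graph of rank 0 no longer lists rank 1.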
Thank you. Do you mean that when P2P and SHM are disabled, intra-node communication has to be turned into inter-node communication over NET, so that it looks like two nodes with one GPU each? If so, the final XML file is somewhat misleading, because no GPU is actually removed. I think the following graph.xml would be clearer:
<graphs version="1">
  <graph id="0" pattern="4" crossnic="0" nchannels="1" speedintra="24" speedinter="24" latencyinter="0" typeintra="LOC" typeinter="PXB" samechannels="1">
    <channel>
      <net dev="0"/>
      <gpu dev="0"/>
      <gpu dev="1"/>
      <net dev="0"/>
    </channel>
  </graph>
  <graph id="1" pattern="3" crossnic="0" nchannels="1" speedintra="48" speedinter="24" latencyinter="0" typeintra="LOC" typeinter="PXB" samechannels="1">
    <channel>
      <net dev="0"/>
      <gpu dev="0"/>
      <gpu dev="1"/>
      <net dev="0"/>
    </channel>
  </graph>
  <graph id="2" pattern="3" crossnic="0" nchannels="0" speedintra="0" speedinter="0" latencyinter="0" typeintra="LOC" typeinter="LOC" samechannels="0"/>
  <graph id="3" pattern="5" crossnic="0" nchannels="0" speedintra="0" speedinter="0" latencyinter="0" typeintra="LOC" typeinter="LOC" samechannels="0"/>
</graphs>
The XML graph can vary from node to node and includes only what NCCL considers to be node-local resources. With NCCL_P2P_DISABLE=1 and NCCL_SHM_DISABLE=1, NCCL thinks that it's running on two nodes, so it's normal and expected that GPU rank 1 from "the other" node is not included. If you pass NCCL_GRAPH_DUMP_FILE_RANK=1, you will get the graph from "the other" node, which will include GPU rank 1 but not GPU rank 0.
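How the per-node graphs turn into the final channel (point (2) above) can be pictured with a second Python sketch. It assumes each rank's dumped graph contributes one intra-node chain and that the chains are joined in node order into a ring, with node-crossing hops going over the network. `build_ring` is a hypothetical helper for illustration only; NCCL's actual channel construction is considerably more involved.

```python
def build_ring(node_chains):
    """Join per-node GPU chains in node order into one ring.

    Returns the ring order and a list of (src, dst, kind) hops, where kind is
    "intra" if both GPUs belong to the same node and "net" otherwise.
    The last GPU of the last node wraps around to the first GPU of node 0.
    """
    ring = [gpu for chain in node_chains for gpu in chain]
    node_of = {gpu: n for n, chain in enumerate(node_chains) for gpu in chain}
    hops = []
    for i, src in enumerate(ring):
        dst = ring[(i + 1) % len(ring)]
        kind = "intra" if node_of[src] == node_of[dst] else "net"
        hops.append((src, dst, kind))
    return ring, hops


# P2P and SHM disabled: each "node" (one per rank) sees only its own GPU.
ring, hops = build_ring([[0], [1]])
print(ring)  # [0, 1]
print(hops)  # [(0, 1, 'net'), (1, 0, 'net')]
```

With two single-GPU chains, both hops are network hops, so the final channel still lists both ranks even though each dumped graph contains only one GPU. Passing `[[0, 1]]` instead models the default single-node case, where both hops are intra-node.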
Thank you @kiskra-nvidia.
Your comments pointed out several things I had previously overlooked, and this makes sense.
I have tested NCCL_GRAPH_DUMP_FILE_RANK
and it works. The answer was very helpful for me, thanks a lot.
Hi dear developers,
I ran rccl_test with NCCL_P2P_DISABLE=1 and NCCL_SHM_DISABLE=1 on two GPUs and one IB NIC. The final graph.xml dumped is as follows:
RANK1 has been removed in ncclTopoTrimSystem as the path type between RANK0 and RANK1 is PATH_NET.
However, the channels dumped with NCCL_DEBUG=TRACE are as follows:
I have two questions: (1) Why is RANK1 removed in ncclTopoTrimSystem in this test case? (2) NCCL has removed RANK1 in ncclTopoTrimSystem, yet the final channel still includes RANK1. Why?