Open clearsky07 opened 1 year ago
graph->intra is the list of GPUs, graph->inter is the list of NICs (NIC to enter the node, NIC to exit the node).
So basically the flow for a ring would be NIC inter[0], GPU intra[0] .. GPU intra[n-1], NIC inter[1].
I want to know why network devices are chosen this way in ncclTopoGetNetDev (nccl/src/graph/search.cc):

    // Honor the net device in the graph
    int channel = channelId%graph->nChannels;
    int ngpus = comm->topo->nodes[GPU].count;
    int index = graph->intra[channel*ngpus] == rank ? 0 : 1;
    *dev = graph->inter[(channel*2+index)%ngpus];

What is the meaning of index, graph->intra, and graph->inter? Thanks a lot.
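To make the indexing concrete, here is a minimal sketch of how the quoted lines can be read. It is not the NCCL implementation: `ToyGraph`, `makeExample`, and `pickNetDev` are hypothetical names, and the layout is an assumption inferred from the snippet (`intra` holds `ngpus` ranks per channel in ring order, `inter` holds two NIC ids per channel, entry then exit). Under that reading, `index` is 0 when `rank` is the first GPU of the channel's ring (so it uses the entry NIC) and 1 otherwise (so it uses the exit NIC).

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the graph fields used in the quoted snippet.
struct ToyGraph {
  int nChannels;
  std::vector<int> intra;  // assumed: nChannels * ngpus ranks, ring order per channel
  std::vector<int> inter;  // assumed: nChannels * 2 NIC ids, {entry, exit} per channel
};

// Example values chosen for illustration only: 2 channels, 4 GPUs.
ToyGraph makeExample() {
  ToyGraph g;
  g.nChannels = 2;
  g.intra = {0, 1, 2, 3,   // channel 0: ring starts at rank 0
             2, 3, 0, 1};  // channel 1: ring starts at rank 2
  g.inter = {0, 1,         // channel 0: enter via NIC 0, exit via NIC 1
             2, 3};        // channel 1: enter via NIC 2, exit via NIC 3
  return g;
}

// Mirrors the index computation quoted in the question.
int pickNetDev(const ToyGraph& g, int ngpus, int rank, int channelId) {
  assert(g.nChannels > 0 && ngpus > 0);
  int channel = channelId % g.nChannels;
  // First GPU of this channel's ring -> entry NIC (0), any other GPU -> exit NIC (1).
  int index = g.intra[channel * ngpus] == rank ? 0 : 1;
  return g.inter[(channel * 2 + index) % ngpus];
}
```

With these example values, `pickNetDev(makeExample(), 4, 0, 0)` selects the entry NIC of channel 0, while the same call with rank 3 selects its exit NIC. A `channelId` of 3 wraps around to channel 1, whose ring starts at rank 2.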