NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

Question on DGX A100 GPU topologies #1505

Closed YJHMITWEB closed 2 weeks ago

YJHMITWEB commented 2 weeks ago

Hi, I am trying to figure out the topology on the DGX A100 40GB node that I have access to.

First, I use nvidia-smi topo -m to check the links:

$ nvidia-smi topo -m
        GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    NIC0    NIC1    NIC2    NIC3    NIC4    NIC5    NIC6    NIC7    NIC8    NIC9    NIC10   NIC11   CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      NV12    NV12    NV12    NV12    NV12    NV12    NV12    PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     48-63,176-191   3               N/A
GPU1    NV12     X      NV12    NV12    NV12    NV12    NV12    NV12    PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     48-63,176-191   3               N/A
GPU2    NV12    NV12     X      NV12    NV12    NV12    NV12    NV12    SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     16-31,144-159   1               N/A
GPU3    NV12    NV12    NV12     X      NV12    NV12    NV12    NV12    SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     16-31,144-159   1               N/A
GPU4    NV12    NV12    NV12    NV12     X      NV12    NV12    NV12    SYS     SYS     SYS     SYS     SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     112-127,240-255 7               N/A
GPU5    NV12    NV12    NV12    NV12    NV12     X      NV12    NV12    SYS     SYS     SYS     SYS     SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     112-127,240-255 7               N/A
GPU6    NV12    NV12    NV12    NV12    NV12    NV12     X      NV12    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PXB     PXB     SYS     SYS     80-95,208-223   5               N/A
GPU7    NV12    NV12    NV12    NV12    NV12    NV12    NV12     X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PXB     PXB     SYS     SYS     80-95,208-223   5               N/A
NIC0    PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS      X      PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC1    PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     PXB      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC2    SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS      X      PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC3    SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     PXB      X      SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS
NIC4    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PIX     SYS     SYS     SYS     SYS     SYS     SYS
NIC5    SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX      X      SYS     SYS     SYS     SYS     SYS     SYS
NIC6    SYS     SYS     SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PXB     SYS     SYS     SYS     SYS
NIC7    SYS     SYS     SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PXB      X      SYS     SYS     SYS     SYS
NIC8    SYS     SYS     SYS     SYS     SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PXB     SYS     SYS
NIC9    SYS     SYS     SYS     SYS     SYS     SYS     PXB     PXB     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PXB      X      SYS     SYS
NIC10   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS      X      PIX
NIC11   SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     SYS     PIX      X 

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5
  NIC6: mlx5_6
  NIC7: mlx5_7
  NIC8: mlx5_8
  NIC9: mlx5_9
  NIC10: mlx5_10
  NIC11: mlx5_11

As shown above, each pair of GPUs is connected via NV12, which means a bonded set of 12 NVLinks. From what I understand, this would mean that GPU0 has 12*(8-1)=84 NVLinks in total.

However, if I dump the NCCL topo file using NCCL_TOPO_DUMP_FILE=topo_file, it shows:

<system version="1">
  <cpu host_hash="0x8393fb66b8b750c5" numaid="3" affinity="00000000,00000000,ffff0000,00000000,00000000,00000000,ffff0000,00000000" arch="x86_64" vendor="AuthenticAMD" familyid="143" modelid="49">
    <pci busid="0000:01:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
      <pci busid="0000:03:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:05:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x10de" subsystem_device="0x13b8" link_speed="16.0 GT/s PCIe" link_width="16">
          <pci busid="0000:07:00.0" class="0x030200" vendor="0x10de" device="0x20b0" subsystem_vendor="0x10de" subsystem_device="0x134f" link_speed="16.0 GT/s PCIe" link_width="16">
            <gpu dev="0" sm="80" rank="0" gdr="1">
              <nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>
            </gpu>
          </pci>
        </pci>
      </pci>
      <pci busid="0000:0a:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:0c:00.0" class="0x020700" vendor="0x15b3" device="0x101b" subsystem_vendor="0x15b3" subsystem_device="0x0007" link_speed="16.0 GT/s PCIe" link_width="16">
          <nic>
            <net name="mlx5_0" dev="0" speed="200000" port="1" latency="0.000000" guid="0xa262550003a1420c" maxconn="131072" gdr="1"/>
          </nic>
        </pci>
        <pci busid="0000:0d:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x10de" subsystem_device="0x13b8" link_speed="16.0 GT/s PCIe" link_width="16">
          <pci busid="0000:0f:00.0" class="0x030200" vendor="0x10de" device="0x20b0" subsystem_vendor="0x10de" subsystem_device="0x134f" link_speed="16.0 GT/s PCIe" link_width="16">
            <gpu dev="1" sm="80" rank="1" gdr="1">
              <nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>
            </gpu>
          </pci>
        </pci>
      </pci>
      <pci busid="0000:10:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:12:00.0" class="0x020700" vendor="0x15b3" device="0x101b" subsystem_vendor="0x15b3" subsystem_device="0x0007" link_speed="16.0 GT/s PCIe" link_width="16">
          <nic>
            <net name="mlx5_1" dev="1" speed="200000" port="1" latency="0.000000" guid="0x5262550003a1420c" maxconn="131072" gdr="1"/>
          </nic>
        </pci>
      </pci>
    </pci>
  </cpu>
  <cpu host_hash="0x8393fb66b8b750c5" numaid="1" affinity="00000000,00000000,00000000,ffff0000,00000000,00000000,00000000,ffff0000" arch="x86_64" vendor="AuthenticAMD" familyid="143" modelid="49">
    <pci busid="0000:41:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
      <pci busid="0000:49:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:4b:00.0" class="0x020700" vendor="0x15b3" device="0x101b" subsystem_vendor="0x15b3" subsystem_device="0x0007" link_speed="16.0 GT/s PCIe" link_width="16">
          <nic>
            <net name="mlx5_2" dev="2" speed="200000" port="1" latency="0.000000" guid="0x16234e0003a1420c" maxconn="131072" gdr="1"/>
          </nic>
        </pci>
        <pci busid="0000:4c:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x10de" subsystem_device="0x13b8" link_speed="16.0 GT/s PCIe" link_width="16">
          <pci busid="0000:4e:00.0" class="0x030200" vendor="0x10de" device="0x20b0" subsystem_vendor="0x10de" subsystem_device="0x134f" link_speed="16.0 GT/s PCIe" link_width="16">
            <gpu dev="3" sm="80" rank="3" gdr="1">
              <nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>
            </gpu>
          </pci>
        </pci>
      </pci>
      <pci busid="0000:50:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:54:00.0" class="0x020700" vendor="0x15b3" device="0x101b" subsystem_vendor="0x15b3" subsystem_device="0x0007" link_speed="16.0 GT/s PCIe" link_width="16">
          <nic>
            <net name="mlx5_3" dev="3" speed="200000" port="1" latency="0.000000" guid="0x4e58550003a1420c" maxconn="131072" gdr="1"/>
          </nic>
        </pci>
      </pci>
      <pci busid="0000:43:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:45:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x10de" subsystem_device="0x13b8" link_speed="16.0 GT/s PCIe" link_width="16">
          <pci busid="0000:47:00.0" class="0x030200" vendor="0x10de" device="0x20b0" subsystem_vendor="0x10de" subsystem_device="0x134f" link_speed="16.0 GT/s PCIe" link_width="16">
            <gpu dev="2" sm="80" rank="2" gdr="1">
              <nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>
            </gpu>
          </pci>
        </pci>
      </pci>
    </pci>
  </cpu>
  <cpu host_hash="0x8393fb66b8b750c5" numaid="0" affinity="00000000,00000000,00000000,0000ffff,00000000,00000000,00000000,0000ffff" arch="x86_64" vendor="AuthenticAMD" familyid="143" modelid="49">
    <pci busid="0000:61:00.0" class="0x020700" vendor="0x15b3" device="0x101b" subsystem_vendor="0x15b3" subsystem_device="0x0009" link_speed="16.0 GT/s PCIe" link_width="16">
      <nic>
        <net name="mlx5_4" dev="4" speed="200000" port="1" latency="0.000000" guid="0x64774a0003a1420c" maxconn="131072" gdr="1"/>
        <net name="mlx5_5" dev="5" speed="100000" port="2" latency="0.000000" guid="0x64774a0003a1420c" maxconn="131072" gdr="1"/>
      </nic>
    </pci>
  </cpu>
  <cpu host_hash="0x8393fb66b8b750c5" numaid="7" affinity="ffff0000,00000000,00000000,00000000,ffff0000,00000000,00000000,00000000" arch="x86_64" vendor="AuthenticAMD" familyid="143" modelid="49">
    <pci busid="0000:81:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
      <pci busid="0000:8b:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:8d:00.0" class="0x020700" vendor="0x15b3" device="0x101b" subsystem_vendor="0x15b3" subsystem_device="0x0007" link_speed="16.0 GT/s PCIe" link_width="16">
          <nic>
            <net name="mlx5_6" dev="6" speed="200000" port="1" latency="0.000000" guid="0x4662550003a1420c" maxconn="131072" gdr="1"/>
          </nic>
        </pci>
        <pci busid="0000:8e:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x10de" subsystem_device="0x13b8" link_speed="16.0 GT/s PCIe" link_width="16">
          <pci busid="0000:90:00.0" class="0x030200" vendor="0x10de" device="0x20b0" subsystem_vendor="0x10de" subsystem_device="0x134f" link_speed="16.0 GT/s PCIe" link_width="16">
            <gpu dev="5" sm="80" rank="5" gdr="1">
              <nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>
            </gpu>
          </pci>
        </pci>
      </pci>
      <pci busid="0000:92:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:94:00.0" class="0x020700" vendor="0x15b3" device="0x101b" subsystem_vendor="0x15b3" subsystem_device="0x0007" link_speed="16.0 GT/s PCIe" link_width="16">
          <nic>
            <net name="mlx5_7" dev="7" speed="200000" port="1" latency="0.000000" guid="0x2a62550003a1420c" maxconn="131072" gdr="1"/>
          </nic>
        </pci>
      </pci>
      <pci busid="0000:83:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:85:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x10de" subsystem_device="0x13b8" link_speed="16.0 GT/s PCIe" link_width="16">
          <pci busid="0000:87:00.0" class="0x030200" vendor="0x10de" device="0x20b0" subsystem_vendor="0x10de" subsystem_device="0x134f" link_speed="16.0 GT/s PCIe" link_width="16">
            <gpu dev="4" sm="80" rank="4" gdr="1">
              <nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>
            </gpu>
          </pci>
        </pci>
      </pci>
    </pci>
  </cpu>
  <cpu host_hash="0x8393fb66b8b750c5" numaid="5" affinity="00000000,ffff0000,00000000,00000000,00000000,ffff0000,00000000,00000000" arch="x86_64" vendor="AuthenticAMD" familyid="143" modelid="49">
    <pci busid="0000:b1:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
      <pci busid="0000:b8:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:ba:00.0" class="0x020700" vendor="0x15b3" device="0x101b" subsystem_vendor="0x15b3" subsystem_device="0x0007" link_speed="16.0 GT/s PCIe" link_width="16">
          <nic>
            <net name="mlx5_8" dev="8" speed="200000" port="1" latency="0.000000" guid="0x5e62550003a1420c" maxconn="131072" gdr="1"/>
          </nic>
        </pci>
        <pci busid="0000:bb:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x10de" subsystem_device="0x13b8" link_speed="16.0 GT/s PCIe" link_width="16">
          <pci busid="0000:bd:00.0" class="0x030200" vendor="0x10de" device="0x20b0" subsystem_vendor="0x10de" subsystem_device="0x134f" link_speed="16.0 GT/s PCIe" link_width="16">
            <gpu dev="7" sm="80" rank="7" gdr="1">
              <nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>
            </gpu>
          </pci>
        </pci>
      </pci>
      <pci busid="0000:be:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:cc:00.0" class="0x020700" vendor="0x15b3" device="0x101b" subsystem_vendor="0x15b3" subsystem_device="0x0007" link_speed="16.0 GT/s PCIe" link_width="16">
          <nic>
            <net name="mlx5_9" dev="9" speed="200000" port="1" latency="0.000000" guid="0x5662550003a1420c" maxconn="131072" gdr="1"/>
          </nic>
        </pci>
      </pci>
      <pci busid="0000:b3:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x1000" subsystem_device="0xa096" link_speed="16.0 GT/s PCIe" link_width="16">
        <pci busid="0000:b5:00.0" class="0x060400" vendor="0x1000" device="0xc010" subsystem_vendor="0x10de" subsystem_device="0x13b8" link_speed="16.0 GT/s PCIe" link_width="16">
          <pci busid="0000:b7:00.0" class="0x030200" vendor="0x10de" device="0x20b0" subsystem_vendor="0x10de" subsystem_device="0x134f" link_speed="16.0 GT/s PCIe" link_width="16">
            <gpu dev="6" sm="80" rank="6" gdr="1">
              <nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
              <nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>
            </gpu>
          </pci>
        </pci>
      </pci>
    </pci>
  </cpu>
  <cpu host_hash="0x8393fb66b8b750c5" numaid="4" affinity="00000000,0000ffff,00000000,00000000,00000000,0000ffff,00000000,00000000" arch="x86_64" vendor="AuthenticAMD" familyid="143" modelid="49">
    <pci busid="0000:e1:00.0" class="0x020700" vendor="0x15b3" device="0x101b" subsystem_vendor="0x15b3" subsystem_device="0x0009" link_speed="16.0 GT/s PCIe" link_width="16">
      <nic>
        <net name="mlx5_10" dev="10" speed="200000" port="1" latency="0.000000" guid="0x1ed11e0003a1420c" maxconn="131072" gdr="1"/>
        <net name="mlx5_11" dev="11" speed="100000" port="2" latency="0.000000" guid="0x1ed11e0003a1420c" maxconn="131072" gdr="1"/>
      </nic>
    </pci>
  </cpu>
</system>

And I found that in fact, every GPU is connected to the same 6 targets:

<nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
<nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
<nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
<nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
<nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
<nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>

There are 2 NVLinks to each target, which means that, taking GPU0 as an example, its 12 NVLinks (the NV12 above) are actually shared among its paths to all 7 other GPUs.

I also checked comm->nvlsSupport after ncclNvlsInit(comm), and it is 0.

I am wondering: if these 8 GPUs are not directly connected to each other and there is no NVSwitch, then what are the 6 targets shown in the dumped topology file?

sjeaugey commented 2 weeks ago

The nvidia-smi topology matrix indeed does not tell you whether the 12 NVLinks are direct connections or go to an NVSwitch connecting all GPUs.

But the NCCL topology does show that all GPUs are connected to 6 NVSwitches, which is expected given that you're using a DGX A100, which is supposed to have all 8 GPUs connected through NVSwitch.
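
To check this reading against a dump, the nvlink entries can be tallied per GPU. The following is a minimal Python sketch, not part of NCCL: the function name summarize_nvlinks and the trimmed SAMPLE (two of the eight GPUs, with the intermediate PCI bridges omitted) are assumptions made for illustration. It sums the count attribute per target, checks that every GPU lists the same targets, and checks that no target is the bus ID of a GPU.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed excerpt of the dump above: GPU0 and GPU1 only, PCI bridges omitted.
SAMPLE = """
<system version="1">
  <cpu numaid="3">
    <pci busid="0000:07:00.0" class="0x030200">
      <gpu dev="0" sm="80" rank="0" gdr="1">
        <nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
        <nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
        <nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
        <nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
        <nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
        <nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>
      </gpu>
    </pci>
    <pci busid="0000:0f:00.0" class="0x030200">
      <gpu dev="1" sm="80" rank="1" gdr="1">
        <nvlink target="0000:c7:00.0" count="2" tclass="0x068000"/>
        <nvlink target="0000:c4:00.0" count="2" tclass="0x068000"/>
        <nvlink target="0000:c6:00.0" count="2" tclass="0x068000"/>
        <nvlink target="0000:c9:00.0" count="2" tclass="0x068000"/>
        <nvlink target="0000:c5:00.0" count="2" tclass="0x068000"/>
        <nvlink target="0000:c8:00.0" count="2" tclass="0x068000"/>
      </gpu>
    </pci>
  </cpu>
</system>
"""

def summarize_nvlinks(xml_text):
    """Return ({gpu dev: {target busid: link count}}, set of GPU bus IDs)."""
    root = ET.fromstring(xml_text)
    links = {}
    for gpu in root.iter("gpu"):
        per_target = defaultdict(int)
        for link in gpu.findall("nvlink"):
            per_target[link.get("target")] += int(link.get("count"))
        links[int(gpu.get("dev"))] = dict(per_target)
    # A GPU's bus ID is the busid of the <pci> element that directly contains it.
    gpu_busids = {pci.get("busid") for pci in root.iter("pci")
                  if pci.find("gpu") is not None}
    return links, gpu_busids

links, gpu_busids = summarize_nvlinks(SAMPLE)
for dev, targets in sorted(links.items()):
    print(f"GPU{dev}: {len(targets)} targets, {sum(targets.values())} NVLinks in total")

# Every GPU lists the same target set, and no target is itself a GPU.
shared = len({frozenset(t) for t in links.values()}) == 1
targets_are_gpus = any(t in gpu_busids for ts in links.values() for t in ts)
print("same targets for all GPUs:", shared)
print("any target is a GPU:", targets_are_gpus)
```

For the excerpt this reports 6 targets and 12 NVLinks for each GPU, which matches the NV12 entry in the nvidia-smi matrix: the 12 links are a per-GPU total shared across all peers, not 12 links per peer. To run it on a full dump, pass the contents of the file written via NCCL_TOPO_DUMP_FILE (topo_file above) to summarize_nvlinks.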

NVLS, a.k.a. NVLink SHARP, is only supported on Hopper (H100) and later, and is therefore disabled here on A100.
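
The architecture is visible in the dump itself through the sm attribute of each gpu element: A100 has compute capability 8.0 and appears as sm="80", while Hopper is 9.0. Below is a small Python sketch that lists the GPUs below that threshold. The helper name gpus_below_hopper and the one-GPU SAMPLE are assumptions for illustration, and the comparison only mirrors the architecture requirement stated above; it does not reproduce NCCL's actual NVLS detection, which may depend on further conditions.

```python
import xml.etree.ElementTree as ET

HOPPER_SM = 90  # compute capability 9.0 (H100); A100 is 8.0, i.e. sm="80"

# Simplified stand-in for a topology dump: a single GPU entry, no PCI tree.
SAMPLE = '<system version="1"><gpu dev="0" sm="80" rank="0" gdr="1"/></system>'

def gpus_below_hopper(xml_text):
    """Return {gpu dev: sm} for every GPU whose sm is below HOPPER_SM."""
    root = ET.fromstring(xml_text)
    return {int(g.get("dev")): int(g.get("sm"))
            for g in root.iter("gpu")
            if int(g.get("sm")) < HOPPER_SM}

print(gpus_below_hopper(SAMPLE))
```

An empty result would mean that all GPUs in the dump meet the architecture requirement; it would not by itself guarantee that comm->nvlsSupport is nonzero.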

YJHMITWEB commented 2 weeks ago

Thank you @sjeaugey, this is clear!!