JuiceLemonLemon closed this 2 months ago.
What is your node topology, i.e. the output of nvidia-smi topo -m?
      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  NIC0  NIC1  NIC2  NIC3  NIC4  NIC5  CPU Affinity    NUMA Affinity  GPU NUMA ID
GPU0  X     NV12  NV12  NV12  NV12  NV12  NV12  NV12  SYS   PXB   SYS   SYS   SYS   SYS   24-31,88-95     3              N/A
GPU1  NV12  X     NV12  NV12  NV12  NV12  NV12  NV12  SYS   PXB   SYS   SYS   SYS   SYS   24-31,88-95     3              N/A
GPU2  NV12  NV12  X     NV12  NV12  NV12  NV12  NV12  PXB   SYS   SYS   SYS   SYS   SYS   8-15,72-79      1              N/A
GPU3  NV12  NV12  NV12  X     NV12  NV12  NV12  NV12  PXB   SYS   SYS   SYS   SYS   SYS   8-15,72-79      1              N/A
GPU4  NV12  NV12  NV12  NV12  X     NV12  NV12  NV12  SYS   SYS   SYS   SYS   SYS   PXB   56-63,120-127   7              N/A
GPU5  NV12  NV12  NV12  NV12  NV12  X     NV12  NV12  SYS   SYS   SYS   SYS   SYS   PXB   56-63,120-127   7              N/A
GPU6  NV12  NV12  NV12  NV12  NV12  NV12  X     NV12  SYS   SYS   PXB   SYS   SYS   SYS   40-47,104-111   5              N/A
GPU7  NV12  NV12  NV12  NV12  NV12  NV12  NV12  X     SYS   SYS   PXB   SYS   SYS   SYS   40-47,104-111   5              N/A
NIC0  SYS   SYS   PXB   PXB   SYS   SYS   SYS   SYS   X     SYS   SYS   SYS   SYS   SYS
NIC1  PXB   PXB   SYS   SYS   SYS   SYS   SYS   SYS   SYS   X     SYS   SYS   SYS   SYS
NIC2  SYS   SYS   SYS   SYS   SYS   SYS   PXB   PXB   SYS   SYS   X     SYS   SYS   SYS
NIC3  SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   X     PIX   SYS
NIC4  SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   PIX   X     SYS
NIC5  SYS   SYS   SYS   SYS   PXB   PXB   SYS   SYS   SYS   SYS   SYS   SYS   SYS   X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
NIC4: mlx5_4
NIC5: mlx5_5
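To check whether GPUs 0,1 and GPUs 6,7 differ topologically, the matrix can be read mechanically. This is a minimal Python sketch under stated assumptions: the nvidia-smi topo -m output is whitespace-separated, device names look like GPU<n>/NIC<n>, and only GPU rows carry the trailing affinity columns. The helpers parse_topo and nearest_nics are hypothetical names for illustration, not part of any NVIDIA tool.

```python
import re

# Assumption: device labels are "GPU<n>" or "NIC<n>"; a bare "GPU" (as in
# the "GPU NUMA ID" header) does not match because a digit is required.
DEVICE = re.compile(r"^(GPU|NIC)\d+$")


def parse_topo(text):
    """Parse `nvidia-smi topo -m` text into (links, numa) dictionaries.

    links[row][col] is the connection type (e.g. "NV12", "PXB", "SYS", "X").
    numa[gpu] is the "NUMA Affinity" value, present only for GPU rows.
    """
    rows = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = [tok for tok in rows[0] if DEVICE.match(tok)]
    links, numa = {}, {}
    for tokens in rows[1:]:
        name = tokens[0]
        if not DEVICE.match(name):
            break  # stop at "Legend:" or any other non-matrix line
        links[name] = dict(zip(header, tokens[1:1 + len(header)]))
        extra = tokens[1 + len(header):]  # CPU affinity, NUMA affinity, GPU NUMA ID
        if len(extra) >= 2:
            numa[name] = extra[1]
    return links, numa


def nearest_nics(links, gpu):
    """NICs reachable from `gpu` without crossing the CPU host bridge (PIX/PXB)."""
    return [dev for dev, kind in links[gpu].items()
            if dev.startswith("NIC") and kind in ("PIX", "PXB")]


# Rows copied from the topology posted in this thread (GPU0, GPU1, GPU6, GPU7).
TOPO_EXCERPT = """
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 NIC4 NIC5 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV12 NV12 NV12 NV12 NV12 NV12 NV12 SYS PXB SYS SYS SYS SYS 24-31,88-95 3 N/A
GPU1 NV12 X NV12 NV12 NV12 NV12 NV12 NV12 SYS PXB SYS SYS SYS SYS 24-31,88-95 3 N/A
GPU6 NV12 NV12 NV12 NV12 NV12 NV12 X NV12 SYS SYS PXB SYS SYS SYS 40-47,104-111 5 N/A
GPU7 NV12 NV12 NV12 NV12 NV12 NV12 NV12 X SYS SYS PXB SYS SYS SYS 40-47,104-111 5 N/A
"""

if __name__ == "__main__":
    links, numa = parse_topo(TOPO_EXCERPT)
    for a, b in (("GPU0", "GPU1"), ("GPU6", "GPU7")):
        print(a, b, links[a][b], "NUMA", numa[a], numa[b],
              "NICs", nearest_nics(links, a))
```

On these rows both pairs are joined by NV12 and each has one PXB-attached NIC (NIC1/mlx5_1 for GPUs 0,1 and NIC2/mlx5_2 for GPUs 6,7); the pairs differ only in NUMA node (3 vs 5). Since a single-node two-GPU all_gather would normally be expected to run over NVLink, the topology alone does not obviously explain a bandwidth gap.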
Oh, maybe there was something wrong with our device; it's OK now after a reboot.
Hello, I have a bandwidth problem: all_gather_perf reports different bandwidth on GPUs 0 and 1 than on GPUs 6 and 7.
export CUDA_VISIBLE_DEVICES=0,1
./build/all_gather_perf -b 16M -e 1024M -i 16777216 -g 2 -d bfloat16

export CUDA_VISIBLE_DEVICES=6,7
./build/all_gather_perf -b 16M -e 1024M -i 16777216 -g 2 -d bfloat16
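When comparing the two runs, it helps to know how all_gather_perf derives its numbers. This Python sketch assumes the conventions described in the nccl-tests performance notes: algbw is the gathered size divided by the elapsed time (GB = 1e9 bytes), and for AllGather busbw = algbw * (n - 1) / n, so with -g 2 busbw is half of algbw. It also assumes each log ends with a summary line of the form "# Avg bus bandwidth : <value>". The function names allgather_bw, avg_busbw and relative_gap are hypothetical.

```python
import re


def allgather_bw(total_bytes, seconds, nranks):
    """Return (algbw, busbw) in GB/s for an AllGather.

    Assumes total_bytes is the full gathered size and the nccl-tests
    AllGather convention busbw = algbw * (nranks - 1) / nranks.
    """
    algbw = total_bytes / 1e9 / seconds
    return algbw, algbw * (nranks - 1) / nranks


# Assumed format of the summary line printed at the end of an nccl-tests run.
AVG_BUSBW = re.compile(r"#\s*Avg bus bandwidth\s*:\s*([0-9.]+)")


def avg_busbw(log_text):
    """Extract the average bus bandwidth (GB/s) from an all_gather_perf log."""
    match = AVG_BUSBW.search(log_text)
    if match is None:
        raise ValueError("no 'Avg bus bandwidth' line found")
    return float(match.group(1))


def relative_gap(a, b):
    """Fractional difference between two bandwidths, relative to the larger one."""
    return abs(a - b) / max(a, b)


if __name__ == "__main__":
    # Hypothetical numbers: 1 GiB gathered across 2 GPUs in 10 ms.
    algbw, busbw = allgather_bw(1 << 30, 0.010, 2)
    print(f"algbw={algbw:.1f} GB/s busbw={busbw:.1f} GB/s")
```

Redirecting each command's output to a file and passing each file's text to avg_busbw gives two numbers that relative_gap can compare directly.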