NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

How could I carefully control which NIC to use when running ring-based collective operation? #687

Open TarzanZhao opened 2 years ago

TarzanZhao commented 2 years ago

I want to run multiple broadcasts concurrently. They will all send data into one host with multiple NICs. I do not want one single NIC to be the bottleneck. So I prefer to let these broadcasts use different NICs. How could I carefully control this?

Also, is the order in which a broadcast sends data decided when the communicator is created, or on the fly at run time? This also matters for configuring which NIC is used.

Thanks!

sjeaugey commented 2 years ago

I'm not sure I understand the problem. Are all GPUs part of the communicator?

TarzanZhao commented 2 years ago

Each broadcast has its own communicator that involves all devices used in this broadcast.

TarzanZhao commented 2 years ago

A simple example: we have 2 hosts, each with two devices, and each device has its own NIC. Broadcast A sends data from (host0, device0) to (host1, device0) and (host1, device1), whereas broadcast B sends data from (host0, device1) to (host1, device0) and (host1, device1). If both broadcasts use the first NIC on host1 (the one attached to the first device), they cannot run concurrently, which will be slow. So I want the two broadcasts to use different NICs when entering host1.

sjeaugey commented 2 years ago

I see. With a recent NCCL, if both GPUs are at the same distance from both NICs, I think each GPU would use a different NIC. Could you provide the node topology by setting NCCL_TOPO_DUMP_FILE=system.txt?

Edit: that might work on the "source" node but on the "destination" node, it would not. One trick would be to force all communicators to use both NICs. For that, you could edit the topology to set the network speed to half of what it is.
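To make that trick concrete (the file contents and the speed value below are illustrative stand-ins — dump your own topology with NCCL_TOPO_DUMP_FILE first): the dump is an XML file in which each `<net .../>` entry carries a `speed` attribute, and halving it should make NCCL consider a single NIC insufficient and spread channels across both:

```shell
# Illustrative only: a stand-in for a real NCCL_TOPO_DUMP_FILE dump.
# A real dump contains many more nodes; only the speed attribute matters here.
cat > system.txt <<'EOF'
<net name="mlx5_0" dev="0" speed="25" port="1" gdr="1"/>
EOF

# Halve the advertised speed (25 -> 12) so one NIC no longer looks sufficient.
sed -i 's/speed="25"/speed="12"/' system.txt

# Then point every communicator at the edited topology instead of the
# auto-detected one, e.g.:
#   NCCL_TOPO_FILE=system.txt mpirun ... ./my_app
grep 'speed="12"' system.txt
```

Whether the topology search then actually picks both NICs still depends on the rest of the dumped topology, so treat this as a starting point rather than a guaranteed fix.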

TarzanZhao commented 2 years ago

Sorry, I do not understand this trick. What does "force all communicators to use both NICs" mean? In a single broadcast we will only have one ring and thus use only one NIC.

Also, what does "edit the topology" mean? How can I edit it?

I am still designing the algorithm and have not started coding or running experiments yet, so I have no topology to export.

sjeaugey commented 2 years ago

I thought you wanted to create two communicators, each having 3 ranks, one GPU on the "source" node and the two GPUs on the "destination" node, then use ncclBroadcast. Is that right or did I misunderstand?

TarzanZhao commented 2 years ago

Yes, that is exactly my example. My general question is how to finely control which NIC a NCCL collective operation uses.

sjeaugey commented 2 years ago

You can't. But we may be able to find tricks to still get good performance. Can you run the all_reduce_perf test on all 4 GPUs with NCCL_TOPO_DUMP_FILE=system.txt set and post the result here? That would tell me which strategy is best.
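For reference, the requested run might look like this — hostnames, slot counts, and the nccl-tests build path are placeholders for the 2-node, 2-GPUs-per-node setup described above:

```shell
# Run all_reduce_perf on all 4 GPUs and dump the topology NCCL detects.
mpirun -np 4 -H host0:2,host1:2 \
  -x NCCL_TOPO_DUMP_FILE=system.txt -x NCCL_DEBUG=INFO \
  "$NCCL_TESTS_PATH"/build/all_reduce_perf -b 8 -e 128M -f 2 -g 1
# system.txt is written during communicator init and can be attached to the issue.
```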

TarzanZhao commented 2 years ago

Thanks! This was just a hypothetical example.

kimtaehoon-dev commented 1 year ago

@sjeaugey Hello! While reading this issue I came across your comment: "I see. With a recent NCCL, if both GPUs are at the same distance of both NICs, I think each GPU would use a different NIC."

But when I ran some nccl-tests it didn't behave like that. Let me describe my test.

First, the test environment:

- 2 GPU nodes (each node has 8 GPU cards)
- Each node has 8 InfiniBand HCAs (ConnectX-6)
- NCCL 2.11.4-1+cuda11.4

For the test, I enabled SR-IOV and added 4 VFs for 1 PF. This is the lspci result:

$ lspci | grep Mella
0e:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
11:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
51:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
51:00.1 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
51:00.2 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
51:00.3 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
51:00.4 Infiniband controller: Mellanox Technologies ConnectX Family mlx5Gen Virtual Function
52:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
89:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
8c:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
a7:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
a7:00.1 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
c6:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]
c9:00.0 Infiniband controller: Mellanox Technologies MT2892 Family [ConnectX-6 Dx]

and this is the nvidia-smi topo -m result:

    GPU0    GPU1    GPU2    GPU3    GPU4    GPU5    GPU6    GPU7    mlx5_0  mlx5_1  mlx5_2  mlx5_3  mlx5_4  mlx5_5  mlx5_6  mlx5_7  mlx5_8  mlx5_9  mlx5_10 mlx5_11  mlx5_12 mlx5_13 CPU Affinity    NUMA Affinity
GPU0     X  NV12    NV12    NV12    NV12    NV12    NV12    NV12    NODE    NODE    PXB PXB SYS SYS SYS SYS SYS SYS NODE    NODE    NODE    NODE    0-63,128-191    0
GPU1    NV12     X  NV12    NV12    NV12    NV12    NV12    NV12    NODE    NODE    PXB PXB SYS SYS SYS SYS SYS SYS NODE    NODE    NODE    NODE    0-63,128-191    0
GPU2    NV12    NV12     X  NV12    NV12    NV12    NV12    NV12    PXB PXB NODE    NODE    SYS SYS SYS SYS SYS SYS PXB PXB PXB PXB 0-63,128-191    0
GPU3    NV12    NV12    NV12     X  NV12    NV12    NV12    NV12    PXB PXB NODE    NODE    SYS SYS SYS SYS SYS SYS PXB PXB PXB PXB 0-63,128-191    0
GPU4    NV12    NV12    NV12    NV12     X  NV12    NV12    NV12    SYS SYS SYS SYS NODE    NODE    NODE    NODE    PXB PXB SYS SYS SYS SYS 64-127,192-254  1
GPU5    NV12    NV12    NV12    NV12    NV12     X  NV12    NV12    SYS SYS SYS SYS NODE    NODE    NODE    NODE    PXB PXB SYS SYS SYS SYS 64-127,192-254  1
GPU6    NV12    NV12    NV12    NV12    NV12    NV12     X  NV12    SYS SYS SYS SYS PXB PXB NODE    NODE    NODE    NODE    SYS SYS SYS SYS 64-127,192-254  1
GPU7    NV12    NV12    NV12    NV12    NV12    NV12    NV12     X  SYS SYS SYS SYS PXB PXB NODE    NODE    NODE    NODE    SYS SYS SYS SYS 64-127,192-254  1
mlx5_0  NODE    NODE    PXB PXB SYS SYS SYS SYS  X  PIX NODE    NODE    SYS SYS SYS SYS SYS SYS PIX PIX PIX PIX
mlx5_1  NODE    NODE    PXB PXB SYS SYS SYS SYS PIX  X  NODE    NODE    SYS SYS SYS SYS SYS SYS PIX PIX PIX PIX
mlx5_2  PXB PXB NODE    NODE    SYS SYS SYS SYS NODE    NODE     X  PXB SYS SYS SYS SYS SYS SYS NODE    NODE    NODE    NODE
mlx5_3  PXB PXB NODE    NODE    SYS SYS SYS SYS NODE    NODE    PXB  X  SYS SYS SYS SYS SYS SYS NODE    NODE    NODE    NODE
mlx5_4  SYS SYS SYS SYS NODE    NODE    PXB PXB SYS SYS SYS SYS  X  PXB NODE    NODE    NODE    NODE    SYS SYS SYS SYS
mlx5_5  SYS SYS SYS SYS NODE    NODE    PXB PXB SYS SYS SYS SYS PXB  X  NODE    NODE    NODE    NODE    SYS SYS SYS SYS
mlx5_6  SYS SYS SYS SYS NODE    NODE    NODE    NODE    SYS SYS SYS SYS NODE    NODE     X  PIX NODE    NODE    SYS SYS SYS SYS
mlx5_7  SYS SYS SYS SYS NODE    NODE    NODE    NODE    SYS SYS SYS SYS NODE    NODE    PIX  X  NODE    NODE    SYS SYS SYS SYS
mlx5_8  SYS SYS SYS SYS PXB PXB NODE    NODE    SYS SYS SYS SYS NODE    NODE    NODE    NODE     X  PXB SYS SYS SYS SYS
mlx5_9  SYS SYS SYS SYS PXB PXB NODE    NODE    SYS SYS SYS SYS NODE    NODE    NODE    NODE    PXB  X  SYS SYS SYS SYS
mlx5_10 NODE    NODE    PXB PXB SYS SYS SYS SYS PIX PIX NODE    NODE    SYS SYS SYS SYS SYS SYS  X  PIX PIX PIX
mlx5_11 NODE    NODE    PXB PXB SYS SYS SYS SYS PIX PIX NODE    NODE    SYS SYS SYS SYS SYS SYS PIX  X  PIX PIX
mlx5_12 NODE    NODE    PXB PXB SYS SYS SYS SYS PIX PIX NODE    NODE    SYS SYS SYS SYS SYS SYS PIX PIX  X  PIX
mlx5_13 NODE    NODE    PXB PXB SYS SYS SYS SYS PIX PIX NODE    NODE    SYS SYS SYS SYS SYS SYS PIX PIX PIX  X

If I run nccl-tests (command below):

mpirun -v -H {{ node ip }}:4,{{ node ip }}:4 -map-by slot --mca btl ^openib --mca btl_tcp_if_include bond0 -x NCCL_IB_HCA==mlx5_10:1,mlx5_11:1,mlx5_12:1,mlx5_13:1 -x NCCL_DEBUG=INFO \
{{ nccl-tests path }}/build/all_reduce_perf -b 10G -e 20G -f 2 -c 0 -n 20 -w 5 -t 1 -g 1
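The mapping I am hoping for — one HCA per local rank — could also be forced explicitly with a small wrapper script. This is only a sketch (hca_wrapper.sh is hypothetical; OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI; and NCCL's topology search is not guaranteed to cooperate when each rank is restricted to a single HCA):

```shell
#!/bin/bash
# hca_wrapper.sh (hypothetical): give each local rank its own HCA, then exec
# the real program. Launch as: mpirun ... ./hca_wrapper.sh ./all_reduce_perf ...
HCAS=(mlx5_10 mlx5_11 mlx5_12 mlx5_13)
RANK=${OMPI_COMM_WORLD_LOCAL_RANK:-0}
# The leading '=' asks NCCL for an exact interface-name match,
# the same syntax as NCCL_IB_HCA==mlx5_10:1 in the mpirun line above.
export NCCL_IB_HCA="=${HCAS[$((RANK % ${#HCAS[@]}))]}:1"
exec "$@"
```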

I expected the 4 processes on each node to each use a different InfiniBand HCA from mlx5_10, 11, 12, 13 (no duplicates — e.g. process A uses mlx5_10, process B uses mlx5_11, and so on), but only mlx5_10 is used! This is the NCCL standard output:

# nThread 1 nGpus 1 minBytes 10737418240 maxBytes 21474836480 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 0 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid 985924 on cosmos-hpc-100a45 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  1 Group  0 Pid 985925 on cosmos-hpc-100a45 device  1 [0x0a] NVIDIA A100-SXM4-80GB
#  Rank  2 Group  0 Pid 985926 on cosmos-hpc-100a45 device  2 [0x44] NVIDIA A100-SXM4-80GB
#  Rank  3 Group  0 Pid 985927 on cosmos-hpc-100a45 device  3 [0x4a] NVIDIA A100-SXM4-80GB
#  Rank  4 Group  0 Pid 3138488 on cosmos-hpc-100a55 device  0 [0x07] NVIDIA A100-SXM4-80GB
#  Rank  5 Group  0 Pid 3138489 on cosmos-hpc-100a55 device  1 [0x0a] NVIDIA A100-SXM4-80GB
#  Rank  6 Group  0 Pid 3138490 on cosmos-hpc-100a55 device  2 [0x44] NVIDIA A100-SXM4-80GB
#  Rank  7 Group  0 Pid 3138491 on cosmos-hpc-100a55 device  3 [0x4a] NVIDIA A100-SXM4-80GB
cosmos-hpc-100a45:985924:985924 [0] NCCL INFO Bootstrap : Using ib0:10.1.21.11<0>
cosmos-hpc-100a45:985924:985924 [0] NCCL INFO NET/Plugin : Plugin load returned 17 : libnccl-net.so: cannot open shared object file: No such file or directory.
cosmos-hpc-100a45:985924:985924 [0] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.21.11<0>
cosmos-hpc-100a45:985924:985924 [0] NCCL INFO Using network IB
NCCL version 2.11.4+cuda11.4
cosmos-hpc-100a45:985926:985926 [2] NCCL INFO Bootstrap : Using ib0:10.1.21.11<0>
cosmos-hpc-100a45:985926:985926 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a45:985925:985925 [1] NCCL INFO Bootstrap : Using ib0:10.1.21.11<0>
cosmos-hpc-100a45:985925:985925 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a45:985927:985927 [3] NCCL INFO Bootstrap : Using ib0:10.1.21.11<0>
cosmos-hpc-100a45:985927:985927 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a55:3138488:3138488 [0] NCCL INFO Bootstrap : Using ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138488:3138488 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a45:985926:985926 [2] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.21.11<0>
cosmos-hpc-100a45:985926:985926 [2] NCCL INFO Using network IB
cosmos-hpc-100a45:985927:985927 [3] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.21.11<0>
cosmos-hpc-100a45:985927:985927 [3] NCCL INFO Using network IB
cosmos-hpc-100a45:985925:985925 [1] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.21.11<0>
cosmos-hpc-100a45:985925:985925 [1] NCCL INFO Using network IB
cosmos-hpc-100a55:3138489:3138489 [1] NCCL INFO Bootstrap : Using ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138489:3138489 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a55:3138491:3138491 [3] NCCL INFO Bootstrap : Using ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138491:3138491 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a55:3138490:3138490 [2] NCCL INFO Bootstrap : Using ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138490:3138490 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
cosmos-hpc-100a55:3138488:3138488 [0] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138488:3138488 [0] NCCL INFO Using network IB
cosmos-hpc-100a55:3138489:3138489 [1] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138489:3138489 [1] NCCL INFO Using network IB
cosmos-hpc-100a55:3138490:3138490 [2] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138490:3138490 [2] NCCL INFO Using network IB
cosmos-hpc-100a55:3138491:3138491 [3] NCCL INFO NET/IB : Using [0]mlx5_10:1/IB [1]mlx5_11:1/IB [2]mlx5_12:1/IB [3]mlx5_13:1/IB ; OOB ib0:10.1.24.11<0>
cosmos-hpc-100a55:3138491:3138491 [3] NCCL INFO Using network IB
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO Trees [0] -1/-1/-1->1->0 [1] -1/-1/-1->1->0
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Trees [0] 3/6/-1->2->-1 [1] 3/-1/-1->2->6
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Trees [0] 0/-1/-1->3->2 [1] 0/-1/-1->3->2
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 00/02 :    0   3   6   5   4   7   2   1
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 01/02 :    0   3   6   5   4   7   2   1
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Trees [0] 1/-1/-1->0->3 [1] 1/-1/-1->0->3
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Trees [0] 7/-1/-1->6->2 [1] 7/2/-1->6->-1
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Setting affinity for GPU 2 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Trees [0] 5/-1/-1->4->7 [1] 5/-1/-1->4->7
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Setting affinity for GPU 0 to ffffffff,ffffffff,00000000,00000000,ffffffff,ffffffff
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Trees [0] 4/-1/-1->7->6 [1] 4/-1/-1->7->6
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO Trees [0] -1/-1/-1->5->4 [1] -1/-1/-1->5->4
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 00 : 7[4a000] -> 2[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 01 : 7[4a000] -> 2[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 00 : 3[4a000] -> 6[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 00 : 0[7000] -> 3[4a000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 01 : 3[4a000] -> 6[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Channel 00 : 4[7000] -> 7[4a000] via P2P/IPC/read
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 01 : 0[7000] -> 3[4a000] via P2P/IPC/read
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 00 : 3[4a000] -> 6[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 01 : 3[4a000] -> 6[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO Channel 00 : 1[a000] -> 0[7000] via P2P/IPC/read
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Channel 01 : 4[7000] -> 7[4a000] via P2P/IPC/read
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 00 : 7[4a000] -> 2[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 01 : 7[4a000] -> 2[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO Channel 00 : 5[a000] -> 4[7000] via P2P/IPC/read
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO Channel 01 : 1[a000] -> 0[7000] via P2P/IPC/read
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO Channel 01 : 5[a000] -> 4[7000] via P2P/IPC/read
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Connected all rings
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 00 : 0[7000] -> 1[a000] via P2P/IPC/read
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Connected all rings
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Channel 01 : 0[7000] -> 1[a000] via P2P/IPC/read
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Channel 00 : 4[7000] -> 5[a000] via P2P/IPC/read
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Channel 01 : 4[7000] -> 5[a000] via P2P/IPC/read
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Connected all rings
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 00 : 2[44000] -> 1[a000] via P2P/IPC/read
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 01 : 2[44000] -> 1[a000] via P2P/IPC/read
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Connected all rings
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO Connected all rings
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 00 : 2[44000] -> 3[4a000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 00 : 6[44000] -> 5[a000] via P2P/IPC/read
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 01 : 2[44000] -> 3[4a000] via P2P/IPC/read
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Connected all rings
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 01 : 6[44000] -> 5[a000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Connected all rings
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO Connected all trees
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO Connected all rings
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 00 : 6[44000] -> 7[4a000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 01 : 6[44000] -> 7[4a000] via P2P/IPC/read
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 00 : 6[44000] -> 2[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 01 : 6[44000] -> 2[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 00 : 2[44000] -> 6[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 00 : 3[4a000] -> 0[7000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 00 : 2[44000] -> 6[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Channel 01 : 2[44000] -> 6[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO Connected all trees
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 01 : 3[4a000] -> 0[7000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 01 : 2[44000] -> 6[44000] [receive] via NET/IB/0/GDRDMA
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 00 : 6[44000] -> 2[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO Connected all trees
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 00 : 7[4a000] -> 4[7000] via P2P/IPC/read
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Channel 01 : 6[44000] -> 2[44000] [send] via NET/IB/0/GDRDMA
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 00 : 3[4a000] -> 2[44000] via P2P/IPC/read
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 01 : 7[4a000] -> 4[7000] via P2P/IPC/read
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Channel 01 : 3[4a000] -> 2[44000] via P2P/IPC/read
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO Connected all trees
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 00 : 7[4a000] -> 6[44000] via P2P/IPC/read
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Channel 01 : 7[4a000] -> 6[44000] via P2P/IPC/read
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO Connected all trees
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO Connected all trees
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a55:3138490:3138595 [2] NCCL INFO comm 0x7fed98000fa0 rank 6 nranks 8 cudaDev 2 busId 44000 - Init COMPLETE
cosmos-hpc-100a55:3138488:3138580 [0] NCCL INFO comm 0x7f038c000fa0 rank 4 nranks 8 cudaDev 0 busId 7000 - Init COMPLETE
cosmos-hpc-100a55:3138491:3138597 [3] NCCL INFO comm 0x7fc430000fa0 rank 7 nranks 8 cudaDev 3 busId 4a000 - Init COMPLETE
cosmos-hpc-100a55:3138489:3138587 [1] NCCL INFO comm 0x7f689c000fa0 rank 5 nranks 8 cudaDev 1 busId a000 - Init COMPLETE
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO Connected all trees
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO Connected all trees
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 8/8/512
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO 2 coll channels, 2 p2p channels, 1 p2p channels per peer
cosmos-hpc-100a45:985927:986028 [3] NCCL INFO comm 0x7fde18000fa0 rank 3 nranks 8 cudaDev 3 busId 4a000 - Init COMPLETE
cosmos-hpc-100a45:985925:986027 [1] NCCL INFO comm 0x7fbb68000fa0 rank 1 nranks 8 cudaDev 1 busId a000 - Init COMPLETE
cosmos-hpc-100a45:985926:986018 [2] NCCL INFO comm 0x7f4698000fa0 rank 2 nranks 8 cudaDev 2 busId 44000 - Init COMPLETE
cosmos-hpc-100a45:985924:986013 [0] NCCL INFO comm 0x7f0410000fa0 rank 0 nranks 8 cudaDev 0 busId 7000 - Init COMPLETE
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
cosmos-hpc-100a45:985924:985924 [0] NCCL INFO Launch mode Parallel

and this is the NCCL graph log:

NCCL version 2.11.4+cuda11.4
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Attribute coll of node net not found
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + PCI[24.0] - GPU/7000 (4)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + PCI[24.0] - GPU/A000 (5)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + PCI[24.0] - GPU/44000 (6)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + PCI[24.0] - GPU/4A000 (7)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + NET[25.0] - NET/0 (400ea10003fd7010/1/25.000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + NET[25.0] - NET/1 (400ea10003fd7010/2/25.000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + NET[25.0] - NET/2 (400ea10003fd7010/3/25.000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO                             + NET[25.0] - NET/3 (400ea10003fd7010/4/25.000000)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO ==========================================
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + PCI[24.0] - GPU/7000 (4)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + PCI[24.0] - GPU/A000 (5)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + PCI[24.0] - GPU/44000 (6)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + PCI[24.0] - GPU/4A000 (7)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + NET[25.0] - NET/0 (400ea10003fd7010/1/25.000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + NET[25.0] - NET/1 (400ea10003fd7010/2/25.000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + NET[25.0] - NET/2 (400ea10003fd7010/3/25.000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO                             + NET[25.0] - NET/3 (400ea10003fd7010/4/25.000000)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO ==========================================
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + PCI[24.0] - GPU/7000 (4)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + PCI[24.0] - GPU/A000 (5)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + PCI[24.0] - GPU/44000 (6)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + PCI[24.0] - GPU/4A000 (7)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + NET[25.0] - NET/0 (400ea10003fd7010/1/25.000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + NET[25.0] - NET/1 (400ea10003fd7010/2/25.000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + NET[25.0] - NET/2 (400ea10003fd7010/3/25.000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO                             + NET[25.0] - NET/3 (400ea10003fd7010/4/25.000000)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO ==========================================
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + PCI[24.0] - GPU/7000 (4)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + PCI[24.0] - GPU/A000 (5)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + PCI[24.0] - GPU/44000 (6)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + PCI[24.0] - GPU/4A000 (7)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + NET[25.0] - NET/0 (400ea10003fd7010/1/25.000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + NET[25.0] - NET/1 (400ea10003fd7010/2/25.000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + NET[25.0] - NET/2 (400ea10003fd7010/3/25.000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO                             + NET[25.0] - NET/3 (400ea10003fd7010/4/25.000000)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO ==========================================
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO  0 : NET/0 GPU/6 GPU/5 GPU/4 GPU/7 NET/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO  0 : NET/0 GPU/6 GPU/7 GPU/4 GPU/5 NET/0
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO  0 : NET/0 GPU/6 GPU/5 GPU/4 GPU/7 NET/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO  0 : NET/0 GPU/6 GPU/7 GPU/4 GPU/5 NET/0
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO  0 : NET/0 GPU/6 GPU/5 GPU/4 GPU/7 NET/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO  0 : NET/0 GPU/6 GPU/7 GPU/4 GPU/5 NET/0
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO  0 : NET/0 GPU/6 GPU/5 GPU/4 GPU/7 NET/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO  0 : NET/0 GPU/6 GPU/7 GPU/4 GPU/5 NET/0
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + PCI[24.0] - GPU/7000 (0)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + PCI[24.0] - GPU/A000 (1)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + PCI[24.0] - GPU/44000 (2)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + PCI[24.0] - GPU/4A000 (3)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + NET[25.0] - NET/0 (c5a30003fd7010/1/25.000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + NET[25.0] - NET/1 (c5a30003fd7010/2/25.000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + NET[25.0] - NET/2 (c5a30003fd7010/3/25.000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO                             + NET[25.0] - NET/3 (c5a30003fd7010/4/25.000000)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO ==========================================
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + PCI[24.0] - GPU/7000 (0)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + PCI[24.0] - GPU/A000 (1)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + PCI[24.0] - GPU/44000 (2)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + PCI[24.0] - GPU/4A000 (3)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + NET[25.0] - NET/0 (c5a30003fd7010/1/25.000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + NET[25.0] - NET/1 (c5a30003fd7010/2/25.000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + NET[25.0] - NET/2 (c5a30003fd7010/3/25.000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO                             + NET[25.0] - NET/3 (c5a30003fd7010/4/25.000000)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO ==========================================
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + PCI[24.0] - GPU/7000 (0)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + PCI[24.0] - GPU/A000 (1)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + PCI[24.0] - GPU/44000 (2)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + PCI[24.0] - GPU/4A000 (3)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + NET[25.0] - NET/0 (c5a30003fd7010/1/25.000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + NET[25.0] - NET/1 (c5a30003fd7010/2/25.000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + NET[25.0] - NET/2 (c5a30003fd7010/3/25.000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO                             + NET[25.0] - NET/3 (c5a30003fd7010/4/25.000000)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO ==========================================
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO CPU/0 (1/2/-1)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + PCI[24.0] - GPU/7000 (0)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + PCI[24.0] - GPU/A000 (1)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + PCI[24.0] - GPU/44000 (2)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + PCI[24.0] - GPU/4A000 (3)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                                           + NVL[264.0] - NVS/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO               + PCI[24.0] - NIC/51000
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + NET[25.0] - NET/0 (c5a30003fd7010/1/25.000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + NET[25.0] - NET/1 (c5a30003fd7010/2/25.000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + NET[25.0] - NET/2 (c5a30003fd7010/3/25.000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO                             + NET[25.0] - NET/3 (c5a30003fd7010/4/25.000000)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO ==========================================
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO GPU/7000 :GPU/7000 (0/5000.000000/LOC) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO GPU/A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (0/5000.000000/LOC) GPU/44000 (2/264.000000/NVL) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (6/24.000000/PHB) NET/1 (6/24.000000/PHB) NET/2 (6/24.000000/PHB) NET/3 (6/24.000000/PHB)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO GPU/44000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (0/5000.000000/LOC) GPU/4A000 (2/264.000000/NVL) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO GPU/4A000 :GPU/7000 (2/264.000000/NVL) GPU/A000 (2/264.000000/NVL) GPU/44000 (2/264.000000/NVL) GPU/4A000 (0/5000.000000/LOC) CPU/0 (3/24.000000/PHB) NET/0 (4/24.000000/PXB) NET/1 (4/24.000000/PXB) NET/2 (4/24.000000/PXB) NET/3 (4/24.000000/PXB)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO NET/0 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (0/5000.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO NET/1 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (0/5000.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO NET/2 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (0/5000.000000/LOC) NET/3 (2/25.000000/LOC)
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO NET/3 :GPU/7000 (6/24.000000/PHB) GPU/A000 (6/24.000000/PHB) GPU/44000 (4/24.000000/PXB) GPU/4A000 (4/24.000000/PXB) CPU/0 (3/24.000000/PHB) NET/0 (2/25.000000/LOC) NET/1 (2/25.000000/LOC) NET/2 (2/25.000000/LOC) NET/3 (0/5000.000000/LOC)
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO  0 : NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO  0 : NET/0 GPU/2 GPU/3 GPU/0 GPU/1 NET/0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO  0 : NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO  0 : NET/0 GPU/2 GPU/3 GPU/0 GPU/1 NET/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO  0 : NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO  0 : NET/0 GPU/2 GPU/3 GPU/0 GPU/1 NET/0
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO  0 : NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Pattern 1, crossNic 0, nChannels 1, speed 48.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO  0 : NET/0 GPU/2 GPU/3 GPU/0 GPU/1 NET/0
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Pattern 3, crossNic 0, nChannels 0, speed 0.000000/0.000000, type NVL/PIX, sameChannels 1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Tree 0 : 2 -> 3 -> 0/-1/-1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Tree 1 : 2 -> 3 -> 0/-1/-1
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Ring 00 : 0 -> 3 -> 6
cosmos-hpc-100a45:995858:995947 [3] NCCL INFO Ring 01 : 0 -> 3 -> 6
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Ring 00 : 2 -> 1 -> 0
cosmos-hpc-100a45:995856:995950 [1] NCCL INFO Ring 01 : 2 -> 1 -> 0
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Tree 0 : -1 -> 2 -> 3/6/-1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Tree 1 : 6 -> 2 -> 3/-1/-1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Ring 00 : 7 -> 2 -> 1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Ring 01 : 7 -> 2 -> 1
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Ring 00 : 5 -> 4 -> 7
cosmos-hpc-100a55:3149021:3149131 [0] NCCL INFO Ring 01 : 5 -> 4 -> 7
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Ring 00 : 1 -> 0 -> 3
cosmos-hpc-100a45:995855:995929 [0] NCCL INFO Ring 01 : 1 -> 0 -> 3
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Ring 00 : 6 -> 5 -> 4
cosmos-hpc-100a55:3149022:3149146 [1] NCCL INFO Ring 01 : 6 -> 5 -> 4
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Tree 0 : 2 -> 6 -> 7/-1/-1
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Tree 1 : -1 -> 6 -> 7/2/-1
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Ring 00 : 3 -> 6 -> 5
cosmos-hpc-100a55:3149023:3149144 [2] NCCL INFO Ring 01 : 3 -> 6 -> 5
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Tree 0 : 6 -> 7 -> 4/-1/-1
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Tree 1 : 6 -> 7 -> 4/-1/-1
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Ring 00 : 4 -> 7 -> 2
cosmos-hpc-100a55:3149024:3149142 [3] NCCL INFO Ring 01 : 4 -> 7 -> 2

The purpose of this test: I want to check whether, when multiple processes share one InfiniBand device (PF), creating VFs via SR-IOV and assigning a VF to each process gives better performance than simply sharing the PF.

Thank you, I look forward to your answer!

sjeaugey commented 1 year ago

I'm not sure how your experiment relates to my comment. You have 2 NICs for 2 GPUs so each GPU would use a different NIC by default, e.g. GPU 0 would use mlx5_2 and GPU 1 would use mlx5_3. This has nothing to do with multiple processes or VFs. NCCL is not designed to have multiple processes share GPUs. They should be able to share a NIC though, even with PF -- but I don't have much experience with that, and in your case given you have one NIC per GPU it should not happen.

kimtaehoon-dev commented 1 year ago

I am sorry, my long question was confusing. Let me explain again:

  1. There are 2 GPU nodes (each with 8 GPU cards and 8 InfiniBand HCAs).
  2. I have created four VFs on a single PF (via SR-IOV). The PF device name is mlx5_0; the VFs are mlx5_[10:13].

I then run nccl-tests with the command below. It launches 4 processes on each node; each process uses 1 GPU of its own (not shared with other processes, e.g. process A gets GPU 1, process B gets GPU 2, ...), and all processes perform the allreduce over the 4 InfiniBand VF devices mlx5_[10:13] (only the VFs, never the PFs):

mpirun -v -H {{ node ip }}:4,{{ node ip }}:4 -map-by slot --mca btl ^openib --mca btl_tcp_if_include bond0 -x NCCL_IB_HCA==mlx5_10:1,mlx5_11:1,mlx5_12:1,mlx5_13:1 -x NCCL_DEBUG=INFO \
{{ nccl-tests path }}/build/all_reduce_perf -b 10G -e 20G -f 2 -c 0 -n 20 -w 5 -t 1 -g 1

I cannot understand the nccl-tests result above, because all processes use only one net device, mlx5_10. From your comment ("With a recent NCCL, if both GPUs are at the same distance of both NICs, I think each GPU would use a different NIC") I expected the processes to use all the net devices mlx5_[10:13], but they only use mlx5_10...

I have tried to explain my test as simply as I can; I hope you understand my situation and can offer some suggestions. Thank you!

sjeaugey commented 1 year ago

I think you are running a single allreduce here, so we create a single ring. Hence, only one interface will be used. If you were to run 4 concurrent allreduce operations (across GPUs 0 of each node, GPUs 1 of each node, ...) then maybe each GPU would pick a different VF.

But when NCCL tries to maximize the bandwidth within a node, it can see that all VFs are actually the same NIC, so it will know there is no point in using all of them because they map to the same port in the end. So once we've found a path using the first port, we know there is no bandwidth left for the other ports and we stop there. Your log indicates a single ring.
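As a concrete sketch of the "4 concurrent operations" idea: NCCL reads NCCL_IB_HCA at initialization time in each process, so a launcher can assign a different HCA to each local rank before the communicator is created. This is an illustrative assumption, not something the thread confirms for VFs; the device names mlx5_10..mlx5_13 come from the messages above, and OMPI_COMM_WORLD_LOCAL_RANK assumes an Open MPI launcher.

```python
import os

# Hedged sketch: pin each local rank to its own HCA/VF before NCCL init.
# Assumes one process per GPU; the VF names are taken from the thread above.
hcas = ["mlx5_10", "mlx5_11", "mlx5_12", "mlx5_13"]
local_rank = int(os.environ.get("OMPI_COMM_WORLD_LOCAL_RANK", "0"))
# A leading "=" makes NCCL match the device name exactly; ":1" selects port 1.
os.environ["NCCL_IB_HCA"] = "=" + hcas[local_rank % len(hcas)] + ":1"
print(os.environ["NCCL_IB_HCA"])
```

Each process would then initialize its own communicator after this assignment, so four concurrent operations are at least free to land on four different VFs.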

kimtaehoon-dev commented 1 year ago

I think you are running a single allreduce here, so we create a single ring. Hence, only one interface will be used. If you were to run 4 concurrent allreduce operations (across GPUs 0 of each node, GPUs 1 of each node, ...) then maybe each GPU would pick a different VF.

But when NCCL tries to maximize the bandwidth within a node, it can see that all VFs are actually the same NIC, so it will know there is no point in using all of them because they map to the same port in the end. So once we've found a path using the first port, we know there is no bandwidth left for the other ports and we stop there. Your log indicates a single ring.

Could you please tell me more about the single ring? How can I configure NCCL to use concurrent rings? And how did you tell from the log that NCCL uses a single ring? In the log above there are lines like "Connected all trees" — what does that mean? (I am a complete beginner with NCCL, sorry.)

sjeaugey commented 1 year ago

It would use a single ring because more would not give better performance, because they all map to the same port. NCCL has a pretty advanced topology detection and figures out the GPU, PCI, NIC, ports topology -- then searches for the most optimized path between GPUs and NICs.

In the logs, I see you have only 2 channels and we use 2 channels per ring. Also, this log:

cosmos-hpc-100a45:995857:995934 [2] NCCL INFO Pattern 4, crossNic 0, nChannels 1, speed 24.000000/24.000000, type NVL/PXB, sameChannels 1
cosmos-hpc-100a45:995857:995934 [2] NCCL INFO  0 : NET/0 GPU/2 GPU/1 GPU/0 GPU/3 NET/0

shows that for pattern 4 (ring) the most optimal solution we found was 1 channel, going from NET 0 to GPUs 2, 1, 0, 3 and then back to NET 0. You can also see in the topology NCCL detected that all 4 NETs are attached to the same 24 GB/s PCI port, so using more than one port at 25 GB/s will not give better performance:

NCCL INFO === System : maxWidth 24.0 totalWidth 264.0 ===
NCCL INFO CPU/0 (1/2/-1)
NCCL INFO + PCI[24.0] - PCI/1000 (1000c01010000000)
NCCL INFO               + PCI[24.0] - PCI/5000 (1000c01010de13b8)
NCCL INFO                             + PCI[24.0] - GPU/7000 (0)
NCCL INFO                                           + NVL[264.0] - NVS/0
NCCL INFO               + PCI[24.0] - PCI/8000 (1000c01010de13b8)
NCCL INFO                             + PCI[24.0] - GPU/A000 (1)
NCCL INFO                                           + NVL[264.0] - NVS/0
NCCL INFO + PCI[24.0] - PCI/3E000 (1000c01010000000)
NCCL INFO               + PCI[24.0] - PCI/42000 (1000c01010de13b8)
NCCL INFO                             + PCI[24.0] - GPU/44000 (2)
NCCL INFO                                           + NVL[264.0] - NVS/0
NCCL INFO               + PCI[24.0] - PCI/48000 (1000c01010de13b8)
NCCL INFO                             + PCI[24.0] - GPU/4A000 (3)
NCCL INFO                                           + NVL[264.0] - NVS/0
NCCL INFO               + PCI[24.0] - NIC/51000
NCCL INFO                             + NET[25.0] - NET/0 (c5a30003fd7010/1/25.000000)
NCCL INFO                             + NET[25.0] - NET/1 (c5a30003fd7010/2/25.000000)
NCCL INFO                             + NET[25.0] - NET/2 (c5a30003fd7010/3/25.000000)
NCCL INFO                             + NET[25.0] - NET/3 (c5a30003fd7010/4/25.000000)

Connected .... means that we connected GPUs together along the ring(s) or tree(s) that we computed.
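The capacity argument above can be checked with one line of arithmetic (the numbers are taken from the topology dump; the min() model is a simplification):

```python
# Four 25 GB/s NET ports hang off a single 24 GB/s PCI path, so the
# aggregate NIC bandwidth is capped by the PCI side, not by the port count.
port_speed_gbs = 25.0    # per NET port, from the dump
n_ports = 4
pci_uplink_gbs = 24.0    # shared PCI path, from the dump
usable_gbs = min(n_ports * port_speed_gbs, pci_uplink_gbs)
print(usable_gbs)  # 24.0: a single 25 GB/s port already saturates the PCI path
```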

kimtaehoon-dev commented 1 year ago

I really appreciate your answer! Thank you very much, I have learned a lot. I have a few more questions.

You say like this "But when NCCL tries to maximize the bandwidth within a node, it can see that all VFs are actually the same NIC, so it will know there is no point in using all of them because they map to the same port in the end. So once we've found a path using the first port, we know there is no bandwidth left for the other ports and we stop there."

I understand this to mean that if NCCL detects only one InfiniBand device and all processes (GPUs) must share it, only one process uses the InfiniBand device at a time while the others wait for it to finish. Am I right? (As I understand it, when multiple plain processes — not using NCCL — share one InfiniBand device, each process creates its own QP (Queue Pair) and they send data concurrently, without waiting for each other; but NCCL-based processes wait. Is that right?)

I ran the nccl-tests alltoall operation like this (mlx5_0 is an InfiniBand HDR device; it is a PF, not a VF):

mpirun -v -H 10.182.60.238:4,10.182.62.4:4 -map-by slot --mca btl ^openib --mca btl_tcp_if_include bond0 -x NCCL_IB_HCA==mlx5_0 -x NCCL_DEBUG=INFO \
/home/deploy/workspace/mosty/nccl-tests/build/alltoall_perf -b 1G -e 2G -f 2 -c 0 -n 20 -w 5 -t 1 -g 1
---
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
  1073741824      33554432     float    none      -1  1014091    1.06    0.93    N/A  1009359    1.06    0.93    N/A
  2147483648      67108864     float    none      -1  1994010    1.08    0.94    N/A  2016215    1.07    0.93    N/A
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 0.932899
#

But the bandwidth is very low. (As far as I know, InfiniBand HDR can reach up to 200 Gb/s = 25 GB/s.) I think the reason for the low bandwidth is, as you said, that the other processes wait until the process using the InfiniBand device finishes its job. Am I right?

sjeaugey commented 1 year ago

I understand this to mean that if NCCL detects only one InfiniBand device and all processes (GPUs) must share it, only one process uses the InfiniBand device at a time while the others wait for it to finish. Am I right?

I'm not sure I'd agree with that. Ring Allreduce requires data to go through each GPU and enter/exit the node once. There is no point in having all GPUs communicate between nodes, we just need to do it once in each direction. In the example above, GPU 2 is receiving data and GPU 3 is sending data. Everything is pipelined; there is a constant flow of data entering the NIC going to GPU 2, then being processed by all GPUs and exiting the node. Feel free to watch my GTC talk this year (2022) for a graphical depiction of the ring algorithm and how the rings map to the hardware.

I run nccl-tests alltoall operation [...] But bandwidth is very very low.

Alltoall would have each GPU use the NIC because there is no way to fuse data (hence no ring), just direct communication, and they may actually use the different VFs (not that it makes any difference). The expected performance, given they share a NIC, would be 24 GB/s / 8 GPUs = 3 GB/s. 1 GB/s is indeed much lower than it should be, but that's likely because most GPUs have to use a remote NIC, through the CPU, and that path is slow. If you use all NICs on the system, and each GPU has a NIC local to its PCI switch, you should see 24 GB/s alltoall performance.
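For reference, the per-GPU estimate above works out as follows (a back-of-the-envelope model using the numbers from this thread, not a measurement):

```python
# 8 GPUs contend for one NIC whose usable PCI path is about 24 GB/s,
# so an alltoall can hope for roughly 24 / 8 = 3 GB/s per GPU.
nic_bw_gbs = 24.0
gpus_per_node = 8
expected_per_gpu_gbs = nic_bw_gbs / gpus_per_node
print(expected_per_gpu_gbs)  # 3.0, vs. the ~1 GB/s actually measured
```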