NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

H800: is NVLS not supported? #892

Open: zhuhong opened this issue 1 year ago

zhuhong commented 1 year ago

I compiled the code from source (v1.8.3) and ran the following command:

NCCL_DEBUG=INFO NCCL_ALGO=Tree,Ring,,CollnetDirect,CollnetChain,NVLS ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8

The output includes this message:

NCCL INFO NVLS multicast support is not available on dev xxx
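One way to confirm that every rank hit this check, and that NCCL therefore built no NVLS channels, is to scan the saved NCCL_DEBUG=INFO output for the relevant messages. The sketch below is a hypothetical helper (`summarize_nvls` and the script name `nvls_summary.py` are illustrative, not part of NCCL or nccl-tests). Its regular expressions assume the exact message wording of the NCCL 2.18.3 log attached here and assume the bracketed number in each line is the CUDA device index, as it appears to be in this log; other NCCL versions may word things differently.

```python
import re
import sys

# Patterns for the NVLS-related NCCL_DEBUG=INFO messages. Wording is copied from
# the NCCL 2.18.3 log in this issue (an assumption: other versions may differ).
NVLS_UNAVAILABLE = re.compile(r"NVLS multicast support is not available on dev (\d+)")
# e.g. "[6] NCCL INFO 16 coll channels, 0 nvls channels, ..." (bracket = device)
CHANNEL_SUMMARY = re.compile(r"\[(\d+)\] NCCL INFO (\d+) coll channels, (\d+) nvls channels")
DRIVER_VERSION = re.compile(r"cudaDriverVersion (\d+)")
NCCL_VERSION = re.compile(r"NCCL version (\S+)")


def summarize_nvls(log_text):
    """Hypothetical helper: collect NVLS-related facts from an NCCL INFO log."""
    summary = {
        "nccl_version": None,
        "cuda_driver_version": None,
        "nvls_unavailable_devs": set(),  # devices that logged "not available"
        "nvls_channels": {},             # device -> number of NVLS channels built
    }
    for line in log_text.splitlines():
        if m := NVLS_UNAVAILABLE.search(line):
            summary["nvls_unavailable_devs"].add(int(m.group(1)))
        elif m := CHANNEL_SUMMARY.search(line):
            summary["nvls_channels"][int(m.group(1))] = int(m.group(3))
        elif m := DRIVER_VERSION.search(line):
            summary["cuda_driver_version"] = int(m.group(1))
        elif m := NCCL_VERSION.search(line):
            summary["nccl_version"] = m.group(1)
    return summary


if __name__ == "__main__":
    # Usage (hypothetical script name): python nvls_summary.py saved_nccl_debug.log
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
            s = summarize_nvls(f.read())
        print("NCCL version:", s["nccl_version"])
        print("CUDA driver version:", s["cuda_driver_version"])
        print("NVLS unavailable on devices:", sorted(s["nvls_unavailable_devs"]))
        print("NVLS channels per device:", dict(sorted(s["nvls_channels"].items())))
```

In the log attached to this issue, all eight devices (0 to 7) print the "not available" message and every channel summary line shown reports 0 nvls channels, so NVLS was never set up even though NCCL_ALGO listed it.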

The full output is attached below.


# nThread 1 nGpus 8 minBytes 8 maxBytes 134217728 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid  33347 on 036260b61d3a device  0 [0x0f] NVIDIA H800
#  Rank  1 Group  0 Pid  33347 on 036260b61d3a device  1 [0x34] NVIDIA H800
#  Rank  2 Group  0 Pid  33347 on 036260b61d3a device  2 [0x48] NVIDIA H800
#  Rank  3 Group  0 Pid  33347 on 036260b61d3a device  3 [0x5a] NVIDIA H800
#  Rank  4 Group  0 Pid  33347 on 036260b61d3a device  4 [0x87] NVIDIA H800
#  Rank  5 Group  0 Pid  33347 on 036260b61d3a device  5 [0xae] NVIDIA H800
#  Rank  6 Group  0 Pid  33347 on 036260b61d3a device  6 [0xc2] NVIDIA H800
#  Rank  7 Group  0 Pid  33347 on 036260b61d3a device  7 [0xd7] NVIDIA H800
036260b61d3a:33347:33347 [0] NCCL INFO Bootstrap : Using eth0:172.17.0.2<0>
036260b61d3a:33347:33347 [0] NCCL INFO NET/Plugin: Failed to find ncclNetPlugin_v6 symbol.
036260b61d3a:33347:33347 [0] NCCL INFO NET/Plugin: Loaded net plugin NCCL RDMA Plugin (v5)
036260b61d3a:33347:33347 [0] NCCL INFO NET/Plugin: Failed to find ncclCollNetPlugin_v6 symbol.
036260b61d3a:33347:33347 [0] NCCL INFO NET/Plugin: Loaded coll plugin SHARP (v5)
036260b61d3a:33347:33347 [7] NCCL INFO cudaDriverVersion 12010
NCCL version 2.18.3+cuda12.1
036260b61d3a:33347:33362 [2] NCCL INFO Plugin Path : /opt/hpcx/nccl_rdma_sharp_plugin/lib/libnccl-net.so
036260b61d3a:33347:33362 [2] NCCL INFO P2P plugin IBext
036260b61d3a:33347:33362 [2] NCCL INFO NET/IB : No device found.
036260b61d3a:33347:33362 [2] NCCL INFO NET/IB : No device found.
036260b61d3a:33347:33362 [2] NCCL INFO NET/Socket : Using [0]eth0:172.17.0.2<0>
036260b61d3a:33347:33362 [2] NCCL INFO Using network Socket
036260b61d3a:33347:33366 [6] NCCL INFO Using network Socket
036260b61d3a:33347:33367 [7] NCCL INFO Using network Socket
036260b61d3a:33347:33361 [1] NCCL INFO Using network Socket
036260b61d3a:33347:33363 [3] NCCL INFO Using network Socket
036260b61d3a:33347:33360 [0] NCCL INFO Using network Socket
036260b61d3a:33347:33364 [4] NCCL INFO Using network Socket
036260b61d3a:33347:33365 [5] NCCL INFO Using network Socket
036260b61d3a:33347:33363 [3] NCCL INFO comm 0x563f5c8d0140 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 5a000 commId 0x151891d39ece6192 - Init START
036260b61d3a:33347:33360 [0] NCCL INFO comm 0x563f5c8bb420 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId f000 commId 0x151891d39ece6192 - Init START
036260b61d3a:33347:33367 [7] NCCL INFO comm 0x563f5c8e74c0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId d7000 commId 0x151891d39ece6192 - Init START
036260b61d3a:33347:33364 [4] NCCL INFO comm 0x563f5c8d5e20 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 87000 commId 0x151891d39ece6192 - Init START
036260b61d3a:33347:33366 [6] NCCL INFO comm 0x563f5c8e17e0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c2000 commId 0x151891d39ece6192 - Init START
036260b61d3a:33347:33361 [1] NCCL INFO comm 0x563f5c8c4710 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 34000 commId 0x151891d39ece6192 - Init START
036260b61d3a:33347:33365 [5] NCCL INFO comm 0x563f5c8dbb00 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId ae000 commId 0x151891d39ece6192 - Init START
036260b61d3a:33347:33362 [2] NCCL INFO comm 0x563f5c8ca460 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 48000 commId 0x151891d39ece6192 - Init START
036260b61d3a:33347:33365 [5] NCCL INFO Setting affinity for GPU 5 to fffffff0,00000000,00000000,0000ffff,fff00000,00000000,00000000
036260b61d3a:33347:33365 [5] NCCL INFO NVLS multicast support is not available on dev 5
036260b61d3a:33347:33363 [3] NCCL INFO Setting affinity for GPU 3 to 0fff,ffff0000,00000000,00000000,0fffffff
036260b61d3a:33347:33363 [3] NCCL INFO NVLS multicast support is not available on dev 3
036260b61d3a:33347:33362 [2] NCCL INFO Setting affinity for GPU 2 to ff,fffff000,00000000,00000000,00ffffff,f0000000
036260b61d3a:33347:33362 [2] NCCL INFO NVLS multicast support is not available on dev 2
036260b61d3a:33347:33364 [4] NCCL INFO Setting affinity for GPU 4 to 0f,ffffff00,00000000,00000000,000fffff,ff000000,00000000
036260b61d3a:33347:33364 [4] NCCL INFO NVLS multicast support is not available on dev 4
036260b61d3a:33347:33366 [6] NCCL INFO Setting affinity for GPU 6 to fffffff0,00000000,00000000,0000ffff,fff00000,00000000,00000000
036260b61d3a:33347:33366 [6] NCCL INFO NVLS multicast support is not available on dev 6
036260b61d3a:33347:33360 [0] NCCL INFO Setting affinity for GPU 0 to 0fff,ffff0000,00000000,00000000,0fffffff
036260b61d3a:33347:33360 [0] NCCL INFO NVLS multicast support is not available on dev 0
036260b61d3a:33347:33361 [1] NCCL INFO Setting affinity for GPU 1 to ff,fffff000,00000000,00000000,00ffffff,f0000000
036260b61d3a:33347:33361 [1] NCCL INFO NVLS multicast support is not available on dev 1
036260b61d3a:33347:33367 [7] NCCL INFO Setting affinity for GPU 7 to 0f,ffffff00,00000000,00000000,000fffff,ff000000,00000000
036260b61d3a:33347:33367 [7] NCCL INFO NVLS multicast support is not available on dev 7
036260b61d3a:33347:33367 [7] NCCL INFO Trees [0] 5/-1/-1->7->4 [1] 5/-1/-1->7->4 [2] 5/-1/-1->7->4 [3] 5/-1/-1->7->4 [4] 5/-1/-1->7->4 [5] 5/-1/-1->7->4 [6] 5/-1/-1->7->4 [7] 5/-1/-1->7->4 [8] 5/-1/-1->7->4 [9] 5/-1/-1->7->4 [10] 5/-1/-1->7->4 [11] 5/-1/-1->7->4 [12] 5/-1/-1->7->4 [13] 5/-1/-1->7->4 [14] 5/-1/-1->7->4 [15] 5/-1/-1->7->4
036260b61d3a:33347:33366 [6] NCCL INFO Trees [0] -1/-1/-1->6->5 [1] -1/-1/-1->6->5 [2] -1/-1/-1->6->5 [3] -1/-1/-1->6->5 [4] -1/-1/-1->6->5 [5] -1/-1/-1->6->5 [6] -1/-1/-1->6->5 [7] -1/-1/-1->6->5 [8] -1/-1/-1->6->5 [9] -1/-1/-1->6->5 [10] -1/-1/-1->6->5 [11] -1/-1/-1->6->5 [12] -1/-1/-1->6->5 [13] -1/-1/-1->6->5 [14] -1/-1/-1->6->5 [15] -1/-1/-1->6->5
036260b61d3a:33347:33365 [5] NCCL INFO Trees [0] 6/-1/-1->5->7 [1] 6/-1/-1->5->7 [2] 6/-1/-1->5->7 [3] 6/-1/-1->5->7 [4] 6/-1/-1->5->7 [5] 6/-1/-1->5->7 [6] 6/-1/-1->5->7 [7] 6/-1/-1->5->7 [8] 6/-1/-1->5->7 [9] 6/-1/-1->5->7 [10] 6/-1/-1->5->7 [11] 6/-1/-1->5->7 [12] 6/-1/-1->5->7 [13] 6/-1/-1->5->7 [14] 6/-1/-1->5->7 [15] 6/-1/-1->5->7
036260b61d3a:33347:33365 [5] NCCL INFO P2P Chunksize set to 524288
036260b61d3a:33347:33367 [7] NCCL INFO P2P Chunksize set to 524288
036260b61d3a:33347:33364 [4] NCCL INFO Trees [0] 7/-1/-1->4->2 [1] 7/-1/-1->4->2 [2] 7/-1/-1->4->2 [3] 7/-1/-1->4->2 [4] 7/-1/-1->4->2 [5] 7/-1/-1->4->2 [6] 7/-1/-1->4->2 [7] 7/-1/-1->4->2 [8] 7/-1/-1->4->2 [9] 7/-1/-1->4->2 [10] 7/-1/-1->4->2 [11] 7/-1/-1->4->2 [12] 7/-1/-1->4->2 [13] 7/-1/-1->4->2 [14] 7/-1/-1->4->2 [15] 7/-1/-1->4->2
036260b61d3a:33347:33364 [4] NCCL INFO P2P Chunksize set to 524288
036260b61d3a:33347:33366 [6] NCCL INFO P2P Chunksize set to 524288
036260b61d3a:33347:33360 [0] NCCL INFO Channel 00/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 01/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 02/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33362 [2] NCCL INFO Trees [0] 4/-1/-1->2->1 [1] 4/-1/-1->2->1 [2] 4/-1/-1->2->1 [3] 4/-1/-1->2->1 [4] 4/-1/-1->2->1 [5] 4/-1/-1->2->1 [6] 4/-1/-1->2->1 [7] 4/-1/-1->2->1 [8] 4/-1/-1->2->1 [9] 4/-1/-1->2->1 [10] 4/-1/-1->2->1 [11] 4/-1/-1->2->1 [12] 4/-1/-1->2->1 [13] 4/-1/-1->2->1 [14] 4/-1/-1->2->1 [15] 4/-1/-1->2->1
036260b61d3a:33347:33361 [1] NCCL INFO Trees [0] 2/-1/-1->1->3 [1] 2/-1/-1->1->3 [2] 2/-1/-1->1->3 [3] 2/-1/-1->1->3 [4] 2/-1/-1->1->3 [5] 2/-1/-1->1->3 [6] 2/-1/-1->1->3 [7] 2/-1/-1->1->3 [8] 2/-1/-1->1->3 [9] 2/-1/-1->1->3 [10] 2/-1/-1->1->3 [11] 2/-1/-1->1->3 [12] 2/-1/-1->1->3 [13] 2/-1/-1->1->3 [14] 2/-1/-1->1->3 [15] 2/-1/-1->1->3
036260b61d3a:33347:33360 [0] NCCL INFO Channel 03/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 04/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 05/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 06/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 07/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 08/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 09/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 10/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 11/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 12/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33362 [2] NCCL INFO P2P Chunksize set to 524288
036260b61d3a:33347:33361 [1] NCCL INFO P2P Chunksize set to 524288
036260b61d3a:33347:33363 [3] NCCL INFO Trees [0] 1/-1/-1->3->0 [1] 1/-1/-1->3->0 [2] 1/-1/-1->3->0 [3] 1/-1/-1->3->0 [4] 1/-1/-1->3->0 [5] 1/-1/-1->3->0 [6] 1/-1/-1->3->0 [7] 1/-1/-1->3->0 [8] 1/-1/-1->3->0 [9] 1/-1/-1->3->0 [10] 1/-1/-1->3->0 [11] 1/-1/-1->3->0 [12] 1/-1/-1->3->0 [13] 1/-1/-1->3->0 [14] 1/-1/-1->3->0 [15] 1/-1/-1->3->0
036260b61d3a:33347:33360 [0] NCCL INFO Channel 13/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 14/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33360 [0] NCCL INFO Channel 15/16 :    0   3   1   2   4   7   5   6
036260b61d3a:33347:33363 [3] NCCL INFO P2P Chunksize set to 524288
036260b61d3a:33347:33360 [0] NCCL INFO Trees [0] 3/-1/-1->0->-1 [1] 3/-1/-1->0->-1 [2] 3/-1/-1->0->-1 [3] 3/-1/-1->0->-1 [4] 3/-1/-1->0->-1 [5] 3/-1/-1->0->-1 [6] 3/-1/-1->0->-1 [7] 3/-1/-1->0->-1 [8] 3/-1/-1->0->-1 [9] 3/-1/-1->0->-1 [10] 3/-1/-1->0->-1 [11] 3/-1/-1->0->-1 [12] 3/-1/-1->0->-1 [13] 3/-1/-1->0->-1 [14] 3/-1/-1->0->-1 [15] 3/-1/-1->0->-1
036260b61d3a:33347:33360 [0] NCCL INFO P2P Chunksize set to 524288
036260b61d3a:33347:33365 [5] NCCL INFO Channel 00/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 00/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 01/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 01/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 02/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 03/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 02/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 04/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 03/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 05/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 04/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 06/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 05/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 07/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 06/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 08/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 07/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 09/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 08/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 10/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 09/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 11/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 10/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 12/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 11/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 13/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 14/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 12/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 15/0 : 5[5] -> 6[6] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 13/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 14/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 15/0 : 1[1] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 00/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 00/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 01/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 02/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 01/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 03/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 02/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 04/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 03/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 05/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 04/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 06/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 07/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 05/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 08/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 06/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 09/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 07/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 08/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 10/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 11/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 09/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 12/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 10/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 13/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 11/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 14/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 15/0 : 6[6] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 12/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 13/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 00/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 14/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 01/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Connected all rings
036260b61d3a:33347:33360 [0] NCCL INFO Channel 02/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 00/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 15/0 : 2[2] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 00/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 01/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 03/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 01/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 02/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 04/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 03/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 05/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 02/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Connected all rings
036260b61d3a:33347:33366 [6] NCCL INFO Channel 04/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 06/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 03/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 05/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 07/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 04/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 06/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 08/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 05/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 07/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 09/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 06/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 08/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 10/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 07/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 09/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 11/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 08/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 10/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 12/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 09/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 11/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 13/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 12/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 10/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 14/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 13/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 11/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Channel 15/0 : 0[0] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 14/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 12/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33366 [6] NCCL INFO Channel 15/0 : 6[6] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 00/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 13/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 01/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 14/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 02/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 15/0 : 4[4] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Connected all rings
036260b61d3a:33347:33367 [7] NCCL INFO Channel 00/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 03/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 01/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 04/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 02/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 05/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Connected all rings
036260b61d3a:33347:33367 [7] NCCL INFO Channel 03/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 06/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 04/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 07/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 05/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 08/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 06/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 09/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 07/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 10/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 08/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 11/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 09/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 12/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 10/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 13/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 11/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 14/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 12/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 15/0 : 3[3] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 13/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 14/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 15/0 : 7[7] -> 5[5] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Connected all rings
036260b61d3a:33347:33361 [1] NCCL INFO Channel 00/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 01/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Connected all rings
036260b61d3a:33347:33367 [7] NCCL INFO Connected all rings
036260b61d3a:33347:33365 [5] NCCL INFO Connected all rings
036260b61d3a:33347:33365 [5] NCCL INFO Channel 00/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 02/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 01/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 03/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 02/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 04/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 03/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 05/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 04/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 06/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 05/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 07/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 06/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 08/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 07/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 09/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 08/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 10/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 09/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 11/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 10/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 12/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 11/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 13/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 12/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 14/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 13/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Channel 15/0 : 1[1] -> 3[3] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 14/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33365 [5] NCCL INFO Channel 15/0 : 5[5] -> 7[7] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 00/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 00/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 01/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 01/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 02/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 02/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 03/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 03/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 04/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 04/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 05/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 05/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 06/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 06/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 07/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 07/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 08/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 08/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 09/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 09/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 10/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 10/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 11/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 11/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 12/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 12/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 13/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 13/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 14/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 14/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33363 [3] NCCL INFO Channel 15/0 : 3[3] -> 0[0] via P2P/direct pointer
036260b61d3a:33347:33367 [7] NCCL INFO Channel 15/0 : 7[7] -> 4[4] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 00/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33360 [0] NCCL INFO Connected all trees
036260b61d3a:33347:33366 [6] NCCL INFO Connected all trees
036260b61d3a:33347:33366 [6] NCCL INFO NCCL_ALGO set by environment to Tree,Ring,,CollnetDirect,CollnetChain,NVLS
036260b61d3a:33347:33360 [0] NCCL INFO NCCL_ALGO set by environment to Tree,Ring,,CollnetDirect,CollnetChain,NVLS
036260b61d3a:33347:33365 [5] NCCL INFO Connected all trees
036260b61d3a:33347:33365 [5] NCCL INFO NCCL_ALGO set by environment to Tree,Ring,,CollnetDirect,CollnetChain,NVLS
036260b61d3a:33347:33366 [6] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
036260b61d3a:33347:33366 [6] NCCL INFO 16 coll channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
036260b61d3a:33347:33365 [5] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
036260b61d3a:33347:33365 [5] NCCL INFO 16 coll channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
036260b61d3a:33347:33367 [7] NCCL INFO Connected all trees
036260b61d3a:33347:33363 [3] NCCL INFO Connected all trees
036260b61d3a:33347:33367 [7] NCCL INFO NCCL_ALGO set by environment to Tree,Ring,,CollnetDirect,CollnetChain,NVLS
036260b61d3a:33347:33360 [0] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
036260b61d3a:33347:33360 [0] NCCL INFO 16 coll channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
036260b61d3a:33347:33367 [7] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
036260b61d3a:33347:33367 [7] NCCL INFO 16 coll channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
036260b61d3a:33347:33363 [3] NCCL INFO NCCL_ALGO set by environment to Tree,Ring,,CollnetDirect,CollnetChain,NVLS
036260b61d3a:33347:33363 [3] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
036260b61d3a:33347:33363 [3] NCCL INFO 16 coll channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 01/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 02/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 03/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 04/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 05/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 06/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 07/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 08/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 09/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 10/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 11/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 12/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 13/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 14/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Channel 15/0 : 4[4] -> 2[2] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 00/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33364 [4] NCCL INFO Connected all trees
036260b61d3a:33347:33364 [4] NCCL INFO NCCL_ALGO set by environment to Tree,Ring,,CollnetDirect,CollnetChain,NVLS
036260b61d3a:33347:33364 [4] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
036260b61d3a:33347:33364 [4] NCCL INFO 16 coll channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 01/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 02/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 03/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 04/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 05/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 06/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 07/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 08/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 09/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 10/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 11/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 12/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 13/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 14/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33362 [2] NCCL INFO Channel 15/0 : 2[2] -> 1[1] via P2P/direct pointer
036260b61d3a:33347:33361 [1] NCCL INFO Connected all trees
036260b61d3a:33347:33362 [2] NCCL INFO Connected all trees
036260b61d3a:33347:33361 [1] NCCL INFO NCCL_ALGO set by environment to Tree,Ring,,CollnetDirect,CollnetChain,NVLS
036260b61d3a:33347:33362 [2] NCCL INFO NCCL_ALGO set by environment to Tree,Ring,,CollnetDirect,CollnetChain,NVLS
036260b61d3a:33347:33361 [1] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
036260b61d3a:33347:33361 [1] NCCL INFO 16 coll channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
036260b61d3a:33347:33362 [2] NCCL INFO threadThresholds 8/8/64 | 64/8/64 | 512 | 512
036260b61d3a:33347:33362 [2] NCCL INFO 16 coll channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
036260b61d3a:33347:33367 [7] NCCL INFO comm 0x563f5c8e74c0 rank 7 nranks 8 cudaDev 7 nvmlDev 7 busId d7000 commId 0x151891d39ece6192 - Init COMPLETE
036260b61d3a:33347:33365 [5] NCCL INFO comm 0x563f5c8dbb00 rank 5 nranks 8 cudaDev 5 nvmlDev 5 busId ae000 commId 0x151891d39ece6192 - Init COMPLETE
036260b61d3a:33347:33361 [1] NCCL INFO comm 0x563f5c8c4710 rank 1 nranks 8 cudaDev 1 nvmlDev 1 busId 34000 commId 0x151891d39ece6192 - Init COMPLETE
036260b61d3a:33347:33363 [3] NCCL INFO comm 0x563f5c8d0140 rank 3 nranks 8 cudaDev 3 nvmlDev 3 busId 5a000 commId 0x151891d39ece6192 - Init COMPLETE
036260b61d3a:33347:33364 [4] NCCL INFO comm 0x563f5c8d5e20 rank 4 nranks 8 cudaDev 4 nvmlDev 4 busId 87000 commId 0x151891d39ece6192 - Init COMPLETE
036260b61d3a:33347:33360 [0] NCCL INFO comm 0x563f5c8bb420 rank 0 nranks 8 cudaDev 0 nvmlDev 0 busId f000 commId 0x151891d39ece6192 - Init COMPLETE
036260b61d3a:33347:33366 [6] NCCL INFO comm 0x563f5c8e17e0 rank 6 nranks 8 cudaDev 6 nvmlDev 6 busId c2000 commId 0x151891d39ece6192 - Init COMPLETE
036260b61d3a:33347:33362 [2] NCCL INFO comm 0x563f5c8ca460 rank 2 nranks 8 cudaDev 2 nvmlDev 2 busId 48000 commId 0x151891d39ece6192 - Init COMPLETE
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1    86.48    0.00    0.00      0    29.46    0.00    0.00      0
          16             4     float     sum      -1    29.33    0.00    0.00      0    29.32    0.00    0.00      0
          32             8     float     sum      -1    29.59    0.00    0.00      0    29.72    0.00    0.00      0
          64            16     float     sum      -1    29.49    0.00    0.00      0    29.84    0.00    0.00      0
         128            32     float     sum      -1    29.54    0.00    0.01      0    29.82    0.00    0.01      0
         256            64     float     sum      -1    29.75    0.01    0.02      0    29.54    0.01    0.02      0
         512           128     float     sum      -1    29.85    0.02    0.03      0    29.97    0.02    0.03      0
        1024           256     float     sum      -1    29.90    0.03    0.06      0    29.41    0.03    0.06      0
        2048           512     float     sum      -1    29.92    0.07    0.12      0    29.81    0.07    0.12      0
        4096          1024     float     sum      -1    29.94    0.14    0.24      0    30.18    0.14    0.24      0
        8192          2048     float     sum      -1    30.08    0.27    0.48      0    29.76    0.28    0.48      0
       16384          4096     float     sum      -1    30.27    0.54    0.95      0    30.17    0.54    0.95      0
       32768          8192     float     sum      -1    29.96    1.09    1.91      0    30.12    1.09    1.90      0
       65536         16384     float     sum      -1    31.04    2.11    3.69      0    30.96    2.12    3.70      0
      131072         32768     float     sum      -1    33.66    3.89    6.82      0    33.97    3.86    6.75      0
      262144         65536     float     sum      -1    38.45    6.82   11.93      0    37.98    6.90   12.08      0
      524288        131072     float     sum      -1    47.95   10.93   19.13      0    48.55   10.80   18.90      0
     1048576        262144     float     sum      -1    48.25   21.73   38.03      0    49.01   21.39   37.44      0
     2097152        524288     float     sum      -1    49.60   42.28   73.99      0    49.46   42.40   74.21      0
     4194304       1048576     float     sum      -1    72.37   57.95  101.42      0    72.23   58.07  101.63      0
     8388608       2097152     float     sum      -1    126.7   66.23  115.90      0    125.3   66.97  117.20      0
    16777216       4194304     float     sum      -1    205.6   81.62  142.83      0    204.5   82.03  143.56      0
    33554432       8388608     float     sum      -1    380.8   88.12  154.21      0    379.7   88.37  154.64      0
    67108864      16777216     float     sum      -1    734.7   91.34  159.85      0    734.5   91.37  159.89      0
   134217728      33554432     float     sum      -1   1447.8   92.71  162.24      0   1447.7   92.71  162.24      0
036260b61d3a:33347:33347 [7] NCCL INFO comm 0x563f5c8bb420 rank 0 nranks 8 cudaDev 0 busId f000 - Destroy COMPLETE
036260b61d3a:33347:33347 [7] NCCL INFO comm 0x563f5c8c4710 rank 1 nranks 8 cudaDev 1 busId 34000 - Destroy COMPLETE
036260b61d3a:33347:33347 [7] NCCL INFO comm 0x563f5c8ca460 rank 2 nranks 8 cudaDev 2 busId 48000 - Destroy COMPLETE
036260b61d3a:33347:33347 [7] NCCL INFO comm 0x563f5c8d0140 rank 3 nranks 8 cudaDev 3 busId 5a000 - Destroy COMPLETE
036260b61d3a:33347:33347 [7] NCCL INFO comm 0x563f5c8d5e20 rank 4 nranks 8 cudaDev 4 busId 87000 - Destroy COMPLETE
036260b61d3a:33347:33347 [7] NCCL INFO comm 0x563f5c8dbb00 rank 5 nranks 8 cudaDev 5 busId ae000 - Destroy COMPLETE
036260b61d3a:33347:33347 [7] NCCL INFO comm 0x563f5c8e17e0 rank 6 nranks 8 cudaDev 6 busId c2000 - Destroy COMPLETE
036260b61d3a:33347:33347 [7] NCCL INFO comm 0x563f5c8e74c0 rank 7 nranks 8 cudaDev 7 busId d7000 - Destroy COMPLETE
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 39.7986
#
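Some context for reading the table above. nccl-tests derives all-reduce bus bandwidth from algorithm bandwidth as busbw = algbw × 2(n - 1)/n, so with n = 8 ranks every busbw figure is 1.75× its algbw. The `Avg bus bandwidth` line averages busbw over every row, both out-of-place and in-place. That is why small messages pull it down to about 39.8 GB/s even though the large-message plateau is about 162 GB/s. Below is a minimal Python sketch of the arithmetic, using values copied from the last row. The function names are illustrative and are not part of nccl-tests.

```python
def allreduce_busbw(algbw_gbps: float, nranks: int) -> float:
    # nccl-tests convention for all-reduce: busbw = algbw * 2(n-1)/n
    return algbw_gbps * 2 * (nranks - 1) / nranks


def algbw_gbps(size_bytes: int, time_us: float) -> float:
    # algbw is bytes divided by time; 1 byte/us == 1e-3 GB/s (GB = 1e9 bytes)
    return size_bytes / time_us / 1e3


# Values copied from the last row of the table (8 ranks, out-of-place).
size, time_us, reported_algbw = 134217728, 1447.8, 92.71
print(f"algbw ~ {algbw_gbps(size, time_us):.2f} GB/s")        # ~92.70; time is rounded in the log
print(f"busbw ~ {allreduce_busbw(reported_algbw, 8):.2f} GB/s")  # 162.24, as reported
```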
sjeaugey commented 1 year ago

H800 may require a more recent driver for NVLS support. I'd advise contacting your customer support to check which driver version you need for NVLink SHARP (NVLS) operation.
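The full log earlier in this issue prints `cudaDriverVersion 12010`. That integer comes from `cudaDriverGetVersion()`, which encodes the version as 1000 × major + 10 × minor, so this machine reports CUDA 12.1. NVLS in NCCL relies on the CUDA multicast object API (`cuMulticastCreate` and related calls), which first appeared in CUDA 12.1. Checking this number is therefore necessary but not sufficient: the installed kernel-mode driver must also support NVLink SHARP on the H800, and that is the driver version question above. The following is a small sketch for decoding the value. The helper names are hypothetical.

```python
def decode_cuda_driver_version(v: int) -> tuple[int, int]:
    # cudaDriverGetVersion() encodes the version as 1000*major + 10*minor
    return v // 1000, (v % 1000) // 10


def cuda_at_least(v: int, major: int, minor: int) -> bool:
    # Tuple comparison orders (major, minor) correctly
    return decode_cuda_driver_version(v) >= (major, minor)


v = 12010  # value from the "cudaDriverVersion 12010" line in the log
major, minor = decode_cuda_driver_version(v)
print(f"CUDA driver API version {major}.{minor}")
# Assumption: 12.1 is the minimum for the multicast API; it does not prove NVLS works.
print("multicast API level present:", cuda_at_least(v, 12, 1))
```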

shanleo2024 commented 5 months ago

I also use the H800 and was able to enable NVLS successfully:

NCCL INFO NVLS multicast support is available on dev 3
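You can compare a failing run like the original with a working one like this. Look for two things in the `NCCL_DEBUG=INFO` output: the `NVLS multicast support is ... available on dev N` line, and the `N nvls channels` count, which is 0 on every rank in the original log. The minimal Python sketch below scans a saved log for both. The regular expressions are based only on the lines quoted in this thread and may not match the output of other NCCL versions.

```python
import re

# Patterns taken from the log lines quoted in this issue (assumed stable).
NVLS_SUPPORT_RE = re.compile(r"NVLS multicast support is (not )?available on dev (\d+)")
NVLS_CHANNELS_RE = re.compile(r"(\d+) nvls channels")


def summarize_nvls(log_text: str) -> dict:
    """Collect per-device multicast support and nvls channel counts."""
    support: dict[int, bool] = {}
    channels: set[int] = set()
    for line in log_text.splitlines():
        m = NVLS_SUPPORT_RE.search(line)
        if m:
            # Group 1 is "not " only when support is missing
            support[int(m.group(2))] = m.group(1) is None
        m = NVLS_CHANNELS_RE.search(line)
        if m:
            channels.add(int(m.group(1)))
    return {"multicast_by_dev": support, "nvls_channel_counts": sorted(channels)}


sample = """\
NCCL INFO NVLS multicast support is not available on dev 0
NCCL INFO 16 coll channels, 0 nvls channels, 16 p2p channels, 16 p2p channels per peer
"""
print(summarize_nvls(sample))
# {'multicast_by_dev': {0: False}, 'nvls_channel_counts': [0]}
```

An `nvls_channel_counts` of `[0]` means NCCL built no NVLS channels. NVLS was therefore not used, even though it was listed in `NCCL_ALGO`.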