NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

Poor NCCL allreduce performance #1453

Open twichell opened 1 month ago

We are seeing poor NCCL allreduce performance and would appreciate NVIDIA's help diagnosing it.

We have three nodes split across two racks: two nodes on one rack and one node on another. Two-node performance, whether within a rack or across racks, is fine. Three-node performance across racks is severely degraded. We have reproduced this on different sets of nodes and racks.

The configuration is as follows:

Example output from the three-node case is below. Bandwidth at the 8 GB data size is 99% lower than in our two-node case. Degradation is noticeable but less severe at smaller data sizes, starting at around 8 KB. We also note a sharp drop in bandwidth between the 32 MB and 64 MB data sizes (out-of-place busbw falls from 29.50 GB/s to 5.27 GB/s in this run), and this drop is consistent across executions of the test.

[0] #
[0] # Using devices
[0] #  Rank  0 Group  0 Pid 227602 on h100clust-worker-1 device  0 [0xa4] NVIDIA H100 80GB HBM3
[0] #  Rank  1 Group  0 Pid 227603 on h100clust-worker-1 device  1 [0xae] NVIDIA H100 80GB HBM3
[0] #  Rank  2 Group  0 Pid 227604 on h100clust-worker-1 device  2 [0xb8] NVIDIA H100 80GB HBM3
[0] #  Rank  3 Group  0 Pid 227605 on h100clust-worker-1 device  3 [0xc2] NVIDIA H100 80GB HBM3
[0] #  Rank  4 Group  0 Pid 227606 on h100clust-worker-1 device  4 [0xcc] NVIDIA H100 80GB HBM3
[0] #  Rank  5 Group  0 Pid 227607 on h100clust-worker-1 device  5 [0xd6] NVIDIA H100 80GB HBM3
[0] #  Rank  6 Group  0 Pid 227608 on h100clust-worker-1 device  6 [0xe0] NVIDIA H100 80GB HBM3
[0] #  Rank  7 Group  0 Pid 227609 on h100clust-worker-1 device  7 [0xea] NVIDIA H100 80GB HBM3
[0] #  Rank  8 Group  0 Pid 227324 on h100clust-worker-32 device  0 [0xa4] NVIDIA H100 80GB HBM3
[0] #  Rank  9 Group  0 Pid 227325 on h100clust-worker-32 device  1 [0xae] NVIDIA H100 80GB HBM3
[0] #  Rank 10 Group  0 Pid 227326 on h100clust-worker-32 device  2 [0xb8] NVIDIA H100 80GB HBM3
[0] #  Rank 11 Group  0 Pid 227327 on h100clust-worker-32 device  3 [0xc2] NVIDIA H100 80GB HBM3
[0] #  Rank 12 Group  0 Pid 227328 on h100clust-worker-32 device  4 [0xcc] NVIDIA H100 80GB HBM3
[0] #  Rank 13 Group  0 Pid 227329 on h100clust-worker-32 device  5 [0xd6] NVIDIA H100 80GB HBM3
[0] #  Rank 14 Group  0 Pid 227330 on h100clust-worker-32 device  6 [0xe0] NVIDIA H100 80GB HBM3
[0] #  Rank 15 Group  0 Pid 227331 on h100clust-worker-32 device  7 [0xea] NVIDIA H100 80GB HBM3
[0] #  Rank 16 Group  0 Pid 227195 on h100clust-worker-5 device  0 [0xa4] NVIDIA H100 80GB HBM3
[0] #  Rank 17 Group  0 Pid 227196 on h100clust-worker-5 device  1 [0xae] NVIDIA H100 80GB HBM3
[0] #  Rank 18 Group  0 Pid 227197 on h100clust-worker-5 device  2 [0xb8] NVIDIA H100 80GB HBM3
[0] #  Rank 19 Group  0 Pid 227198 on h100clust-worker-5 device  3 [0xc2] NVIDIA H100 80GB HBM3
[0] #  Rank 20 Group  0 Pid 227199 on h100clust-worker-5 device  4 [0xcc] NVIDIA H100 80GB HBM3
[0] #  Rank 21 Group  0 Pid 227200 on h100clust-worker-5 device  5 [0xd6] NVIDIA H100 80GB HBM3
[0] #  Rank 22 Group  0 Pid 227201 on h100clust-worker-5 device  6 [0xe0] NVIDIA H100 80GB HBM3
[0] #  Rank 23 Group  0 Pid 227202 on h100clust-worker-5 device  7 [0xea] NVIDIA H100 80GB HBM3
[0] h100clust-worker-1:227602:227602 [0] NCCL INFO Bootstrap : Using enp0s3:10.241.128.7<0>
[0] h100clust-worker-1:227602:227602 [0] NCCL INFO cudaDriverVersion 12040
[0] h100clust-worker-1:227602:227602 [0] NCCL INFO NCCL version 2.22.3+cuda12.5
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO P2P plugin v8 IBext_v8
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NCCL_IB_ADAPTIVE_ROUTING set by environment to 1.
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 2.
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [1]mlx5_2:1/RoCE [2]mlx5_3:1/RoCE [3]mlx5_4:1/RoCE [4]mlx5_5:1/RoCE [5]mlx5_6:1/RoCE [6]mlx5_7:1/RoCE [7]mlx5_8:1/RoCE [RO]; OOB enp0s3:10.241.128.7<0>
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Using network IBext_v8
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NCCL_CHECK_POINTERS set by environment to 0.
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO DMA-BUF is available on GPU device 0
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO ncclCommInitRank comm 0x5639e9f0e8c0 rank 0 nranks 24 cudaDev 0 nvmlDev 0 busId a4000 commId 0x9e0c93091bae3b9e - Init START
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO MNNVL busId 0xa4000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NCCL_TOPO_FILE set by environment to filename
[0]
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,ffffffff
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NVLS multicast support is available on dev 0
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NCCL_CROSS_NIC set by environment to 2.
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO comm 0x5639e9f0e8c0 rank 0 nRanks 24 nNodes 3 localRanks 8 localRank 0 MNNVL 0
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NVLS Head  0:  0  8 16
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NVLS Head  1:  1  9 17
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NVLS Head  2:  2 10 18
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NVLS Head  3:  3 11 19
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NVLS Head  4:  4 12 20
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NVLS Head  5:  5 13 21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NVLS Head  6:  6 14 22
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NVLS Head  7:  7 15 23
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 00/16 :    0   7   6   5   4   3   2   1   8  15  14  13  12  11  10   9  16  23  22  21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 01/16 :    0   7   6   5   4   3   2   9   8  15  14  13  12  11  10  17  16  23  22  21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 02/16 :    0   7   6   5   4   3  10   9   8  15  14  13  12  11  18  17  16  23  22  21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 03/16 :    0   7   6   5   4  11  10   9   8  15  14  13  12  19  18  17  16  23  22  21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 04/16 :    0   7   6   5  12  11  10   9   8  15  14  13  20  19  18  17  16  23  22  21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 05/16 :    0   7   6  13  12  11  10   9   8  15  14  21  20  19  18  17  16  23  22   5
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 06/16 :    0   7  14  13  12  11  10   9   8  15  22  21  20  19  18  17  16  23   6   5
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 07/16 :    0  15  14  13  12  11  10   9   8  23  22  21  20  19  18  17  16   7   6   5
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 08/16 :    0   7   6   5   4   3   2   1   8  15  14  13  12  11  10   9  16  23  22  21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 09/16 :    0   7   6   5   4   3   2   9   8  15  14  13  12  11  10  17  16  23  22  21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 10/16 :    0   7   6   5   4   3  10   9   8  15  14  13  12  11  18  17  16  23  22  21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 11/16 :    0   7   6   5   4  11  10   9   8  15  14  13  12  19  18  17  16  23  22  21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 12/16 :    0   7   6   5  12  11  10   9   8  15  14  13  20  19  18  17  16  23  22  21
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 13/16 :    0   7   6  13  12  11  10   9   8  15  14  21  20  19  18  17  16  23  22   5
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 14/16 :    0   7  14  13  12  11  10   9   8  15  22  21  20  19  18  17  16  23   6   5
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Channel 15/16 :    0  15  14  13  12  11  10   9   8  23  22  21  20  19  18  17  16   7   6   5
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Trees [0] 1/16/-1->0->-1 [1] -1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->7 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 [8] 1/16/-1->0->8 [9] -1/-1/-1->0->7 [10] 1/-1/-1->0->7 [11] 1/-1/-1->0->7 [12] 1/-1/-1->0->7 [13] 1/-1/-1->0->7 [14] 1/-1/-1->0->7 [15] 1/-1/-1->0->7
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO NCCL_BUFFSIZE set by environment to 67108864.
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO P2P Chunksize set to 131072
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO threadThresholds 8/8/64 | 192/8/64 | 512 | 512
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO 16 coll channels, 16 collnet channels, 16 nvls channels, 16 p2p channels, 2 p2p channels per peer
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO CC Off, Multi-GPU CC Off, workFifoBytes 1048576
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO ncclCommInitRank comm 0x5639e9f0e8c0 rank 0 nranks 24 cudaDev 0 nvmlDev 0 busId a4000 commId 0x9e0c93091bae3b9e - Init COMPLETE
[0] h100clust-worker-1:227602:227667 [0] NCCL INFO Init timings: rank 0 nranks 24 total 3.38 (kernels 0.21, bootstrap 2.79, allgathers 0.01, topo 0.03, graphs 0.22, connections 0.12, rest 0.00)
[0] #
[0] #                                                              out-of-place                       in-place
[0] #       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
[0] #        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 00/0 : 17[1] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 08/0 : 17[1] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 03/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 08/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 09/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 10/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 11/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 12/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 13/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 14/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 07/0 : 0[0] -> 15[7] [send] via NET/IBext_v8/0(7)/GDRDMA
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Channel 15/0 : 0[0] -> 15[7] [send] via NET/IBext_v8/0(7)/GDRDMA
[0] h100clust-worker-1:227602:227752 [0] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 2.
[0] h100clust-worker-1:227602:227752 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
[0] h100clust-worker-1:227602:227752 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22.
[0] h100clust-worker-1:227602:227752 [0] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 10.
[0] h100clust-worker-1:227602:227760 [0] NCCL INFO Connected all rings
[0]            0             0     float     sum      -1[0]      0.23    0.00    0.00      0[0]      0.19    0.00    0.00      0
[0]            0             0     float     sum      -1[0]      0.18    0.00    0.00      0[0]      0.19    0.00    0.00      0
[0]            4             1     float     sum      -1[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 15/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 00/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 08/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 08/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 08/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 00/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Channel 08/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227779 [0] NCCL INFO Connected all trees
[0]     292.4    0.00    0.00      0[0]     62.08    0.00    0.00      0
[0]            8             2     float     sum      -1[0]     60.67    0.00    0.00      0[0]     60.67    0.00    0.00      0
[0]           16             4     float     sum      -1[0]     59.92    0.00    0.00      0[0]     60.01    0.00    0.00      0
[0]           32             8     float     sum      -1[0]     60.32    0.00    0.00      0[0]     60.10    0.00    0.00      0
[0]           64            16     float     sum      -1[0]     59.91    0.00    0.00      0[0]     60.42    0.00    0.00      0
[0]          128            32     float     sum      -1[0]     61.22    0.00    0.00      0[0]     60.46    0.00    0.00      0
[0]          256            64     float     sum      -1[0]     72.02    0.00    0.01      0[0]     63.42    0.00    0.01      0
[0]          512           128     float     sum      -1[0]     92.56    0.01    0.01      0[0]     63.35    0.01    0.02      0
[0]         1024           256     float     sum      -1[0]     66.46    0.02    0.03      0[0]     66.23    0.02    0.03      0
[0]         2048           512     float     sum      -1[0]     72.90    0.03    0.05      0[0]     71.77    0.03    0.05      0
[0]         4096          1024     float     sum      -1[0]     75.75    0.05    0.10      0[0]     74.96    0.05    0.10      0
[0]         8192          2048     float     sum      -1[0]     148.8    0.06    0.11      0[0]     167.2    0.05    0.09      0
[0]        16384          4096     float     sum      -1[0]     366.4    0.04    0.09      0[0]     238.9    0.07    0.13      0
[0]        32768          8192     float     sum      -1[0]     159.4    0.21    0.39      0[0]     201.8    0.16    0.31      0
[0]        65536         16384     float     sum      -1[0]     388.9    0.17    0.32      0[0]     366.6    0.18    0.34      0
[0]       131072         32768     float     sum      -1[0]     386.7    0.34    0.65      0[0]     227.0    0.58    1.11      0
[0]       262144         65536     float     sum      -1[0]     283.9    0.92    1.77      0[0]     237.0    1.11    2.12      0
[0]       524288        131072     float     sum      -1[0] h100clust-worker-1:227602:227792 [0] NCCL INFO NVLS comm 0x5639e9f0e8c0 headRank 0 nHeads 8 buffSize 1048576 nvlsPerRankSize 33554432 nvlsTotalSize 268435456
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 01/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 02/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 03/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 04/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 05/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 06/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 07/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 09/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 10/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 11/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 12/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 13/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 14/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 15/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 01/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 03/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 05/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 07/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 09/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 11/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 13/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 15/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 01/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 03/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 05/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 07/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 09/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 11/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 13/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 15/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 01/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 02/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 03/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 04/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 05/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 06/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 07/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 09/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 10/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 11/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 12/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 13/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 14/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Channel 15/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:227602:227792 [0] NCCL INFO Connected NVLS tree
[0]     366.5    1.43    2.74      0[0]     340.8    1.54    2.95      0
[0]      1048576        262144     float     sum      -1[0]    1626.5    0.64    1.24      0[0]    2302.3    0.46    0.87      0
[0]      2097152        524288     float     sum      -1[0]    2325.3    0.90    1.73      0[0]     416.7    5.03    9.65      0
[0]      4194304       1048576     float     sum      -1[0]     622.6    6.74   12.91      0[0]     625.8    6.70   12.85      0
[0]      8388608       2097152     float     sum      -1[0]     992.6    8.45   16.20      0[0]     986.0    8.51   16.31      0
[0]     16777216       4194304     float     sum      -1[0]    1334.9   12.57   24.09      0[0]    1336.5   12.55   24.06      0
[0]     33554432       8388608     float     sum      -1[0]    2180.1   15.39   29.50      0[0]    2285.2   14.68   28.14      0
[0]     67108864      16777216     float     sum      -1[0]     24410    2.75    5.27      0[0]     24425    2.75    5.27      0
[0]    134217728      33554432     float     sum      -1[0]     49264    2.72    5.22      0[0]     48558    2.76    5.30      0
[0]    268435456      67108864     float     sum      -1[0]    125807    2.13    4.09      0[0]    121575    2.21    4.23      0
[0]    536870912     134217728     float     sum      -1[0]    193909    2.77    5.31      0[0]    192903    2.78    5.33      0
[0]   1073741824     268435456     float     sum      -1[0]    375273    2.86    5.48      0[0]    347528    3.09    5.92      0
[0]   2147483648     536870912     float     sum      -1[0]    795717    2.70    5.17      0[0]    809022    2.65    5.09      0
[0]   4294967296    1073741824     float     sum      -1[0]   1561938    2.75    5.27      0[0]   1542605    2.78    5.34      0
[0]   8589934592    2147483648     float     sum      -1[0]   3113743    2.76    5.29      0[0]   3033342    2.83    5.43      0
[0] h100clust-worker-1:227602:227602 [0] NCCL INFO comm 0x5639e9f0e8c0 rank 0 nranks 24 cudaDev 0 busId a4000 - Destroy COMPLETE
[0] # Out of bounds values : 0 OK
[0] # Avg bus bandwidth    : 4.0309
[0] #
[0]
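
For readers cross-checking the table above, here is a minimal sketch of how the algbw and busbw columns relate to the size and time columns. It assumes the usual nccl-tests convention for allreduce, where busbw = algbw * 2 * (n - 1) / n with n the number of ranks and 1 GB = 1e9 bytes. The four rows are copied from the out-of-place columns of the output above. The function and variable names are illustrative only and are not part of nccl-tests.

```python
# Sketch: recompute algbw/busbw for a few allreduce rows of the output above.
# Assumption: busbw = algbw * 2 * (n - 1) / n, with bandwidth in units of 1e9 bytes/s.

NRANKS = 24  # "nranks 24" in the NCCL INFO lines

# (size in bytes, out-of-place time in us, reported algbw, reported busbw)
ROWS = [
    (16777216, 1334.9, 12.57, 24.09),
    (33554432, 2180.1, 15.39, 29.50),
    (67108864, 24410.0, 2.75, 5.27),
    (8589934592, 3113743.0, 2.76, 5.29),
]


def allreduce_bandwidth(size_bytes, time_us, nranks):
    """Return (algbw, busbw) in GB/s for one allreduce measurement."""
    algbw = size_bytes / (time_us * 1e-6) / 1e9
    busbw = algbw * 2 * (nranks - 1) / nranks
    return algbw, busbw


def recompute(rows, nranks):
    """Return a list of (size, algbw, busbw) recomputed from size and time."""
    return [(size, *allreduce_bandwidth(size, t, nranks)) for size, t, _, _ in rows]


def drop_factor(rows, nranks, size_before, size_after):
    """Ratio of busbw at size_before to busbw at size_after."""
    by_size = {size: busbw for size, _, busbw in recompute(rows, nranks)}
    return by_size[size_before] / by_size[size_after]


if __name__ == "__main__":
    for size, algbw, busbw in recompute(ROWS, NRANKS):
        print(f"{size:>12d} B  algbw {algbw:6.2f} GB/s  busbw {busbw:6.2f} GB/s")
    factor = drop_factor(ROWS, NRANKS, 33554432, 67108864)
    print(f"busbw drop from 32 MB to 64 MB: {factor:.1f}x")
```

The recomputed values agree with the reported columns to within rounding, which suggests the drop at 64 MB reflects the measured times rather than a reporting artifact.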

The output of nvidia-smi -q for one GPU is provided below. This was captured with no workload running.

==============NVSMI LOG==============

Timestamp                                 : Wed Sep 18 18:45:28 2024
Driver Version                            : 550.90.07
CUDA Version                              : 12.4

Attached GPUs                             : 8
GPU 00000000:A4:00.0
    Product Name                          : NVIDIA H100 80GB HBM3
    Product Brand                         : NVIDIA
    Product Architecture                  : Hopper
    Display Mode                          : Enabled
    Display Active                        : Disabled
    Persistence Mode                      : Enabled
    Addressing Mode                       : None
    MIG Mode
        Current                           : Disabled
        Pending                           : Disabled
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : N/A
        Pending                           : N/A
    Serial Number                         : 1651524097284
    GPU UUID                              : GPU-1312e97f-2fdf-c97f-7b5c-81c26d624d2d
    Minor Number                          : 7
    VBIOS Version                         : 96.00.99.00.01
    MultiGPU Board                        : No
    Board ID                              : 0xa400
    Board Part Number                     : 692-2G520-0200-000
    GPU Part Number                       : 2330-885-A1
    FRU Part Number                       : N/A
    Module ID                             : 2
    Inforom Version
        Image Version                     : G520.0200.00.05
        OEM Object                        : 2.1
        ECC Object                        : 7.16
        Power Management Object           : N/A
    Inforom BBX Object Flush
        Latest Timestamp                  : N/A
        Latest Duration                   : N/A
    GPU Operation Mode
        Current                           : N/A
        Pending                           : N/A
    GPU C2C Mode                          : Disabled
    GPU Virtualization Mode
        Virtualization Mode               : Pass-Through
        Host VGPU Mode                    : N/A
        vGPU Heterogeneous Mode           : N/A
    GPU Reset Status
        Reset Required                    : No
        Drain and Reset Recommended       : No
    GSP Firmware Version                  : 550.90.07
    IBMNPU
        Relaxed Ordering Mode             : N/A
    PCI
        Bus                               : 0xA4
        Device                            : 0x00
        Domain                            : 0x0000
        Base Classcode                    : 0x3
        Sub Classcode                     : 0x2
        Device Id                         : 0x233010DE
        Bus Id                            : 00000000:A4:00.0
        Sub System Id                     : 0x16C110DE
        GPU Link Info
            PCIe Generation
                Max                       : 5
                Current                   : 5
                Device Current            : 5
                Device Max                : 5
                Host Max                  : N/A
            Link Width
                Max                       : 16x
                Current                   : 16x
        Bridge Chip
            Type                          : N/A
            Firmware                      : N/A
        Replays Since Reset               : 0
        Replay Number Rollovers           : 0
        Tx Throughput                     : 4644 KB/s
        Rx Throughput                     : 1027 KB/s
        Atomic Caps Inbound               : N/A
        Atomic Caps Outbound              : N/A
    Fan Speed                             : N/A
    Performance State                     : P0
    Clocks Event Reasons
        Idle                              : Active
        Applications Clocks Setting       : Not Active
        SW Power Cap                      : Not Active
        HW Slowdown                       : Not Active
            HW Thermal Slowdown           : Not Active
            HW Power Brake Slowdown       : Not Active
        Sync Boost                        : Not Active
        SW Thermal Slowdown               : Not Active
        Display Clock Setting             : Not Active
    Sparse Operation Mode                 : Disabled
    FB Memory Usage
        Total                             : 81559 MiB
        Reserved                          : 565 MiB
        Used                              : 1 MiB
        Free                              : 80995 MiB
    BAR1 Memory Usage
        Total                             : 131072 MiB
        Used                              : 1 MiB
        Free                              : 131071 MiB
    Conf Compute Protected Memory Usage
        Total                             : 0 MiB
        Used                              : 0 MiB
        Free                              : 0 MiB
    Compute Mode                          : Default
    Utilization
        Gpu                               : 0 %
        Memory                            : 0 %
        Encoder                           : 0 %
        Decoder                           : 0 %
        JPEG                              : 0 %
        OFA                               : 0 %
    Encoder Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    FBC Stats
        Active Sessions                   : 0
        Average FPS                       : 0
        Average Latency                   : 0
    ECC Mode
        Current                           : Enabled
        Pending                           : Enabled
    ECC Errors
        Volatile
            SRAM Correctable              : 0
            SRAM Uncorrectable Parity     : 0
            SRAM Uncorrectable SEC-DED    : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
        Aggregate
            SRAM Correctable              : 0
            SRAM Uncorrectable Parity     : 0
            SRAM Uncorrectable SEC-DED    : 0
            DRAM Correctable              : 0
            DRAM Uncorrectable            : 0
            SRAM Threshold Exceeded       : No
        Aggregate Uncorrectable SRAM Sources
            SRAM L2                       : 0
            SRAM SM                       : 0
            SRAM Microcontroller          : 0
            SRAM PCIE                     : 0
            SRAM Other                    : 0
    Retired Pages
        Single Bit ECC                    : N/A
        Double Bit ECC                    : N/A
        Pending Page Blacklist            : N/A
    Remapped Rows
        Correctable Error                 : 0
        Uncorrectable Error               : 0
        Pending                           : No
        Remapping Failure Occurred        : No
        Bank Remap Availability Histogram
            Max                           : 2560 bank(s)
            High                          : 0 bank(s)
            Partial                       : 0 bank(s)
            Low                           : 0 bank(s)
            None                          : 0 bank(s)
    Temperature
        GPU Current Temp                  : 32 C
        GPU T.Limit Temp                  : 54 C
        GPU Shutdown T.Limit Temp         : -8 C
        GPU Slowdown T.Limit Temp         : -2 C
        GPU Max Operating T.Limit Temp    : 0 C
        GPU Target Temperature            : N/A
        Memory Current Temp               : 36 C
        Memory Max Operating T.Limit Temp : 0 C
    GPU Power Readings
        Power Draw                        : 113.19 W
        Current Power Limit               : 700.00 W
        Requested Power Limit             : 700.00 W
        Default Power Limit               : 700.00 W
        Min Power Limit                   : 200.00 W
        Max Power Limit                   : 700.00 W
    GPU Memory Power Readings
        Power Draw                        : 34.56 W
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A
    Clocks
        Graphics                          : 1830 MHz
        SM                                : 1830 MHz
        Memory                            : 2619 MHz
        Video                             : 1545 MHz
    Applications Clocks
        Graphics                          : 1980 MHz
        Memory                            : 2619 MHz
    Default Applications Clocks
        Graphics                          : 1980 MHz
        Memory                            : 2619 MHz
    Deferred Clocks
        Memory                            : N/A
    Max Clocks
        Graphics                          : 1980 MHz
        SM                                : 1980 MHz
        Memory                            : 2619 MHz
        Video                             : 1545 MHz
    Max Customer Boost Clocks
        Graphics                          : 1980 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Voltage
        Graphics                          : 950.000 mV
    Fabric
        State                             : Completed
        Status                            : Success
        CliqueId                          : 0
        ClusterUUID                       : 00000000-0000-0000-0000-000000000000
        Health
            Bandwidth                     : N/A
    Processes                             : None

kiskra-nvidia commented 1 month ago

You might want to try rerunning it with NCCL_DEBUG_SUBSYS=INIT,ENV,TUNING, which will tell us which algo/proto combination NCCL chooses for each collective operation. I'm guessing it switches over to a different one at 64 MB, and the new one is severely underperforming for some reason...

What does the topology look like in the file that's passed via NCCL_TOPO_FILE? What does nvidia-smi topo -m show?
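
With TUNING enabled, NCCL logs lines of the form `NCCL INFO <size> Bytes -> Algo <a> proto <p> time <t>`. A minimal sketch for summarizing them per message size follows. The index-to-name mappings are an assumption taken from the column order of NCCL's tuning table (Tree, Ring, CollNetDirect, ... and LL, LL128, Simple), and the helper names `summarize_tuning` and `find_switches` are hypothetical, not part of NCCL or nccl-tests.

```python
import re

# Assumed mappings: algorithm/protocol indices follow the column order
# of the tuning table that NCCL prints during init.
ALGOS = ["Tree", "Ring", "CollNetDirect", "CollNetChain", "NVLS", "NVLSTree"]
PROTOS = ["LL", "LL128", "Simple"]

LINE_RE = re.compile(
    r"NCCL INFO (\d+) Bytes -> Algo (\d+) proto (\d+) time ([\d.]+)"
)


def summarize_tuning(log_text):
    """Map each message size (bytes) to (algo, proto, logged time).

    Repeated lines for the same size simply overwrite each other.
    """
    choices = {}
    for m in LINE_RE.finditer(log_text):
        size, algo, proto = int(m.group(1)), int(m.group(2)), int(m.group(3))
        choices[size] = (ALGOS[algo], PROTOS[proto], float(m.group(4)))
    return dict(sorted(choices.items()))


def find_switches(choices):
    """Return (size, previous combo, new combo) wherever the combo changes."""
    switches = []
    prev = None
    for size, (algo, proto, _) in choices.items():
        if prev is not None and (algo, proto) != prev:
            switches.append((size, prev, (algo, proto)))
        prev = (algo, proto)
    return switches
```

Feeding the whole log into `summarize_tuning` and then `find_switches` would show whether the algo/proto choice actually changes between the 32 MB and 64 MB sizes.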

twichell commented 1 month ago

nvidia-smi topo -m

      GPU0  GPU1  GPU2  GPU3  GPU4  GPU5  GPU6  GPU7  NIC0  NIC1  NIC2  NIC3  NIC4  NIC5  NIC6  NIC7  NIC8  CPU Affinity  NUMA Affinity  GPU NUMA ID
GPU0  X     NV18  NV18  NV18  NV18  NV18  NV18  NV18  SYS   SYS   SYS   SYS   SYS   NODE  NODE  NODE  PIX   0-79          0              N/A
GPU1  NV18  X     NV18  NV18  NV18  NV18  NV18  NV18  SYS   SYS   SYS   SYS   SYS   NODE  NODE  PIX   NODE  0-79          0              N/A
GPU2  NV18  NV18  X     NV18  NV18  NV18  NV18  NV18  SYS   SYS   SYS   SYS   SYS   NODE  PIX   NODE  NODE  0-79          0              N/A
GPU3  NV18  NV18  NV18  X     NV18  NV18  NV18  NV18  SYS   SYS   SYS   SYS   SYS   PIX   NODE  NODE  NODE  0-79          0              N/A
GPU4  NV18  NV18  NV18  NV18  X     NV18  NV18  NV18  SYS   NODE  NODE  NODE  PIX   SYS   SYS   SYS   SYS   80-159        1              N/A
GPU5  NV18  NV18  NV18  NV18  NV18  X     NV18  NV18  SYS   NODE  NODE  PIX   NODE  SYS   SYS   SYS   SYS   80-159        1              N/A
GPU6  NV18  NV18  NV18  NV18  NV18  NV18  X     NV18  SYS   NODE  PIX   NODE  NODE  SYS   SYS   SYS   SYS   80-159        1              N/A
GPU7  NV18  NV18  NV18  NV18  NV18  NV18  NV18  X     SYS   PIX   NODE  NODE  NODE  SYS   SYS   SYS   SYS   80-159        1              N/A
NIC0  SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   X     SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS
NIC1  SYS   SYS   SYS   SYS   NODE  NODE  NODE  PIX   SYS   X     NODE  NODE  NODE  SYS   SYS   SYS   SYS
NIC2  SYS   SYS   SYS   SYS   NODE  NODE  PIX   NODE  SYS   NODE  X     NODE  NODE  SYS   SYS   SYS   SYS
NIC3  SYS   SYS   SYS   SYS   NODE  PIX   NODE  NODE  SYS   NODE  NODE  X     NODE  SYS   SYS   SYS   SYS
NIC4  SYS   SYS   SYS   SYS   PIX   NODE  NODE  NODE  SYS   NODE  NODE  NODE  X     SYS   SYS   SYS   SYS
NIC5  NODE  NODE  NODE  PIX   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   X     NODE  NODE  NODE
NIC6  NODE  NODE  PIX   NODE  SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   NODE  X     NODE  NODE
NIC7  NODE  PIX   NODE  NODE  SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   NODE  NODE  X     NODE
NIC8  PIX   NODE  NODE  NODE  SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   SYS   NODE  NODE  NODE  X

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

NIC Legend:

  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5
  NIC6: mlx5_6
  NIC7: mlx5_7
  NIC8: mlx5_8
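
To read GPU-to-NIC locality out of a matrix like the one above without counting columns by hand, something along these lines may help. It is only a sketch: it assumes the whitespace-separated layout that `nvidia-smi topo -m` prints, and the function names `parse_topo_matrix` and `closest_nics` are hypothetical.

```python
import re

# Device labels look like GPU0, NIC3; this skips header words such as
# "CPU", "Affinity" or the bare "GPU" in "GPU NUMA ID".
DEV_RE = re.compile(r"^(GPU|NIC)\d+$")


def parse_topo_matrix(text):
    """Parse the connectivity matrix into {row_device: {col_device: link}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    columns = [tok for tok in lines[0].split() if DEV_RE.match(tok)]
    matrix = {}
    for ln in lines[1:]:
        toks = ln.split()
        if not DEV_RE.match(toks[0]):
            break  # reached the legend or other trailing text
        # Only the first len(columns) cells are link types; the rest
        # (CPU/NUMA affinity) are ignored.
        matrix[toks[0]] = dict(zip(columns, toks[1:1 + len(columns)]))
    return matrix


def closest_nics(matrix, link="PIX"):
    """For each GPU, list the NICs reachable over the given link type."""
    return {
        dev: [col for col, val in row.items()
              if col.startswith("NIC") and val == link]
        for dev, row in matrix.items()
        if dev.startswith("GPU")
    }
```

Applied to the output above, this pairs GPU0 with NIC8 (mlx5_8) down through GPU7 with NIC1 (mlx5_1), and shows that NIC0 (mlx5_0) has no PIX path to any GPU.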

Topology file

<system version="1">
  <cpu host_hash="0x8753b8a01ef0a140" numaid="0" affinity="00000000,00000000,0000ffff,ffffffff,ffffffff" arch="x86_64" vendor="GenuineIntel" familyid="6" modelid="143">
    <pci busid="0000:a1:00.0" class="0x060400" vendor="0x104c" device="0x8232" subsystem_vendor="0x0000" subsystem_device="0x0000" link_speed="2.5 GT/s PCIe" link_width="1">
      <pci busid="0000:a3:00.0" class="0x020000" vendor="0x15b3" device="0x101e" subsystem_vendor="0x15b3" subsystem_device="0x0127" link_speed="32.0 GT/s PCIe" link_width="0">
        <nic>
          <net name="mlx5_8" dev="7" speed="200000" port="1" latency="0.000000" guid="0x2821dc0003e1a258" maxconn="131072" gdr="1"/>
        </nic>
      </pci>
      <pci busid="0000:a4:00.0" class="0x030200" vendor="0x10de" device="0x2330" subsystem_vendor="0x10de" subsystem_device="0x16c1" link_speed="32.0 GT/s PCIe" link_width="0">
        <gpu dev="0" sm="90" rank="0" gdr="1">
          <nvlink target="0000:f5:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f4:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f3:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f6:00.0" count="4" tclass="0x068000"/>
        </gpu>
      </pci>
    </pci>
    <pci busid="0000:ab:00.0" class="0x060400" vendor="0x104c" device="0x8232" subsystem_vendor="0x0000" subsystem_device="0x0000" link_speed="2.5 GT/s PCIe" link_width="1">
      <pci busid="0000:ad:00.0" class="0x020000" vendor="0x15b3" device="0x101e" subsystem_vendor="0x15b3" subsystem_device="0x0127" link_speed="32.0 GT/s PCIe" link_width="0">
        <nic>
          <net name="mlx5_7" dev="6" speed="200000" port="1" latency="0.000000" guid="0xe823dc0003e1a258" maxconn="131072" gdr="1"/>
        </nic>
      </pci>
      <pci busid="0000:ae:00.0" class="0x030200" vendor="0x10de" device="0x2330" subsystem_vendor="0x10de" subsystem_device="0x16c1" link_speed="32.0 GT/s PCIe" link_width="0">
        <gpu dev="1" sm="90" rank="1" gdr="1">
          <nvlink target="0000:f6:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f4:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f5:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f3:00.0" count="4" tclass="0x068000"/>
        </gpu>
      </pci>
    </pci>
    <pci busid="0000:b5:00.0" class="0x060400" vendor="0x104c" device="0x8232" subsystem_vendor="0x0000" subsystem_device="0x0000" link_speed="2.5 GT/s PCIe" link_width="1">
      <pci busid="0000:b7:00.0" class="0x020000" vendor="0x15b3" device="0x101e" subsystem_vendor="0x15b3" subsystem_device="0x0127" link_speed="32.0 GT/s PCIe" link_width="0">
        <nic>
          <net name="mlx5_6" dev="5" speed="200000" port="1" latency="0.000000" guid="0x822dc0003e1a258" maxconn="131072" gdr="1"/>
        </nic>
      </pci>
      <pci busid="0000:b8:00.0" class="0x030200" vendor="0x10de" device="0x2330" subsystem_vendor="0x10de" subsystem_device="0x16c1" link_speed="32.0 GT/s PCIe" link_width="0">
        <gpu dev="2" sm="90" rank="2" gdr="1">
          <nvlink target="0000:f4:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f3:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f6:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f5:00.0" count="5" tclass="0x068000"/>
        </gpu>
      </pci>
    </pci>
    <pci busid="0000:bf:00.0" class="0x060400" vendor="0x104c" device="0x8232" subsystem_vendor="0x0000" subsystem_device="0x0000" link_speed="2.5 GT/s PCIe" link_width="1">
      <pci busid="0000:c1:00.0" class="0x020000" vendor="0x15b3" device="0x101e" subsystem_vendor="0x15b3" subsystem_device="0x0127" link_speed="32.0 GT/s PCIe" link_width="0">
        <nic>
          <net name="mlx5_5" dev="4" speed="200000" port="1" latency="0.000000" guid="0x2828dc0003e1a258" maxconn="131072" gdr="1"/>
        </nic>
      </pci>
      <pci busid="0000:c2:00.0" class="0x030200" vendor="0x10de" device="0x2330" subsystem_vendor="0x10de" subsystem_device="0x16c1" link_speed="32.0 GT/s PCIe" link_width="0">
        <gpu dev="3" sm="90" rank="3" gdr="1">
          <nvlink target="0000:f6:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f4:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f3:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f5:00.0" count="5" tclass="0x068000"/>
        </gpu>
      </pci>
    </pci>
  </cpu>
  <cpu host_hash="0x8753b8a01ef0a140" numaid="1" affinity="ffffffff,ffffffff,ffff0000,00000000,00000000" arch="x86_64" vendor="GenuineIntel" familyid="6" modelid="143">
    <pci busid="0000:c9:00.0" class="0x060400" vendor="0x104c" device="0x8232" subsystem_vendor="0x0000" subsystem_device="0x0000" link_speed="2.5 GT/s PCIe" link_width="1">
      <pci busid="0000:cb:00.0" class="0x020000" vendor="0x15b3" device="0x101e" subsystem_vendor="0x15b3" subsystem_device="0x0127" link_speed="32.0 GT/s PCIe" link_width="0">
        <nic>
          <net name="mlx5_4" dev="3" speed="200000" port="1" latency="0.000000" guid="0xb811dc0003e1a258" maxconn="131072" gdr="1"/>
        </nic>
      </pci>
      <pci busid="0000:cc:00.0" class="0x030200" vendor="0x10de" device="0x2330" subsystem_vendor="0x10de" subsystem_device="0x16c1" link_speed="32.0 GT/s PCIe" link_width="0">
        <gpu dev="4" sm="90" rank="4" gdr="1">
          <nvlink target="0000:f5:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f6:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f3:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f4:00.0" count="5" tclass="0x068000"/>
        </gpu>
      </pci>
    </pci>
    <pci busid="0000:d3:00.0" class="0x060400" vendor="0x104c" device="0x8232" subsystem_vendor="0x0000" subsystem_device="0x0000" link_speed="2.5 GT/s PCIe" link_width="1">
      <pci busid="0000:d5:00.0" class="0x020000" vendor="0x15b3" device="0x101e" subsystem_vendor="0x15b3" subsystem_device="0x0127" link_speed="32.0 GT/s PCIe" link_width="0">
        <nic>
          <net name="mlx5_3" dev="2" speed="200000" port="1" latency="0.000000" guid="0xc808dc0003e1a258" maxconn="131072" gdr="1"/>
        </nic>
      </pci>
      <pci busid="0000:d6:00.0" class="0x030200" vendor="0x10de" device="0x2330" subsystem_vendor="0x10de" subsystem_device="0x16c1" link_speed="32.0 GT/s PCIe" link_width="0">
        <gpu dev="5" sm="90" rank="5" gdr="1">
          <nvlink target="0000:f4:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f6:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f5:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f3:00.0" count="4" tclass="0x068000"/>
        </gpu>
      </pci>
    </pci>
    <pci busid="0000:dd:00.0" class="0x060400" vendor="0x104c" device="0x8232" subsystem_vendor="0x0000" subsystem_device="0x0000" link_speed="2.5 GT/s PCIe" link_width="1">
      <pci busid="0000:df:00.0" class="0x020000" vendor="0x15b3" device="0x101e" subsystem_vendor="0x15b3" subsystem_device="0x0127" link_speed="32.0 GT/s PCIe" link_width="0">
        <nic>
          <net name="mlx5_2" dev="1" speed="200000" port="1" latency="0.000000" guid="0xa80bdc0003e1a258" maxconn="131072" gdr="1"/>
        </nic>
      </pci>
      <pci busid="0000:e0:00.0" class="0x030200" vendor="0x10de" device="0x2330" subsystem_vendor="0x10de" subsystem_device="0x16c1" link_speed="32.0 GT/s PCIe" link_width="0">
        <gpu dev="6" sm="90" rank="6" gdr="1">
          <nvlink target="0000:f4:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f5:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f6:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f3:00.0" count="4" tclass="0x068000"/>
        </gpu>
      </pci>
    </pci>
    <pci busid="0000:e7:00.0" class="0x060400" vendor="0x104c" device="0x8232" subsystem_vendor="0x0000" subsystem_device="0x0000" link_speed="2.5 GT/s PCIe" link_width="1">
      <pci busid="0000:e9:00.0" class="0x020000" vendor="0x15b3" device="0x101e" subsystem_vendor="0x15b3" subsystem_device="0x0127" link_speed="32.0 GT/s PCIe" link_width="0">
        <nic>
          <net name="mlx5_1" dev="0" speed="200000" port="1" latency="0.000000" guid="0x480cdc0003e1a258" maxconn="131072" gdr="1"/>
        </nic>
      </pci>
      <pci busid="0000:ea:00.0" class="0x030200" vendor="0x10de" device="0x2330" subsystem_vendor="0x10de" subsystem_device="0x16c1" link_speed="32.0 GT/s PCIe" link_width="0">
        <gpu dev="7" sm="90" rank="7" gdr="1">
          <nvlink target="0000:f4:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f6:00.0" count="4" tclass="0x068000"/>
          <nvlink target="0000:f5:00.0" count="5" tclass="0x068000"/>
          <nvlink target="0000:f3:00.0" count="4" tclass="0x068000"/>
        </gpu>
      </pci>
    </pci>
  </cpu>
</system>
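
As a cross-check against the `nvidia-smi topo -m` matrix, the GPU/NIC pairing can also be pulled out of the topology file. The sketch below assumes the element and attribute names seen in the XML above (`cpu`, `pci`, `nic`/`net`, `gpu`, `numaid`, `name`, `dev`); the function name `gpu_nic_pairs` is hypothetical.

```python
import xml.etree.ElementTree as ET


def gpu_nic_pairs(xml_text):
    """Return (numa_id, gpu_dev, [nic_names]) for every GPU in the file.

    A GPU is paired with the NICs that sit under the same top-level
    PCI bridge, mirroring the nesting used in the topology XML.
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for cpu in root.findall("cpu"):
        numa = int(cpu.get("numaid"))
        for bridge in cpu.findall("pci"):
            nics = [net.get("name") for net in bridge.iter("net")]
            for gpu in bridge.iter("gpu"):
                pairs.append((numa, int(gpu.get("dev")), nics))
    return pairs
```

For the file above this yields GPU 0 with mlx5_8 through GPU 7 with mlx5_1, which agrees with the PIX entries in the matrix; mlx5_0 does not appear in the file.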

Output from run with NCCL_DEBUG_SUBSYS=INIT,ENV,TUNING

[0] # nThread 1 nGpus 1 minBytes 1 maxBytes 8589934592 step: 2(factor) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
[0] #
[0] # Using devices
[0] #  Rank  0 Group  0 Pid 497888 on h100clust-worker-1 device  0 [0xa4] NVIDIA H100 80GB HBM3
[0] #  Rank  1 Group  0 Pid 497889 on h100clust-worker-1 device  1 [0xae] NVIDIA H100 80GB HBM3
[0] #  Rank  2 Group  0 Pid 497890 on h100clust-worker-1 device  2 [0xb8] NVIDIA H100 80GB HBM3
[0] #  Rank  3 Group  0 Pid 497891 on h100clust-worker-1 device  3 [0xc2] NVIDIA H100 80GB HBM3
[0] #  Rank  4 Group  0 Pid 497892 on h100clust-worker-1 device  4 [0xcc] NVIDIA H100 80GB HBM3
[0] #  Rank  5 Group  0 Pid 497893 on h100clust-worker-1 device  5 [0xd6] NVIDIA H100 80GB HBM3
[0] #  Rank  6 Group  0 Pid 497894 on h100clust-worker-1 device  6 [0xe0] NVIDIA H100 80GB HBM3
[0] #  Rank  7 Group  0 Pid 497895 on h100clust-worker-1 device  7 [0xea] NVIDIA H100 80GB HBM3
[0] #  Rank  8 Group  0 Pid 496763 on h100clust-worker-32 device  0 [0xa4] NVIDIA H100 80GB HBM3
[0] #  Rank  9 Group  0 Pid 496764 on h100clust-worker-32 device  1 [0xae] NVIDIA H100 80GB HBM3
[0] #  Rank 10 Group  0 Pid 496765 on h100clust-worker-32 device  2 [0xb8] NVIDIA H100 80GB HBM3
[0] #  Rank 11 Group  0 Pid 496766 on h100clust-worker-32 device  3 [0xc2] NVIDIA H100 80GB HBM3
[0] #  Rank 12 Group  0 Pid 496767 on h100clust-worker-32 device  4 [0xcc] NVIDIA H100 80GB HBM3
[0] #  Rank 13 Group  0 Pid 496768 on h100clust-worker-32 device  5 [0xd6] NVIDIA H100 80GB HBM3
[0] #  Rank 14 Group  0 Pid 496769 on h100clust-worker-32 device  6 [0xe0] NVIDIA H100 80GB HBM3
[0] #  Rank 15 Group  0 Pid 496770 on h100clust-worker-32 device  7 [0xea] NVIDIA H100 80GB HBM3
[0] #  Rank 16 Group  0 Pid 497321 on h100clust-worker-5 device  0 [0xa4] NVIDIA H100 80GB HBM3
[0] #  Rank 17 Group  0 Pid 497322 on h100clust-worker-5 device  1 [0xae] NVIDIA H100 80GB HBM3
[0] #  Rank 18 Group  0 Pid 497323 on h100clust-worker-5 device  2 [0xb8] NVIDIA H100 80GB HBM3
[0] #  Rank 19 Group  0 Pid 497324 on h100clust-worker-5 device  3 [0xc2] NVIDIA H100 80GB HBM3
[0] #  Rank 20 Group  0 Pid 497325 on h100clust-worker-5 device  4 [0xcc] NVIDIA H100 80GB HBM3
[0] #  Rank 21 Group  0 Pid 497326 on h100clust-worker-5 device  5 [0xd6] NVIDIA H100 80GB HBM3
[0] #  Rank 22 Group  0 Pid 497327 on h100clust-worker-5 device  6 [0xe0] NVIDIA H100 80GB HBM3
[0] #  Rank 23 Group  0 Pid 497328 on h100clust-worker-5 device  7 [0xea] NVIDIA H100 80GB HBM3
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO Bootstrap : Using enp0s3:10.241.128.7<0>
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO cudaDriverVersion 12040
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO NCCL version 2.22.3+cuda12.5
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Plugin Path : /usr/local/lib/libnccl-net.so
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO P2P plugin v8 IBext_v8
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NCCL_IB_ADAPTIVE_ROUTING set by environment to 1.
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NCCL_IB_PCI_RELAXED_ORDERING set by environment to 2.
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NET/IB : Using [0]mlx5_1:1/RoCE [1]mlx5_2:1/RoCE [2]mlx5_3:1/RoCE [3]mlx5_4:1/RoCE [4]mlx5_5:1/RoCE [5]mlx5_6:1/RoCE [6]mlx5_7:1/RoCE [7]mlx5_8:1/RoCE [RO]; OOB enp0s3:10.241.128.7<0>
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Using network IBext_v8
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NCCL_CHECK_POINTERS set by environment to 0.
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO DMA-BUF is available on GPU device 0
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO ncclCommInitRank comm 0x561097745b80 rank 0 nranks 24 cudaDev 0 nvmlDev 0 busId a4000 commId 0x1d39176440db4936 - Init START
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO MNNVL busId 0xa4000 fabric UUID 0.0 cliqueId 0x0 state 3 healthMask 0x0
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NCCL_TOPO_FILE set by environment to /home/greg/output/mn-h100-vela2.xml.pristine.xml
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NCCL_TOPO_DUMP_FILE set by environment to /home/greg/output/mn-h100-vela2.xml
[0]
[0] h100clust-worker-1:497888:497958 [0] graph/xml.cc:267 NCCL WARN Unable to open /home/greg/output/mn-h100-vela2.xml, not dumping topology.
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Setting affinity for GPU 0 to ffff,ffffffff,ffffffff
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NVLS multicast support is available on dev 0
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NCCL_CROSS_NIC set by environment to 2.
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO comm 0x561097745b80 rank 0 nRanks 24 nNodes 3 localRanks 8 localRank 0 MNNVL 0
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NVLS Head  0:  0  8 16
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NVLS Head  1:  1  9 17
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NVLS Head  2:  2 10 18
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NVLS Head  3:  3 11 19
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NVLS Head  4:  4 12 20
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NVLS Head  5:  5 13 21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NVLS Head  6:  6 14 22
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NVLS Head  7:  7 15 23
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 00/16 :    0   7   6   5   4   3   2   1   8  15  14  13  12  11  10   9  16  23  22  21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 01/16 :    0   7   6   5   4   3   2   9   8  15  14  13  12  11  10  17  16  23  22  21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 02/16 :    0   7   6   5   4   3  10   9   8  15  14  13  12  11  18  17  16  23  22  21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 03/16 :    0   7   6   5   4  11  10   9   8  15  14  13  12  19  18  17  16  23  22  21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 04/16 :    0   7   6   5  12  11  10   9   8  15  14  13  20  19  18  17  16  23  22  21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 05/16 :    0   7   6  13  12  11  10   9   8  15  14  21  20  19  18  17  16  23  22   5
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 06/16 :    0   7  14  13  12  11  10   9   8  15  22  21  20  19  18  17  16  23   6   5
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 07/16 :    0  15  14  13  12  11  10   9   8  23  22  21  20  19  18  17  16   7   6   5
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 08/16 :    0   7   6   5   4   3   2   1   8  15  14  13  12  11  10   9  16  23  22  21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 09/16 :    0   7   6   5   4   3   2   9   8  15  14  13  12  11  10  17  16  23  22  21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 10/16 :    0   7   6   5   4   3  10   9   8  15  14  13  12  11  18  17  16  23  22  21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 11/16 :    0   7   6   5   4  11  10   9   8  15  14  13  12  19  18  17  16  23  22  21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 12/16 :    0   7   6   5  12  11  10   9   8  15  14  13  20  19  18  17  16  23  22  21
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 13/16 :    0   7   6  13  12  11  10   9   8  15  14  21  20  19  18  17  16  23  22   5
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 14/16 :    0   7  14  13  12  11  10   9   8  15  22  21  20  19  18  17  16  23   6   5
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Channel 15/16 :    0  15  14  13  12  11  10   9   8  23  22  21  20  19  18  17  16   7   6   5
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Trees [0] 1/16/-1->0->-1 [1] -1/-1/-1->0->7 [2] 1/-1/-1->0->7 [3] 1/-1/-1->0->7 [4] 1/-1/-1->0->7 [5] 1/-1/-1->0->7 [6] 1/-1/-1->0->7 [7] 1/-1/-1->0->7 [8] 1/16/-1->0->8 [9] -1/-1/-1->0->7 [10] 1/-1/-1->0->7 [11] 1/-1/-1->0->7 [12] 1/-1/-1->0->7 [13] 1/-1/-1->0->7 [14] 1/-1/-1->0->7 [15] 1/-1/-1->0->7
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO NCCL_BUFFSIZE set by environment to 67108864.
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO P2P Chunksize set to 131072
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO   Algorithm   |                            Tree                  |                            Ring                  |                   CollNetDirect                  |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO   Protocol    |             LL |          LL128 |         Simple |             LL |          LL128 |         Simple |             LL |          LL128 |         Simple |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO  Max NThreads |            512 |            640 |            512 |            512 |            640 |            512 |              0 |              0 |            640 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO     Broadcast |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |    71.4/  20.4 |   110.0/ 176.6 |   680.4/ 192.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO        Reduce |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |    71.4/  20.4 |   110.0/ 176.6 |   680.4/ 192.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO     AllGather |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |    33.0/  21.3 |    61.9/ 184.3 |   107.8/ 200.3 |     5.6/   0.0 |     5.6/   0.0 |    44.0/   0.0 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO ReduceScatter |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |    33.0/  21.3 |    61.9/ 184.3 |   107.8/ 200.3 |     5.6/   0.0 |     5.6/   0.0 |    44.0/   0.0 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO     AllReduce |    25.2/   8.7 |    48.5/  70.4 |   448.0/  75.1 |    62.8/  10.6 |   114.0/  92.2 |   228.4/ 100.2 |     5.6/   0.0 |     5.6/   0.0 |    44.0/   0.0 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO   Algorithm   |                    CollNetChain                  |                            NVLS                  |                        NVLSTree                  |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO   Protocol    |             LL |          LL128 |         Simple |             LL |          LL128 |         Simple |             LL |          LL128 |         Simple |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO  Max NThreads |              0 |              0 |            640 |              0 |              0 |            640 |              0 |              0 |            640 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO     Broadcast |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO        Reduce |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO     AllGather |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |    43.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO ReduceScatter |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |    43.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO     AllReduce |     0.0/   0.0 |     0.0/   0.0 |    69.2/   0.0 |     0.0/   0.0 |     0.0/   0.0 |    43.0/   0.0 |     0.0/   0.0 |     0.0/   0.0 |    53.0/  80.0 |
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO threadThresholds 8/8/64 | 192/8/64 | 512 | 512
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO 16 coll channels, 16 collnet channels, 16 nvls channels, 16 p2p channels, 2 p2p channels per peer
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO CC Off, Multi-GPU CC Off, workFifoBytes 1048576
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v3 symbol.
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO TUNER/Plugin: Failed to find ncclTunerPlugin_v2 symbol, using internal tuner instead.
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO ncclCommInitRank comm 0x561097745b80 rank 0 nranks 24 cudaDev 0 nvmlDev 0 busId a4000 commId 0x1d39176440db4936 - Init COMPLETE
[0] h100clust-worker-1:497888:497958 [0] NCCL INFO Init timings: rank 0 nranks 24 total 3.45 (kernels 0.33, bootstrap 2.75, allgathers 0.10, topo 0.03, graphs 0.12, connections 0.12, rest 0.00)
[0] #
[0] #                                                              out-of-place                       in-place
[0] #       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
[0] #        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 00/0 : 17[1] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 08/0 : 17[1] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 00/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 01/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 02/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 03/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 04/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 05/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 06/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 08/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 09/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 10/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 11/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 12/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 13/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 14/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 07/0 : 0[0] -> 15[7] [send] via NET/IBext_v8/0(7)/GDRDMA
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Channel 15/0 : 0[0] -> 15[7] [send] via NET/IBext_v8/0(7)/GDRDMA
[0] h100clust-worker-1:497888:498043 [0] NCCL INFO NCCL_IB_QPS_PER_CONNECTION set by environment to 2.
[0] h100clust-worker-1:497888:498043 [0] NCCL INFO NCCL_IB_GID_INDEX set by environment to 3.
[0] h100clust-worker-1:497888:498043 [0] NCCL INFO NCCL_IB_TIMEOUT set by environment to 22.
[0] h100clust-worker-1:497888:498043 [0] NCCL INFO NCCL_IB_RETRY_CNT set by environment to 10.
[0] h100clust-worker-1:497888:498059 [0] NCCL INFO Connected all rings
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0]            0             0     float     sum      -1[0]      0.42    0.00    0.00      0[0]      0.19    0.00    0.00      0
[0]            0             0     float     sum      -1[0]      0.18    0.00    0.00      0[0]      0.19    0.00    0.00      0
[0]            4             1     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 00/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 02/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 03/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 04/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 05/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 06/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 07/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 08/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 10/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 11/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 12/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 13/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 14/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 15/0 : 0[0] -> 1[1] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 07/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 15/0 : 0[0] -> 7[7] via P2P/CUMEM
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 00/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 08/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 08/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 08/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 00/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Channel 08/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498087 [0] NCCL INFO Connected all trees
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0]     267.6    0.00    0.00      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4 Bytes -> Algo 0 proto 0 time 25.200462
[0]     69.59    0.00    0.00      0
[0]            8             2     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0]     64.18    0.00    0.00      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8 Bytes -> Algo 0 proto 0 time 25.200924
[0]     59.38    0.00    0.00      0
[0]           16             4     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0]     60.01    0.00    0.00      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16 Bytes -> Algo 0 proto 0 time 25.201847
[0]     60.02    0.00    0.00      0
[0]           32             8     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0]     59.84    0.00    0.00      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32 Bytes -> Algo 0 proto 0 time 25.203691
[0]     59.86    0.00    0.00      0
[0]           64            16     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0]     60.43    0.00    0.00      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 64 Bytes -> Algo 0 proto 0 time 25.207382
[0]     60.06    0.00    0.00      0
[0]          128            32     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0]     61.58    0.00    0.00      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 128 Bytes -> Algo 0 proto 0 time 25.214764
[0]     60.77    0.00    0.00      0
[0]          256            64     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0]     72.81    0.00    0.01      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 256 Bytes -> Algo 0 proto 0 time 25.229528
[0]     62.05    0.00    0.01      0
[0]          512           128     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0]     92.22    0.01    0.01      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 512 Bytes -> Algo 0 proto 0 time 25.259054
[0]     63.54    0.01    0.02      0
[0]         1024           256     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0]     66.24    0.02    0.03      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1024 Bytes -> Algo 0 proto 0 time 25.331232
[0]     66.59    0.02    0.03      0
[0]         2048           512     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0]     73.09    0.03    0.05      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2048 Bytes -> Algo 0 proto 0 time 25.495272
[0]     72.49    0.03    0.05      0
[0]         4096          1024     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0]     75.82    0.05    0.10      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4096 Bytes -> Algo 0 proto 0 time 25.874907
[0]     75.38    0.05    0.10      0
[0]         8192          2048     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0]     153.3    0.05    0.10      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8192 Bytes -> Algo 0 proto 0 time 26.549810
[0]     169.9    0.05    0.09      0
[0]        16384          4096     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0]     249.8    0.07    0.13      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16384 Bytes -> Algo 0 proto 0 time 27.899622
[0]     190.3    0.09    0.16      0
[0]        32768          8192     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0]     288.6    0.11    0.22      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 32768 Bytes -> Algo 0 proto 0 time 30.599243
[0]     190.6    0.17    0.33      0
[0]        65536         16384     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0]     497.5    0.13    0.25      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 65536 Bytes -> Algo 0 proto 0 time 37.798233
[0]     345.6    0.19    0.36      0
[0]       131072         32768     float     sum      -1
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0]     402.5    0.33    0.62      0
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 131072 Bytes -> Algo 0 proto 1 time 51.603912
[0]     222.5    0.59    1.13      0
[0]       262144         65536     float     sum      -1
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0]     281.5    0.93    1.79      0
h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 262144 Bytes -> Algo 0 proto 1 time 54.707825
[0]     230.3    1.14    2.18      0
[0]       524288        131072     float     sum      -1
h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO NVLS comm 0x561097745b80 headRank 0 nHeads 8 buffSize 1048576 nvlsPerRankSize 33554432 nvlsTotalSize 268435456
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 01/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 02/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 03/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 04/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 05/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 06/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 07/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 09/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 10/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 11/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 12/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 13/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 14/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 15/0 : 16[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 01/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 03/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 05/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 07/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 09/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 11/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 13/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 15/0 : 0[0] -> 8[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 01/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 03/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 05/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 07/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 09/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 11/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 13/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 15/0 : 8[0] -> 0[0] [receive] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 01/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 02/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 03/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 04/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 05/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 06/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 07/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 09/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 10/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 11/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 12/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 13/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 14/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Channel 15/0 : 0[0] -> 16[0] [send] via NET/IBext_v8/7/GDRDMA
[0] h100clust-worker-1:497888:498098 [0] NCCL INFO Connected NVLS tree
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0]     380.2    1.38    2.64      0
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 524288 Bytes -> Algo 5 proto 2 time 59.553600
[0]     329.8    1.59    3.05      0
[0]      1048576        262144     float     sum      -1
h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0]    1851.1    0.57    1.09      0
h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1048576 Bytes -> Algo 5 proto 2 time 66.107201
[0]    2291.8    0.46    0.88      0
[0]      2097152        524288     float     sum      -1
h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0]    2452.2    0.86    1.64      0
h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2097152 Bytes -> Algo 5 proto 2 time 79.214401
[0]     393.8    5.33   10.21      0
[0]      4194304       1048576     float     sum      -1
h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0]     621.9    6.74   12.93      0
h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4194304 Bytes -> Algo 5 proto 2 time 105.428802
[0]     656.0    6.39   12.25      0
[0]      8388608       2097152     float     sum      -1
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0]    1008.0    8.32   15.95      0
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8388608 Bytes -> Algo 5 proto 2 time 157.857605
[0]    1001.5    8.38   16.05      0
[0]     16777216       4194304     float     sum      -1
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0]    1295.6   12.95   24.82      0
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 16777216 Bytes -> Algo 5 proto 2 time 262.715210
[0]    1321.1   12.70   24.34      0
[0]     33554432       8388608     float     sum      -1
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0]    2239.8   14.98   28.71      0
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 33554432 Bytes -> Algo 5 proto 2 time 472.430389
[0]    2249.3   14.92   28.59      0
[0]     67108864      16777216     float     sum      -1
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0]     24256    2.77    5.30      0
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 67108864 Bytes -> Algo 1 proto 1 time 842.177856
[0]     25233    2.66    5.10      0
[0]    134217728      33554432     float     sum      -1
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0]     48957    2.74    5.25      0
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 134217728 Bytes -> Algo 1 proto 1 time 1570.355713
[0]     46412    2.89    5.54      0
[0]    268435456      67108864     float     sum      -1
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0]    126387    2.12    4.07      0
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 268435456 Bytes -> Algo 1 proto 2 time 2999.454102
[0]    127194    2.11    4.05      0
[0]    536870912     134217728     float     sum      -1
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0]    196435    2.73    5.24      0
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 536870912 Bytes -> Algo 1 proto 2 time 5679.147949
[0]    189960    2.83    5.42      0
[0]   1073741824     268435456     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0]    373848    2.87    5.50      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 1073741824 Bytes -> Algo 1 proto 2 time 11038.536133
[0]    368024    2.92    5.59      0
[0]   2147483648     536870912     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0]    786561    2.73    5.23      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 2147483648 Bytes -> Algo 1 proto 2 time 21757.312500
[0]    799579    2.69    5.15      0
[0]   4294967296    1073741824     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0]   1549446    2.77    5.31      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 4294967296 Bytes -> Algo 1 proto 2 time 43194.867188
[0]   1575896    2.73    5.22      0
[0]   8589934592    2147483648     float     sum      -1[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0]   3063867    2.80    5.37      0[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO 8589934592 Bytes -> Algo 1 proto 2 time 86069.968750
[0]   3300068    2.60    4.99      0
[0] h100clust-worker-1:497888:497888 [0] NCCL INFO comm 0x561097745b80 rank 0 nranks 24 cudaDev 0 busId a4000 - Destroy COMPLETE
[0] # Out of bounds values : 0 OK
[0] # Avg bus bandwidth    : 4.01902
[0] #
[0]
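As a sanity check on the figures above, the bandwidth columns can be recomputed from the size and time columns. This is a minimal sketch, assuming the nccl-tests conventions that time is reported in microseconds, that algbw is size divided by time, and that allreduce busbw is algbw scaled by 2(n-1)/n; the function names are made up for illustration, and the rank count of 24 is taken from the "nranks 24" line in the log.

```python
def algbw_gbps(size_bytes, time_us):
    """Algorithm bandwidth in GB/s (bytes per microsecond is MB/s, so divide by 1e3)."""
    return size_bytes / time_us / 1e3


def allreduce_busbw_gbps(algbw, nranks):
    """Bus bandwidth for allreduce: algbw scaled by 2*(n-1)/n."""
    return algbw * 2 * (nranks - 1) / nranks


if __name__ == "__main__":
    # 8 GB rows from the log above: (label, size in bytes, time in us)
    rows = [
        ("out-of-place", 8589934592, 3063867),
        ("in-place", 8589934592, 3300068),
    ]
    for label, size, time_us in rows:
        alg = algbw_gbps(size, time_us)
        bus = allreduce_busbw_gbps(alg, nranks=24)
        print(f"{label}: algbw {alg:.2f} GB/s, busbw {bus:.2f} GB/s")
```

The recomputed values agree with the logged columns to two decimals, so the reported bandwidths are internally consistent with the measured times.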
kiskra-nvidia commented 1 month ago

Your latest log shows that NCCL chooses the Tree algorithm for small message sizes (<512KB), and then switches to NVLSTree-Simple up to 32MB, which is expected. Somewhat unusually (probably because of RoCE?), it switches to Ring-LL128 for 64MB-128MB, but for 256MB and above it switches to Ring-Simple, which is expected.
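For anyone who wants to reproduce this reading of the log, a rough sketch of a parser follows. It collapses the repeated "Bytes -> Algo ... proto ..." lines into one entry per message size. The index-to-name tables are an assumption based on my understanding of NCCL's internal enums (they are not stated in this thread) and may differ between NCCL versions, so check them against the version in use. The function name `summarize` is hypothetical.

```python
import re

# Assumed index-to-name mappings; verify against your NCCL version.
ALGOS = {0: "Tree", 1: "Ring", 2: "CollNetDirect", 3: "CollNetChain",
         4: "NVLS", 5: "NVLSTree"}
PROTOS = {0: "LL", 1: "LL128", 2: "Simple"}

# Matches the tuning lines anywhere in a line, so it also works when the
# benchmark output and the NCCL INFO output are interleaved on one line.
LINE_RE = re.compile(
    r"NCCL INFO (\d+) Bytes -> Algo (\d+) proto (\d+) time ([\d.]+)"
)


def summarize(lines):
    """Return {size_bytes: (algo_name, proto_name, logged_time)}, sorted by size.

    Only the first occurrence of each size is kept; repeats are ignored.
    """
    seen = {}
    for line in lines:
        for m in LINE_RE.finditer(line):
            size = int(m.group(1))
            algo = ALGOS.get(int(m.group(2)), "unknown")
            proto = PROTOS.get(int(m.group(3)), "unknown")
            seen.setdefault(size, (algo, proto, float(m.group(4))))
    return dict(sorted(seen.items()))


if __name__ == "__main__":
    import sys
    for size, (algo, proto, t) in summarize(sys.stdin).items():
        print(f"{size:>12} bytes  {algo}-{proto}  time {t}")
```

Piping the full log into the script prints one line per message size, which makes the switch points between algorithms easy to spot. Under the assumed mapping, the "Algo 1 proto 2" lines for 512 MB and above correspond to Ring-Simple.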

You may want to try disabling Ring (NCCL_ALGO=^Ring) and/or enabling just NVLSTree (NCCL_ALGO=NVLSTree). However, since the log didn't reveal any fundamental problems with algorithm selection, I doubt that will solve it.

My guess is that you are suffering from some sort of network congestion. Have you tried experimenting with other values of NCCL_CROSS_NIC, specifically 0 and 1?
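To keep such experiments organized, the settings mentioned in this comment can be enumerated as a small test matrix. The sketch below only builds the environment for each combination and does not launch anything, because the actual launch command is site-specific. The helper name `experiment_envs` is hypothetical, and the values are the ones suggested in this thread.

```python
import itertools
import os

# Values suggested in this thread. None means "leave NCCL_ALGO unset".
ALGO_SETTINGS = [None, "^Ring", "NVLSTree"]
CROSS_NIC_SETTINGS = ["0", "1"]


def experiment_envs(base_env=None):
    """Yield (label, env) pairs, one per NCCL_ALGO x NCCL_CROSS_NIC combination."""
    base = dict(os.environ if base_env is None else base_env)
    for algo, cross_nic in itertools.product(ALGO_SETTINGS, CROSS_NIC_SETTINGS):
        env = dict(base)
        env.pop("NCCL_ALGO", None)  # start from a clean slate for each run
        if algo is not None:
            env["NCCL_ALGO"] = algo
        env["NCCL_CROSS_NIC"] = cross_nic
        label = f"algo={algo or 'default'},cross_nic={cross_nic}"
        yield label, env


if __name__ == "__main__":
    for label, env in experiment_envs({}):
        print(label, env)
```

Each `env` can be passed to `subprocess.run(..., env=env)` together with your own launch command. Keep in mind that the launcher also has to forward these variables to the remote ranks; how that is done depends on the launcher.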

twichell commented 1 month ago

Thank you for your review of the data and suggestions. We've made some network infrastructure changes and are seeing improved performance. I'll get back after we've had more time to study the results.