NVIDIA / nccl-tests

NCCL Tests
BSD 3-Clause "New" or "Revised" License

Evaluation of NCCL test result #119

Closed Yujaeseo closed 1 year ago

Yujaeseo commented 1 year ago

I ran the NCCL all-reduce test (all_reduce_perf) to evaluate the performance of our GPU servers, and I would like to know whether the results are appropriate for the hardware. The cluster consists of 3 servers, each with 2x Intel Xeon Platinum 8358 CPUs, 8x NVIDIA A100 SXM4 40GB GPUs, and 4x Mellanox HDR InfiniBand cards.

I estimate the theoretical per-server network bandwidth at 100 GB/s (4 NICs x 200 Gb/s / 8 bits per byte). The test reports an average bus bandwidth of about 84.8 GB/s and a peak of about 96 GB/s.
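The estimate above can be written out as a short calculation. This is only a sketch of the arithmetic in the post; the variable names are illustrative, and the measured figures are the rounded values quoted above.

```python
# Line-rate arithmetic from the post: 4 HDR InfiniBand NICs per server,
# each nominally 200 Gb/s. Names here are illustrative, not from nccl-tests.
NICS_PER_SERVER = 4
LINK_GBIT_PER_S = 200
BITS_PER_BYTE = 8

theoretical_gbyte_per_s = NICS_PER_SERVER * LINK_GBIT_PER_S / BITS_PER_BYTE

# Rounded figures quoted in the post.
measured_avg = 84.8
measured_peak = 96.0

avg_fraction = measured_avg / theoretical_gbyte_per_s
peak_fraction = measured_peak / theoretical_gbyte_per_s

print(f"theoretical: {theoretical_gbyte_per_s:.0f} GB/s")
print(f"average reaches {avg_fraction:.1%} of line rate")
print(f"peak reaches {peak_fraction:.1%} of line rate")
```

Note that the average is taken over the whole size sweep, including small messages that cannot saturate the network, so it will always sit below the large-message peak.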

Are the results reasonable considering the hardware specifications? Are there any additional optimization methods to apply?

The mpirun command and test result are as follows.

mpirun -mca pml ucx -x OMPI_MCA_btl=^openib -np 24  --hostfile ./hostfile ./build/all_reduce_perf -b 8 -e 128M -g 1

# nThread 1 nGpus 1 minBytes 8 maxBytes 134217728 step: 1048576(bytes) warmup iters: 5 iters: 20 agg iters: 1 validation: 1 graph: 0
#
# Using devices
#  Rank  0 Group  0 Pid 3621216 on pod01 device  0 [0x10] NVIDIA A100-SXM4-40GB
#  Rank  1 Group  0 Pid 3621217 on pod01 device  1 [0x16] NVIDIA A100-SXM4-40GB
#  Rank  2 Group  0 Pid 3621218 on pod01 device  2 [0x49] NVIDIA A100-SXM4-40GB
#  Rank  3 Group  0 Pid 3621219 on pod01 device  3 [0x4d] NVIDIA A100-SXM4-40GB
#  Rank  4 Group  0 Pid 3621220 on pod01 device  4 [0x8a] NVIDIA A100-SXM4-40GB
#  Rank  5 Group  0 Pid 3621221 on pod01 device  5 [0x8f] NVIDIA A100-SXM4-40GB
#  Rank  6 Group  0 Pid 3621222 on pod01 device  6 [0xc6] NVIDIA A100-SXM4-40GB
#  Rank  7 Group  0 Pid 3621224 on pod01 device  7 [0xca] NVIDIA A100-SXM4-40GB
#  Rank  8 Group  0 Pid 1806861 on pod03 device  0 [0x10] NVIDIA A100-SXM4-40GB
#  Rank  9 Group  0 Pid 1806862 on pod03 device  1 [0x16] NVIDIA A100-SXM4-40GB
#  Rank 10 Group  0 Pid 1806863 on pod03 device  2 [0x49] NVIDIA A100-SXM4-40GB
#  Rank 11 Group  0 Pid 1806864 on pod03 device  3 [0x4d] NVIDIA A100-SXM4-40GB
#  Rank 12 Group  0 Pid 1806865 on pod03 device  4 [0x8a] NVIDIA A100-SXM4-40GB
#  Rank 13 Group  0 Pid 1806866 on pod03 device  5 [0x8f] NVIDIA A100-SXM4-40GB
#  Rank 14 Group  0 Pid 1806868 on pod03 device  6 [0xc6] NVIDIA A100-SXM4-40GB
#  Rank 15 Group  0 Pid 1806870 on pod03 device  7 [0xca] NVIDIA A100-SXM4-40GB
#  Rank 16 Group  0 Pid 2260659 on pod10 device  0 [0x10] NVIDIA A100-SXM4-40GB
#  Rank 17 Group  0 Pid 2260660 on pod10 device  1 [0x16] NVIDIA A100-SXM4-40GB
#  Rank 18 Group  0 Pid 2260661 on pod10 device  2 [0x49] NVIDIA A100-SXM4-40GB
#  Rank 19 Group  0 Pid 2260662 on pod10 device  3 [0x4d] NVIDIA A100-SXM4-40GB
#  Rank 20 Group  0 Pid 2260663 on pod10 device  4 [0x8a] NVIDIA A100-SXM4-40GB
#  Rank 21 Group  0 Pid 2260664 on pod10 device  5 [0x8f] NVIDIA A100-SXM4-40GB
#  Rank 22 Group  0 Pid 2260665 on pod10 device  6 [0xc6] NVIDIA A100-SXM4-40GB
#  Rank 23 Group  0 Pid 2260666 on pod10 device  7 [0xca] NVIDIA A100-SXM4-40GB
#
#                                                              out-of-place                       in-place
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)
           8             2     float     sum      -1    59.00    0.00    0.00      0    87.53    0.00    0.00      0
     1048584        262146     float     sum      -1    127.3    8.24   15.79      0    127.2    8.24   15.80      0
     2097160        524290     float     sum      -1    187.6   11.18   21.43      0    180.6   11.61   22.26      0
     3145736        786434     float     sum      -1    193.2   16.28   31.21      0    192.0   16.39   31.41      0
     4194312       1048578     float     sum      -1    225.7   18.58   35.62      0    223.3   18.79   36.01      0
     5242888       1310722     float     sum      -1    249.1   21.05   40.35      0    235.4   22.27   42.69      0
     6291464       1572866     float     sum      -1    248.0   25.36   48.62      0    247.3   25.44   48.75      0
     7340040       1835010     float     sum      -1    260.0   28.23   54.10      0    257.5   28.51   54.63      0
     8388616       2097154     float     sum      -1    276.2   30.38   58.22      0    273.6   30.66   58.77      0
     9437192       2359298     float     sum      -1    306.8   30.76   58.95      0    302.0   31.25   59.89      0
    10485768       2621442     float     sum      -1    307.1   34.14   65.44      0    304.5   34.44   66.00      0
    11534344       2883586     float     sum      -1    313.6   36.77   70.49      0    311.1   37.08   71.07      0
    12582920       3145730     float     sum      -1    329.4   38.20   73.21      0    326.4   38.56   73.90      0
    13631496       3407874     float     sum      -1    351.3   38.80   74.37      0    346.6   39.33   75.39      0
    14680072       3670018     float     sum      -1    366.7   40.03   76.72      0    366.0   40.11   76.88      0
    15728648       3932162     float     sum      -1    403.4   38.99   74.73      0    392.5   40.08   76.81      0
    16777224       4194306     float     sum      -1    420.2   39.93   76.53      0    394.9   42.49   81.43      0
    17825800       4456450     float     sum      -1    421.0   42.35   81.16      0    419.0   42.54   81.54      0
    18874376       4718594     float     sum      -1    445.6   42.36   81.19      0    443.3   42.58   81.61      0
    19922952       4980738     float     sum      -1    460.3   43.28   82.95      0    460.2   43.29   82.97      0
    20971528       5242882     float     sum      -1    483.9   43.34   83.07      0    479.0   43.79   83.92      0
    22020104       5505026     float     sum      -1    501.2   43.94   84.21      0    502.4   43.83   84.01      0
    23068680       5767170     float     sum      -1    518.4   44.50   85.30      0    521.1   44.27   84.84      0
    24117256       6029314     float     sum      -1    535.3   45.06   86.36      0    534.2   45.15   86.53      0
    25165832       6291458     float     sum      -1    557.4   45.15   86.53      0    553.2   45.49   87.20      0
    26214408       6553602     float     sum      -1    581.4   45.09   86.42      0    578.4   45.32   86.87      0
    27262984       6815746     float     sum      -1    598.5   45.55   87.31      0    604.6   45.09   86.43      0
    28311560       7077890     float     sum      -1    622.4   45.49   87.18      0    622.6   45.47   87.16      0
    29360136       7340034     float     sum      -1    641.8   45.75   87.68      0    643.5   45.62   87.45      0
    30408712       7602178     float     sum      -1    672.5   45.21   86.66      0    674.1   45.11   86.46      0
    31457288       7864322     float     sum      -1    689.2   45.65   87.49      0    684.5   45.95   88.08      0
    32505864       8126466     float     sum      -1    763.5   42.57   81.60      0    717.7   45.29   86.81      0
    33554440       8388610     float     sum      -1    732.9   45.79   87.75      0    730.9   45.91   87.99      0
    34603016       8650754     float     sum      -1    768.4   45.04   86.32      0    774.4   44.68   85.64      0
    35651592       8912898     float     sum      -1    787.8   45.26   86.74      0    787.9   45.25   86.73      0
    36700168       9175042     float     sum      -1    802.2   45.75   87.68      0    803.0   45.70   87.60      0
    37748744       9437186     float     sum      -1    836.1   45.15   86.53      0    809.3   46.65   89.40      0
    38797320       9699330     float     sum      -1    859.0   45.16   86.56      0    888.9   43.64   83.65      0
    39845896       9961474     float     sum      -1    882.0   45.17   86.58      0    890.7   44.74   85.74      0
    40894472      10223618     float     sum      -1    945.6   43.25   82.89      0    898.8   45.50   87.21      0
    41943048      10485762     float     sum      -1    910.2   46.08   88.32      0    911.7   46.00   88.17      0
    42991624      10747906     float     sum      -1    939.7   45.75   87.69      0    935.0   45.98   88.13      0
    44040200      11010050     float     sum      -1    973.3   45.25   86.73      0    960.4   45.85   87.89      0
    45088776      11272194     float     sum      -1   1007.3   44.76   85.79      0    994.3   45.35   86.92      0
    46137352      11534338     float     sum      -1   1000.8   46.10   88.36      0    999.4   46.16   88.48      0
    47185928      11796482     float     sum      -1   1059.4   44.54   85.37      0   1029.3   45.84   87.86      0
    48234504      12058626     float     sum      -1   1057.1   45.63   87.45      0   1067.3   45.19   86.62      0
    49283080      12320770     float     sum      -1   1084.7   45.43   87.08      0   1094.6   45.03   86.30      0
    50331656      12582914     float     sum      -1   1161.5   43.33   83.05      0   1114.5   45.16   86.56      0
    51380232      12845058     float     sum      -1   1119.9   45.88   87.93      0   1136.8   45.20   86.63      0
    52428808      13107202     float     sum      -1   1149.8   45.60   87.39      0   1184.3   44.27   84.85      0
    53477384      13369346     float     sum      -1   1172.0   45.63   87.45      0   1216.9   43.95   84.23      0
    54525960      13631490     float     sum      -1   1205.7   45.22   86.68      0   1215.0   44.88   86.01      0
    55574536      13893634     float     sum      -1   1215.7   45.71   87.62      0   1221.8   45.48   87.18      0
    56623112      14155778     float     sum      -1   1247.2   45.40   87.02      0   1308.4   43.28   82.94      0
    57671688      14417922     float     sum      -1   1281.7   45.00   86.25      0   1288.0   44.78   85.82      0
    58720264      14680066     float     sum      -1   1303.6   45.05   86.34      0   1305.1   44.99   86.24      0
    59768840      14942210     float     sum      -1   1313.1   45.52   87.24      0   1330.4   44.93   86.11      0
    60817416      15204354     float     sum      -1   1339.7   45.40   87.01      0   1343.3   45.27   86.78      0
    61865992      15466498     float     sum      -1   1368.1   45.22   86.67      0   1395.1   44.35   84.99      0
    62914568      15728642     float     sum      -1   1393.9   45.14   86.51      0   1371.1   45.88   87.95      0
    63963144      15990786     float     sum      -1   1408.8   45.40   87.02      0   1401.7   45.63   87.46      0
    65011720      16252930     float     sum      -1   1449.6   44.85   85.96      0   1442.8   45.06   86.36      0
    66060296      16515074     float     sum      -1   1472.1   44.87   86.01      0   1440.8   45.85   87.88      0
    67108872      16777218     float     sum      -1   1525.4   43.99   84.32      0   1475.8   45.47   87.16      0
    68157448      17039362     float     sum      -1   1506.1   45.25   86.74      0   1528.7   44.58   85.45      0
    69206024      17301506     float     sum      -1   1600.1   43.25   82.90      0   1554.8   44.51   85.31      0
    70254600      17563650     float     sum      -1   1541.9   45.56   87.33      0   1554.3   45.20   86.63      0
    71303176      17825794     float     sum      -1   1580.7   45.11   86.46      0   1581.8   45.08   86.40      0
    72351752      18087938     float     sum      -1   1623.7   44.56   85.40      0   1635.5   44.24   84.79      0
    73400328      18350082     float     sum      -1   1617.8   45.37   86.96      0   1614.3   45.47   87.15      0
    74448904      18612226     float     sum      -1   1639.4   45.41   87.04      0   1644.3   45.28   86.78      0
    75497480      18874370     float     sum      -1   1678.7   44.97   86.20      0   1680.3   44.93   86.12      0
    76546056      19136514     float     sum      -1   1693.9   45.19   86.61      0   1691.0   45.27   86.76      0
    77594632      19398658     float     sum      -1   1722.2   45.06   86.36      0   1709.5   45.39   87.00      0
    78643208      19660802     float     sum      -1   1732.4   45.40   87.01      0   1766.8   44.51   85.31      0
    79691784      19922946     float     sum      -1   1779.4   44.78   85.84      0   1783.4   44.68   85.65      0
    80740360      20185090     float     sum      -1   1805.5   44.72   85.71      0   1784.5   45.25   86.72      0
    81788936      20447234     float     sum      -1   1818.3   44.98   86.22      0   1823.0   44.87   85.99      0
    82837512      20709378     float     sum      -1   1847.4   44.84   85.94      0   1898.1   43.64   83.65      0
    83886088      20971522     float     sum      -1   1876.1   44.71   85.70      0   1872.2   44.81   85.88      0
    84934664      21233666     float     sum      -1   1903.9   44.61   85.50      0   1883.4   45.10   86.43      0
    85983240      21495810     float     sum      -1   1757.8   48.92   93.76      0   1770.2   48.57   93.10      0
    87031816      21757954     float     sum      -1   1781.5   48.85   93.64      0   1776.2   49.00   93.91      0
    88080392      22020098     float     sum      -1   1794.2   49.09   94.09      0   1788.0   49.26   94.42      0
    89128968      22282242     float     sum      -1   1829.6   48.71   93.37      0   1799.6   49.53   94.93      0
    90177544      22544386     float     sum      -1   1842.5   48.94   93.81      0   1835.3   49.14   94.18      0
    91226120      22806530     float     sum      -1   1857.0   49.12   94.16      0   1845.2   49.44   94.76      0
    92274696      23068674     float     sum      -1   1853.1   49.80   95.44      0   1861.8   49.56   94.99      0
    93323272      23330818     float     sum      -1   1906.3   48.95   93.83      0   1873.2   49.82   95.49      0
    94371848      23592962     float     sum      -1   1956.3   48.24   92.46      0   1920.7   49.13   94.17      0
    95420424      23855106     float     sum      -1   1937.9   49.24   94.37      0   1945.4   49.05   94.01      0
    96469000      24117250     float     sum      -1   1958.8   49.25   94.39      0   1958.8   49.25   94.39      0
    97517576      24379394     float     sum      -1   1980.6   49.24   94.37      0   2018.0   48.32   92.62      0
    98566152      24641538     float     sum      -1   1992.2   49.48   94.83      0   1995.6   49.39   94.67      0
    99614728      24903682     float     sum      -1   1994.5   49.95   95.73      0   1995.2   49.93   95.69      0
   100663304      25165826     float     sum      -1   2022.9   49.76   95.38      0   2027.5   49.65   95.16      0
   101711880      25427970     float     sum      -1   2038.6   49.89   95.63      0   2042.1   49.81   95.46      0
   102760456      25690114     float     sum      -1   2069.4   49.66   95.18      0   2058.5   49.92   95.68      0
   103809032      25952258     float     sum      -1   2068.6   50.18   96.19      0   2102.7   49.37   94.62      0
   104857608      26214402     float     sum      -1   2094.2   50.07   95.97      0   2091.7   50.13   96.08      0
   105906184      26476546     float     sum      -1   2148.6   49.29   94.47      0   2153.3   49.18   94.27      0
   106954760      26738690     float     sum      -1   2406.0   44.45   85.20      0   2165.0   49.40   94.68      0
   108003336      27000834     float     sum      -1   2163.5   49.92   95.68      0   2191.4   49.29   94.46      0
   109051912      27262978     float     sum      -1   2195.0   49.68   95.22      0   2186.0   49.89   95.61      0
   110100488      27525122     float     sum      -1   2190.3   50.27   96.35      0   2198.3   50.08   96.00      0
   111149064      27787266     float     sum      -1   2218.9   50.09   96.01      0   2209.0   50.32   96.44      0
   112197640      28049410     float     sum      -1   2238.2   50.13   96.08      0   2234.8   50.21   96.23      0
   113246216      28311554     float     sum      -1   2267.3   49.95   95.73      0   2247.2   50.39   96.59      0
   114294792      28573698     float     sum      -1   2281.2   50.10   96.03      0   2272.3   50.30   96.41      0
   115343368      28835842     float     sum      -1   2320.5   49.71   95.27      0   2298.9   50.17   96.17      0
   116391944      29097986     float     sum      -1   2342.6   49.69   95.23      0   2333.8   49.87   95.59      0
   117440520      29360130     float     sum      -1   2363.3   49.69   95.25      0   2354.3   49.88   95.61      0
   118489096      29622274     float     sum      -1   2381.3   49.76   95.37      0   2381.3   49.76   95.37      0
   119537672      29884418     float     sum      -1   2416.9   49.46   94.80      0   2389.2   50.03   95.90      0
   120586248      30146562     float     sum      -1   2400.5   50.23   96.28      0   2412.7   49.98   95.79      0
   121634824      30408706     float     sum      -1   2425.5   50.15   96.12      0   2432.1   50.01   95.86      0
   122683400      30670850     float     sum      -1   2458.7   49.90   95.64      0   2456.1   49.95   95.74      0
   123731976      30932994     float     sum      -1   2507.5   49.35   94.58      0   2469.7   50.10   96.02      0
   124780552      31195138     float     sum      -1   2492.7   50.06   95.94      0   2485.2   50.21   96.23      0
   125829128      31457282     float     sum      -1   2541.0   49.52   94.91      0   2513.1   50.07   95.96      0
   126877704      31719426     float     sum      -1   2532.2   50.11   96.04      0   2529.6   50.16   96.13      0
   127926280      31981570     float     sum      -1   2560.7   49.96   95.75      0   2556.2   50.05   95.92      0
   128974856      32243714     float     sum      -1   2583.9   49.92   95.67      0   2562.4   50.33   96.47      0
   130023432      32505858     float     sum      -1   2584.7   50.31   96.42      0   2587.1   50.26   96.33      0
   131072008      32768002     float     sum      -1   2604.6   50.32   96.45      0   2630.6   49.83   95.50      0
   132120584      33030146     float     sum      -1   2644.1   49.97   95.77      0   2619.1   50.45   96.69      0
   133169160      33292290     float     sum      -1   2666.0   49.95   95.74      0   2660.4   50.06   95.94      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 84.8558
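For readers checking the columns: nccl-tests defines algbw as size divided by time, and for all-reduce it defines busbw as algbw x 2(n-1)/n, where n is the number of ranks. The sketch below recomputes both for the 131072008-byte out-of-place row of the output above; the function name is hypothetical and not part of nccl-tests.

```python
def allreduce_bandwidth(size_bytes, time_us, n_ranks):
    """Return (algbw, busbw) in GB/s for an all-reduce measurement."""
    # bytes per microsecond equals MB/s; divide by 1e3 to get GB/s.
    algbw = size_bytes / time_us / 1e3
    # All-reduce correction factor documented by nccl-tests: 2 * (n - 1) / n.
    busbw = algbw * 2 * (n_ranks - 1) / n_ranks
    return algbw, busbw

# Values taken from the 131072008-byte out-of-place row, 24 ranks.
algbw, busbw = allreduce_bandwidth(131072008, 2604.6, 24)
print(f"algbw = {algbw:.2f} GB/s, busbw = {busbw:.2f} GB/s")
```

The result agrees with the 50.32 and 96.45 GB/s shown in that row, which is why busbw, not algbw, is the number to compare against the hardware limit.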

I look forward to your answer. Thank you!

sjeaugey commented 1 year ago

We usually consider 24 GB/s to be the peak performance per 200G NIC (12 GB/s for 100G NICs), so in your case the target is 96 GB/s (4 NICs x 24 GB/s), which you seem to reach. From my perspective it's perfect.
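The rule of thumb in this reply can be sketched as a small lookup. The per-NIC figures are the ones quoted above; the function and dictionary names are hypothetical, and the measured values are the out-of-place busbw figures from the last three rows of the posted output.

```python
# Practical per-NIC peak in GB/s, keyed by nominal link speed in Gb/s,
# as quoted in the reply above.
PRACTICAL_PEAK_GBYTE_PER_S = {200: 24.0, 100: 12.0}

def target_busbw(nics_per_server, link_gbit_per_s):
    """Expected peak bus bandwidth in GB/s for one server's NICs."""
    return nics_per_server * PRACTICAL_PEAK_GBYTE_PER_S[link_gbit_per_s]

target = target_busbw(4, 200)

# Out-of-place busbw from the three largest sizes in the posted output.
measured = [96.45, 95.77, 95.74]
ratios = [m / target for m in measured]

print(f"target: {target:.0f} GB/s")
for m, r in zip(measured, ratios):
    print(f"measured {m:.2f} GB/s is {r:.1%} of target")
```

Under this rule of thumb the practical target is 96% of the nominal 100 GB/s line rate, and the large-message rows land within about 1% of it.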

Yujaeseo commented 1 year ago

Thank you for your reply!