Tendo33 / oneflow-test

oneflow test

Tencent Cloud: libai_gpt vs. megatron_gpt comparison test #14

Open Tendo33 opened 1 year ago

Tendo33 commented 1 year ago

Tencent Cloud: LibAI vs. Megatron comparison test on GPT2

#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
a800-028:49082:49082 [0] NCCL INFO Launch mode Parallel
  2147483648     536870912     float     sum      -1    27260   78.78  147.71      0    28329   75.81  142.14      0
  4294967296    1073741824     float     sum      -1    53872   79.73  149.49      0    53867   79.73  149.50      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 147.207 

Machines 025, 026, 028, 029: 4n8g full log

#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
a800-028:48888:48998 [1] NCCL INFO comm 0x7fe49c000fa0 rank 1 nranks 32 cudaDev 1 busId 24000 - Init COMPLETE
a800-028:48887:48887 [0] NCCL INFO Launch mode Parallel
  2147483648     536870912     float     sum      -1    43516   49.35   95.61      0    43263   49.64   96.17      0
  4294967296    1073741824     float     sum      -1    91891   46.74   90.56      0    94436   45.48   88.12      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 92.616 
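
The `algbw` and `busbw` columns can be cross-checked from the `size` and `time` columns. The sketch below assumes the usual nccl-tests convention for all-reduce: algorithm bandwidth is bytes divided by time, and bus bandwidth is that value scaled by 2(n-1)/n for n ranks. The function name `allreduce_bandwidth` is made up for illustration. The rank count of 32 comes from the `nranks 32` field in the log above.

```python
def allreduce_bandwidth(size_bytes, time_us, nranks):
    """Return (algbw, busbw) in GB/s for one all-reduce measurement.

    Assumes the nccl-tests convention: 1 GB = 1e9 bytes, and
    busbw = algbw * 2 * (nranks - 1) / nranks for all-reduce.
    """
    algbw = size_bytes / (time_us * 1e-6) / 1e9
    busbw = algbw * 2 * (nranks - 1) / nranks
    return algbw, busbw


# First out-of-place row of the 4n8g log: 2147483648 B in 43516 us on 32 ranks.
algbw, busbw = allreduce_bandwidth(2147483648, 43516, 32)
print(f"algbw={algbw:.2f} GB/s, busbw={busbw:.2f} GB/s")
```

For that row the result should agree with the logged 49.35 and 95.61 GB/s to two decimals. The same function can be applied to the other logs by changing `nranks` to the total GPU count of the run.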

Machines 051:8, 052:8, 053:8, 054:8, 055:8, 056:8, 057:8, 058:8: 8n8g full log

#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
  2147483648     536870912     float     sum      -1    25432   84.44  166.24      0    25383   84.60  166.56      0
  4294967296    1073741824     float     sum      -1    53176   80.77  159.01      0    50261   85.45  168.24      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 165.013 
Tendo33 commented 1 year ago

node 0:nsys

node 1:nsys

Tendo33 commented 1 year ago

(image) In the 5.3 container image, NCCL only reaches 96.3.

Tendo33 commented 1 year ago

Tencent Cloud A800 GPT tests: 8-card and 16-card

Machines 051, 052, 053, 054, 055, 056, 057, 058: 8n8g full log

#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
  2147483648     536870912     float     sum      -1    25432   84.44  166.24      0    25383   84.60  166.56      0
  4294967296    1073741824     float     sum      -1    53176   80.77  159.01      0    50261   85.45  168.24      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 165.013 

Machines 051, 052, 053, 054, 055, 056, 057, 058, 059, 060, 061, 062, 063, 026, 027, 028: 16n8g full log

#                                                              out-of-place                       in-place          
#       size         count      type   redop    root     time   algbw   busbw #wrong     time   algbw   busbw #wrong
#        (B)    (elements)                               (us)  (GB/s)  (GB/s)            (us)  (GB/s)  (GB/s)       
  2147483648     536870912     float     sum      -1    28415   75.58  149.97      0    32785   65.50  129.98      0
  4294967296    1073741824     float     sum      -1    56926   75.45  149.72      0    57307   74.95  148.72      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 144.597 

| NVIDIA Graphics Device A800 80G | OneFlow_eb3df25 | Megatron_e156d2f |
| --- | --- | --- |
| gpt2_pretrain_graph_nl48_nah144_hs2304_FP16actrue DP64_MP1_PP1_zerofalse_stage2_mbs2_gbs128_acc1_8n8g | building graph cost time: 22.23s. / building plan cost time: 166.74s. / 66997-67197 MiB / 138.98 samples/s | 64754 - 64826 MiB / 132.6 samples/s |
| gpt2_pretrain_graph_nl48_nah144_hs2304_FP16actrue DP64_MP1_PP1_zerofalse_stage2_mbs2_gbs512_acc4_8n8g | building graph cost time: 28.06s. / building plan cost time: 167.93s. / 65547-65819 MiB / 162.54 samples/s | 64754 - 64826 MiB / 165.3 samples/s |
| gpt2_pretrain_graph_nl48_nah144_hs2304_FP16actrue DP128_MP1_PP1_zerofalse_stage2_mbs2_gbs256_acc1_16n8g | building graph cost time: 27.79s. / building plan cost time: 394.68s. / 66999-67197 MiB / 249.03 samples/s | 64684 - 64828 MiB / 260.4 samples/s |
| LibAI_gpt2_pretrain_graph_nl48_nah144_hs2304_FP16actrue DP128_MP1_PP1_zerofalse_stage2_mbs2_gbs1024_acc4_16n8g | building graph cost time: 35.72s. / building plan cost time: 365.7s. / 65549-65819 MiB / 312.32 samples/s | 64684 - 64828 MiB / 277.1 samples/s |
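
To make the throughput comparison in the results above easier to read, the OneFlow and Megatron samples/s figures can be turned into ratios. This is a small sketch, not part of the original test scripts. The throughput values and configuration strings are copied from the results above, while the `results` list and the `relative_throughput` helper are hypothetical names introduced here.

```python
# (configuration, OneFlow samples/s, Megatron samples/s), copied from the results above.
results = [
    ("DP64_MP1_PP1_zerofalse_stage2_mbs2_gbs128_acc1_8n8g", 138.98, 132.6),
    ("DP64_MP1_PP1_zerofalse_stage2_mbs2_gbs512_acc4_8n8g", 162.54, 165.3),
    ("DP128_MP1_PP1_zerofalse_stage2_mbs2_gbs256_acc1_16n8g", 249.03, 260.4),
    ("DP128_MP1_PP1_zerofalse_stage2_mbs2_gbs1024_acc4_16n8g", 312.32, 277.1),
]


def relative_throughput(oneflow_sps, megatron_sps):
    """Return OneFlow throughput as a fraction of Megatron throughput."""
    return oneflow_sps / megatron_sps


ratios = {name: relative_throughput(of, mg) for name, of, mg in results}

for name, ratio in ratios.items():
    # A ratio above 1.0 means OneFlow processed more samples per second.
    print(f"{name}: OneFlow/Megatron = {ratio:.3f}")
```

A ratio above 1.0 means OneFlow was faster for that configuration, and below 1.0 means Megatron was faster.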