Open Tendo33 opened 1 year ago
With the 5.3 image, NCCL only reaches 96.3.
Nodes 051, 052, 053, 054, 055, 056, 057, 058 (8n8g), full log:
# out-of-place in-place
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
2147483648 536870912 float sum -1 25432 84.44 166.24 0 25383 84.60 166.56 0
4294967296 1073741824 float sum -1 53176 80.77 159.01 0 50261 85.45 168.24 0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 165.013
Nodes 051, 052, 053, 054, 055, 056, 057, 058, 059, 060, 061, 062, 063, 026, 027, 028 (16n8g), full log:
# out-of-place in-place
# size count type redop root time algbw busbw #wrong time algbw busbw #wrong
# (B) (elements) (us) (GB/s) (GB/s) (us) (GB/s) (GB/s)
2147483648 536870912 float sum -1 28415 75.58 149.97 0 32785 65.50 129.98 0
4294967296 1073741824 float sum -1 56926 75.45 149.72 0 57307 74.95 148.72 0
# Out of bounds values : 0 OK
# Avg bus bandwidth : 144.597
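
The bandwidth columns in the two logs above can be re-derived from the size and time columns. The following is a minimal sketch, assuming the collective under test is all-reduce (the logs do not name it, but `redop sum` with `root -1` is consistent with it) and that 8n8g and 16n8g mean 64 and 128 ranks. Under those assumptions, nccl-tests reports `busbw = algbw * 2 * (n - 1) / n`. The function names are illustrative, not part of any tool.

```python
# Sketch: recompute algbw and busbw for the out-of-place columns of the logs.
# Assumption: all-reduce, so busbw = algbw * 2 * (n - 1) / n (nccl-tests definition).

def algbw_gbps(size_bytes: int, time_us: float) -> float:
    """Algorithm bandwidth in GB/s, with 1 GB = 1e9 bytes."""
    return size_bytes / (time_us * 1e-6) / 1e9


def allreduce_busbw_gbps(algbw: float, n_ranks: int) -> float:
    """Bus bandwidth for all-reduce across n_ranks ranks."""
    return algbw * 2 * (n_ranks - 1) / n_ranks


rows = [
    # (label, assumed rank count, size in bytes, out-of-place time in us)
    ("8n8g", 64, 2147483648, 25432),
    ("8n8g", 64, 4294967296, 53176),
    ("16n8g", 128, 2147483648, 28415),
    ("16n8g", 128, 4294967296, 56926),
]

for label, ranks, size, time_us in rows:
    alg = algbw_gbps(size, time_us)
    bus = allreduce_busbw_gbps(alg, ranks)
    print(f"{label} size={size} algbw={alg:.2f} GB/s busbw={bus:.2f} GB/s")
```
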

NVIDIA Graphics Device A800 80G | OneFlow_eb3df25 | Megatron_e156d2f |
---|---|---|
gpt2_pretrain_graph_nl48_nah144_hs2304_FP16actrue DP64_MP1_PP1_zerofalse_stage2_mbs2_gbs128_acc1_8n8g | building graph cost time: 22.23s. / building plan cost time: 166.74s. / 66997-67197 MiB / 138.98 samples/s | 64754-64826 MiB / 132.6 samples/s |
gpt2_pretrain_graph_nl48_nah144_hs2304_FP16actrue DP64_MP1_PP1_zerofalse_stage2_mbs2_gbs512_acc4_8n8g | building graph cost time: 28.06s. / building plan cost time: 167.93s. / 65547-65819 MiB / 162.54 samples/s | 64754-64826 MiB / 165.3 samples/s |
gpt2_pretrain_graph_nl48_nah144_hs2304_FP16actrue DP128_MP1_PP1_zerofalse_stage2_mbs2_gbs256_acc1_16n8g | building graph cost time: 27.79s. / building plan cost time: 394.68s. / 66999-67197 MiB / 249.03 samples/s | 64684-64828 MiB / 260.4 samples/s |
LibAI_gpt2_pretrain_graph_nl48_nah144_hs2304_FP16actrue DP128_MP1_PP1_zerofalse_stage2_mbs2_gbs1024_acc4_16n8g | building graph cost time: 35.72s. / building plan cost time: 365.7s. / 65549-65819 MiB / 312.32 samples/s | 64684-64828 MiB / 277.1 samples/s |
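
To make the OneFlow and Megatron throughput figures easier to compare, the sketch below computes per-GPU throughput and the relative difference for each configuration. The samples/s values are copied from the table. The GPU counts (64 for 8n8g, 128 for 16n8g) are an assumption based on the DP64 and DP128 labels, and the helper name is illustrative.

```python
# Sketch: compare OneFlow and Megatron throughput from the table.
# Assumption: 8n8g = 64 GPUs and 16n8g = 128 GPUs (matching DP64 / DP128).

results = [
    # (config suffix, assumed GPU count, OneFlow samples/s, Megatron samples/s)
    ("gbs128_acc1_8n8g", 64, 138.98, 132.6),
    ("gbs512_acc4_8n8g", 64, 162.54, 165.3),
    ("gbs256_acc1_16n8g", 128, 249.03, 260.4),
    ("gbs1024_acc4_16n8g", 128, 312.32, 277.1),
]


def relative_diff_pct(a: float, b: float) -> float:
    """Difference of a relative to b, in percent (positive means a is faster)."""
    return (a - b) / b * 100.0


for name, gpus, oneflow, megatron in results:
    print(
        f"{name}: OneFlow {oneflow / gpus:.3f} vs Megatron {megatron / gpus:.3f} "
        f"samples/s per GPU, diff {relative_diff_pct(oneflow, megatron):+.1f}%"
    )
```

A positive difference means OneFlow processed more samples per second than Megatron for that configuration.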
Tencent Cloud: comparison test of LibAI and Megatron on GPT-2
GPT-2
loss_compara_content_sentence.idx
Test environment
NCCL_TEST
Nodes 028, 029 (2n8g), full log
Nodes 025, 026, 028, 029 (4n8g), full log
Nodes 051:8, 052:8, 053:8, 054:8, 055:8, 056:8, 057:8, 058:8 (8n8g), full log
Test results
DP16_MP1_PP1_zerofalse_stage2_mbs2_gbs32_acc1_2n8g
DP16_MP1_PP1_zerofalse_stage2_mbs2_gbs128_acc4_2n8g
DP32_MP1_PP1_zerofalse_stage2_mbs2_gbs64_acc1_4n8g
DP32_MP1_PP1_zerofalse_stage2_mbs2_gbs256_acc4_4n8g
DP64_MP1_PP1_zerofalse_stage2_mbs2_gbs128_acc1_8n8g
DP64_MP1_PP1_zerofalse_stage2_mbs2_gbs512_acc4_8n8g
DP128_MP1_PP1_zerofalse_stage2_mbs2_gbs256_acc1_16n8g
DP128_MP1_PP1_zerofalse_stage2_mbs2_gbs1024_acc4_16n8g
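
The configuration names above encode the parallelism and batch settings. The sketch below parses them and checks two relationships: `gbs = mbs * DP * acc`, and `DP * MP * PP = nodes * GPUs per node`. The field meanings are an assumption, since the issue does not define them: `mbs` as micro-batch size, `gbs` as global batch size, `acc` as gradient accumulation steps, and `NnMg` as N nodes with M GPUs each. The helper names are hypothetical.

```python
import re

# Assumed naming scheme: DP<dp>_MP<mp>_PP<pp>_..._mbs<m>_gbs<g>_acc<a>_<N>n<G>g
PATTERN = re.compile(
    r"DP(?P<dp>\d+)_MP(?P<mp>\d+)_PP(?P<pp>\d+)_.*?"
    r"mbs(?P<mbs>\d+)_gbs(?P<gbs>\d+)_acc(?P<acc>\d+)_(?P<nodes>\d+)n(?P<gpus>\d+)g"
)


def parse_config(name: str) -> dict:
    """Extract the integer fields from a configuration name."""
    match = PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized config name: {name}")
    return {key: int(value) for key, value in match.groupdict().items()}


def is_consistent(cfg: dict) -> bool:
    """Check batch-size and device-count arithmetic under the assumed meanings."""
    batch_ok = cfg["gbs"] == cfg["mbs"] * cfg["dp"] * cfg["acc"]
    devices_ok = cfg["dp"] * cfg["mp"] * cfg["pp"] == cfg["nodes"] * cfg["gpus"]
    return batch_ok and devices_ok


names = [
    "DP16_MP1_PP1_zerofalse_stage2_mbs2_gbs32_acc1_2n8g",
    "DP16_MP1_PP1_zerofalse_stage2_mbs2_gbs128_acc4_2n8g",
    "DP32_MP1_PP1_zerofalse_stage2_mbs2_gbs64_acc1_4n8g",
    "DP32_MP1_PP1_zerofalse_stage2_mbs2_gbs256_acc4_4n8g",
    "DP64_MP1_PP1_zerofalse_stage2_mbs2_gbs128_acc1_8n8g",
    "DP64_MP1_PP1_zerofalse_stage2_mbs2_gbs512_acc4_8n8g",
    "DP128_MP1_PP1_zerofalse_stage2_mbs2_gbs256_acc1_16n8g",
    "DP128_MP1_PP1_zerofalse_stage2_mbs2_gbs1024_acc4_16n8g",
]

for name in names:
    cfg = parse_config(name)
    print(name, "OK" if is_consistent(cfg) else "MISMATCH")
```
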