Tendo33/oneflow-test
Libai Megatron GPT test #12
Open. Tendo33 opened this issue 1 year ago.
| GPT-2 | libai | Megatron |
| --- | --- | --- |
| Dataset | loss_compara_content_sentence.bin, loss_compara_content_sentence.idx | loss_compara_content_sentence.bin, loss_compara_content_sentence.idx |
| vocab.txt | bert-base-chinese-vocab.txt | bert-base-chinese-vocab.txt |
| Test script | args_train.sh | megatron_args_pretrain_gpt2.sh |
Test environment

| OneFlow | Libai | Megatron |
| --- | --- | --- |
| (master branch) 9f08133 | (main branch) 247cbb7 | (main branch) e156d2f |

Libai was run with nccl_use_compute_stream enabled.
Test results

Three groups were tested: one pure data parallel, one hybrid parallel, and one pure model parallel.
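As a rough illustration of how the three groups divide the 8 GPUs, the sketch below classifies a (DP, MP, PP) layout and checks that the product of the three sizes equals the GPU count. The class and function names are hypothetical. Reading MP as tensor model parallelism, PP as pipeline parallelism, and "1n8g" as one node with 8 GPUs is an assumption based on the run names in the results, not something stated in this issue.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper: one way of splitting GPUs across parallel dimensions."""

    dp: int  # data parallel size
    mp: int  # model (tensor) parallel size, assumed meaning of "MP"
    pp: int  # pipeline parallel size, assumed meaning of "PP"

    def world_size(self) -> int:
        # Each GPU occupies exactly one (dp, mp, pp) coordinate.
        return self.dp * self.mp * self.pp

    def kind(self) -> str:
        if self.mp == 1 and self.pp == 1:
            return "pure data parallel"
        if self.dp == 1 and self.pp == 1:
            return "pure model parallel"
        return "hybrid parallel"


def global_batch_size(micro_batch: int, dp: int, acc_steps: int) -> int:
    # Assumed relation; it agrees with the mbs/gbs/acc values in the run names.
    return micro_batch * dp * acc_steps


# The three layouts that appear in the tested run names.
for layout in (ParallelLayout(8, 1, 1), ParallelLayout(2, 2, 2), ParallelLayout(1, 8, 1)):
    assert layout.world_size() == 8  # assumed: "1n8g" means 1 node x 8 GPUs
    print(layout, "->", layout.kind())
```

Under these assumptions, the pure data parallel run with mbs4 and acc1 gives `global_batch_size(4, 8, 1) == 32`, which matches the gbs32 in its run name.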
| NVIDIA_GeForce_RTX_3090 | Libai | Megatron |
| --- | --- | --- |
| gpt2_nl24_nah16_hs768_FP16_acfalse_DP8_MP1_PP1_zerofalse_stage2_mbs4_gbs32_acc1_1n8g | 16514–16568 MiB / 112.17 samples/s | [16931 MiB] / 84.7 samples/s |
| gpt2_nl24_nah16_hs1024_FP16_acfalse_DP8_MP1_PP1_zerofalse_stage2_mbs8_gbs64_acc1_1n8g | OOM | OOM |
| gpt2_nl24_nah16_hs768_FP16_acfalse_DP2_MP2_PP2_zerofalse_stage2_mbs4_gbs16_acc2_1n8g | 16066–16196 MiB / 37.44 samples/s | [8187 MiB] / 45.8 samples/s |
| gpt2_nl24_nah16_hs1024_FP16_acfalse_DP2_MP2_PP2_zerofalse_stage2_mbs8_gbs16_acc1_1n8g | 7987–10258 MiB / 22.40 samples/s | [9317 MiB] / 27.7 samples/s |
| gpt2_nl24_nah16_hs768_FP16_acfalse_DP1_MP8_PP1_zerofalse_stage2_mbs32_gbs256_acc8_1n8g | 18456–18456 MiB / 14.94 samples/s | [23759 MiB] / 14.4 samples/s |
| gpt2_graph_nl24_nah16_hs1024__acfalse_DP_MP2_PP2_zerofalse_stage2_mbs8_gbs32_acc_1n8g | OOM | [11057 MiB] / 35.9 samples/s |
| gpt2_eager_nl24_nah16_hs768__acfalse_DP_MP2_PP2_zerofalse_stage2_mbs8_gbs64_acc_1n8g | OOM | [14248 MiB] / 52.8 samples/s |
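The run names above pack the whole configuration into one string, so a small parser can make them easier to compare. The following is a minimal sketch, not part of the original test scripts: `parse_run_name` and `is_consistent` are hypothetical helpers, and the reading of the fields (nl as layer count, nah as attention head count, hs as hidden size, mbs/gbs as micro and global batch size, acc as accumulation steps) is an assumption inferred from the names. Fields that carry no number, such as `DP_` and `acc_` in the last two rows, are returned as `None` rather than guessed.

```python
import re

# Keys we try to read; each must start after "_" and end before "_" or end of string.
FIELD_RE = re.compile(r"(?<=_)(nl|nah|hs|DP|MP|PP|mbs|gbs|acc)(\d*)(?=_|$)")


def parse_run_name(name):
    """Return a dict of the numeric fields found in a run name (None if the number is missing)."""
    fields = {}
    for key, digits in FIELD_RE.findall(name):
        fields[key] = int(digits) if digits else None
    return fields


def is_consistent(fields, gpus=8):
    """Check DP*MP*PP == gpus and mbs*DP*acc == gbs; return None if a field is missing."""
    needed = ("DP", "MP", "PP", "mbs", "gbs", "acc")
    if any(fields.get(key) is None for key in needed):
        return None
    uses_all_gpus = fields["DP"] * fields["MP"] * fields["PP"] == gpus
    batch_matches = fields["mbs"] * fields["DP"] * fields["acc"] == fields["gbs"]
    return uses_all_gpus and batch_matches


if __name__ == "__main__":
    name = (
        "gpt2_nl24_nah16_hs768_FP16_acfalse_DP8_MP1_PP1_"
        "zerofalse_stage2_mbs4_gbs32_acc1_1n8g"
    )
    parsed = parse_run_name(name)
    print(parsed)
    print("consistent:", is_consistent(parsed))
```

Returning `None` for incomplete names keeps the check from silently passing or failing on the two rows whose DP and acc values are not recorded.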