Tendo33 / oneflow-test

oneflow test
Libai Megatron GPT test #12

Open Tendo33 opened 1 year ago

Tendo33 commented 1 year ago
| NVIDIA_GeForce_RTX_3090 | Libai | Megatron |
| --- | --- | --- |
| gpt2_nl24_nah16_hs768_FP16_acfalse_DP8_MP1_PP1_zerofalse_stage2_mbs4_gbs32_acc1_1n8g | 16514–16568 MiB / 112.17 samples/s | [16931 MiB] / 84.7 samples/s |
| gpt2_nl24_nah16_hs1024_FP16_acfalse_DP8_MP1_PP1_zerofalse_stage2_mbs8_gbs64_acc1_1n8g | OOM | OOM |
| gpt2_nl24_nah16_hs768_FP16_acfalse_DP2_MP2_PP2_zerofalse_stage2_mbs4_gbs16_acc2_1n8g | 16066–16196 MiB / 37.44 samples/s | [8187 MiB] / 45.8 samples/s |
| gpt2_nl24_nah16_hs1024_FP16_acfalse_DP2_MP2_PP2_zerofalse_stage2_mbs8_gbs16_acc1_1n8g | 7987–10258 MiB / 22.40 samples/s | [9317 MiB] / 27.7 samples/s |
| gpt2_nl24_nah16_hs768_FP16_acfalse_DP1_MP8_PP1_zerofalse_stage2_mbs32_gbs256_acc8_1n8g | 18456–18456 MiB / 14.94 samples/s | [23759 MiB] / 14.4 samples/s |
| gpt2_graph_nl24_nah16_hs1024__acfalse_DP_MP2_PP2_zerofalse_stage2_mbs8_gbs32_acc_1n8g | OOM | [11057 MiB] / 35.9 samples/s |
| gpt2_eager_nl24_nah16_hs768__acfalse_DP_MP2_PP2_zerofalse_stage2_mbs8_gbs64_acc_1n8g | OOM | [14248 MiB] / 52.8 samples/s |
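
The configuration names above pack the parallelism and batch settings into one string, so a small parser can help sanity-check them. The sketch below is only an illustration: the field meanings (DP/MP/PP as data, tensor-model, and pipeline parallel sizes; mbs/gbs as micro and global batch size; acc as gradient accumulation steps; 1n8g as 1 node with 8 GPUs) are assumptions inferred from the names, and `parse_config` and `check_config` are hypothetical helpers, not part of Libai or Megatron. It checks the usual relations gbs = mbs × DP × acc and DP × MP × PP = total GPUs, and reports `None` when a field has no value, as in the last two rows.

```python
import re

# Assumed field meanings, inferred from the names only (not confirmed by the issue):
# DP/MP/PP = data / tensor-model / pipeline parallel size,
# mbs/gbs = micro / global batch size, acc = gradient accumulation steps.
FIELD_RE = re.compile(r"(?<=_)(DP|MP|PP|mbs|gbs|acc)(\d*)(?=_|$)")
# Trailing "<nodes>n<gpus>g" suffix, e.g. "1n8g".
NODES_RE = re.compile(r"(\d+)n(\d+)g$")


def parse_config(name):
    """Return the numeric fields of a config name; a field without digits maps to None."""
    fields = {key: (int(val) if val else None) for key, val in FIELD_RE.findall(name)}
    match = NODES_RE.search(name)
    fields["nodes"] = int(match.group(1)) if match else None
    fields["gpus_per_node"] = int(match.group(2)) if match else None
    return fields


def check_config(name):
    """Return True/False for the two consistency checks, or None if a field is missing."""
    f = parse_config(name)
    needed = ("DP", "MP", "PP", "mbs", "gbs", "acc", "nodes", "gpus_per_node")
    if any(f.get(key) is None for key in needed):
        return None  # e.g. "DP_" or "acc_" with no number: nothing to verify
    batch_ok = f["gbs"] == f["mbs"] * f["DP"] * f["acc"]
    gpus_ok = f["DP"] * f["MP"] * f["PP"] == f["nodes"] * f["gpus_per_node"]
    return batch_ok and gpus_ok


if __name__ == "__main__":
    names = [
        "gpt2_nl24_nah16_hs768_FP16_acfalse_DP8_MP1_PP1_zerofalse_stage2_mbs4_gbs32_acc1_1n8g",
        "gpt2_graph_nl24_nah16_hs1024__acfalse_DP_MP2_PP2_zerofalse_stage2_mbs8_gbs32_acc_1n8g",
    ]
    for name in names:
        print(name, check_config(name))
```

Under these assumptions, a `None` result flags rows whose names leave DP or acc blank, which makes them harder to compare directly against the fully specified runs.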