Tendo33 / oneflow-test

oneflow test

ViT separate compilation regression test #9

Open Tendo33 opened 1 year ago

Tendo33 commented 1 year ago
Each cell shows peak GPU memory / throughput.

| NVIDIA_GeForce_RTX_3080_Ti | master + oneflow@6e019b7 + libai@d25f09c | rank_per_process + oneflow@a442869 + libai@d25f09c | naive + oneflow@a442869 + libai@d25f09c6f |
| --- | --- | --- | --- |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_acfalse_dp1_mp4_pp1_zerotrue_stage2_mbs256_gbs256_acc1_1n4g | 11002 MiB / 219.67 samples/s | 10994 MiB / 219.31 samples/s | 10994 MiB / 239.39 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_acfalse_dp4_mp1_pp1_zerotrue_stage2_mbs64_gbs256_acc1_1n4g | 7758 MiB / 899.03 samples/s | 7742 MiB / 978.03 samples/s (throughput on the high side) | 7742 MiB / 905.66 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp1_zerotrue_stage2_mbs128_gbs1024_acc8_1n1g | 8017 MiB / 241.06 samples/s | 8009 MiB / 255.8 samples/s | 8009 MiB / 259.39 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp1_zerotrue_stage2_mbs256_gbs256_acc1_1n1g | 6613 MiB / 309.55 samples/s | 6605 MiB / 308.78 samples/s | 6605 MiB / 308.95 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp4_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g | 8558 MiB / 234.22 samples/s | 8550 MiB / 233.68 samples/s | 8550 MiB / 231.84 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp1_pp4_zerotrue_stage2_mbs256_gbs256_acc1_1n4g | 5402 MiB / 275.0 samples/s | 5394 MiB / 274.05 samples/s | 5394 MiB / 254.43 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp2_pp2_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g | 8034 MiB / 219.82 samples/s | 7936 MiB / 220.09 samples/s | 7936 MiB / 221.11 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp2_pp2_zerotrue_stage2_mbs256_gbs256_acc1_1n4g | 4912 MiB / 203.63 samples/s | 4830 MiB / 204.88 samples/s | 4830 MiB / 203.06 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp1_mp4_pp1_zerotrue_stage2_mbs128_gbs1024_acc8_1n4g | 6470 MiB / 180.8 samples/s | 6370 MiB / 178.49 samples/s | 6370 MiB / 177.04 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp1_pp2_zerotrue_stage2_mbs128_gbs256_acc1_1n4g | 4008 MiB / 454.2 samples/s (throughput on the low side) | 3996 MiB / 502.64 samples/s | 3996 MiB / 511.25 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp1_pp2_zerotrue_stage2_mbs64_gbs1024_acc8_1n4g | 5640 MiB / 441.66 samples/s | 5510 MiB / 443.4 samples/s | 5510 MiB / 466.89 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp1_zerotrue_stage2_mbs128_gbs256_acc1_1n4g | 3828 MiB / 388.04 samples/s | 3782 MiB / 393.13 samples/s | 3782 MiB / 390.96 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp2_mp2_pp1_zerotrue_stage2_mbs64_gbs1024_acc8_1n4g | 4728 MiB / 337.81 samples/s | 4646 MiB / 330.97 samples/s | 4646 MiB / 329.41 samples/s |
| libai_vit_imagenet_graph_nl12_nah12_hs768_fp16_actrue_dp4_mp1_pp1_zerotrue_stage2_mbs32_gbs1024_acc8_1n4g | 3958 MiB / 772.74 samples/s | 3892 MiB / 757.1 samples/s | 3882 MiB / 738.88 samples/s |
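To put numbers on the two throughput remarks in the results above, here is a minimal Python sketch that parses cells of the form `<memory> MiB / <throughput> samples/s` and reports each mode's throughput change relative to master. The helper names (`parse_cell`, `throughput_change_pct`) and the shortened row labels are illustrative assumptions and not part of the test harness. The values are copied from the two annotated rows.

```python
import re

# Cell format used in the results: "<memory> MiB / <throughput> samples/s"
CELL = re.compile(r"([\d.]+) MiB / ([\d.]+) samples/s")


def parse_cell(cell):
    """Return (memory_mib, samples_per_s) parsed from one result cell."""
    match = CELL.search(cell)
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def throughput_change_pct(baseline_cell, candidate_cell):
    """Relative throughput change of candidate vs. baseline, in percent."""
    _, base = parse_cell(baseline_cell)
    _, cand = parse_cell(candidate_cell)
    return (cand - base) / base * 100.0


# The two annotated rows; labels are shortened for readability (hypothetical).
# Tuple order: (master, rank_per_process, naive).
rows = {
    "acfalse_dp4_mp1_pp1_mbs64_gbs256": (
        "7758 MiB / 899.03 samples/s",
        "7742 MiB / 978.03 samples/s",
        "7742 MiB / 905.66 samples/s",
    ),
    "actrue_dp2_mp1_pp2_mbs128_gbs256": (
        "4008 MiB / 454.2 samples/s",
        "3996 MiB / 502.64 samples/s",
        "3996 MiB / 511.25 samples/s",
    ),
}

for name, (master, rank_per_process, naive) in rows.items():
    print(
        f"{name}: "
        f"rank_per_process {throughput_change_pct(master, rank_per_process):+.1f}%, "
        f"naive {throughput_change_pct(master, naive):+.1f}% vs master"
    )
```

The same helpers could be applied to every row to flag configurations whose throughput deviates from master by more than a chosen threshold. Any such threshold is a judgment call that the issue does not specify.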