Tendo33 / oneflow-test

oneflow test
LibAI_bert_large_pretrain_graph_nl24_nah16_hs1024_FP16_actrue_DP4_MP2_PP2_zerotrue_stage2_mbs32_gbs512_acc4_2n8g #1

Open Tendo33 opened 1 year ago

Tendo33 commented 1 year ago

case1

| NVIDIA_GeForce_RTX_3080_Ti | master@b51cb72 | rank_per_process@a442869 | naive@a442869 |
| -- | -- | -- | -- |
| LibAI_bert_large_pretrain_graph_nl24_nah16_hs1024_FP16_actrue_DP4_MP2_PP2_zerotrue_stage2_mbs32_gbs512_acc4_2n8g | building graph Done! Cost time: 22.11s.<br>building plan Done! Cost time: 21.92s.<br>node0: 5774MiB–5924MiB<br>node1: 4262MiB–4621MiB<br>[[master_output.log](https://oneflow-test.oss-cn-beijing.aliyuncs.com/OneAutoTest/onebench/libai/sunjinfeng_bert_test/case1/b51cb72_master1/LibAI_bert_large_pretrain_graph_nl24_nah16_hs1024_FP16_actrue_DP4_MP2_PP2_zerotrue_stage2_mbs32_gbs512_acc4_2n8g/output.log)] | building plan Done! Cost time: 23.21s.<br>building graph Done! Cost time: 20.52s.<br>node0: 5774MiB–5864MiB<br>node1: 4262MiB–4262MiB<br>[[rank_per_process_output.log](https://oneflow-test.oss-cn-beijing.aliyuncs.com/OneAutoTest/onebench/libai/sunjinfeng_bert_test/case1/a442869_env_rank1/LibAI_bert_large_pretrain_graph_nl24_nah16_hs1024_FP16_actrue_DP4_MP2_PP2_zerotrue_stage2_mbs32_gbs512_acc4_2n8g/output.log)]<br>When running this case, lr is reported as N/A (see output.log), whereas lr is displayed normally in single-node tests. | building plan Done! Cost time: 21.77s.<br>building graph Done! Cost time: 23.8s.<br>node0: 5736MiB–5886MiB<br>node1: 4242MiB–4262MiB<br>[[naive_output.log](https://oneflow-test.oss-cn-beijing.aliyuncs.com/OneAutoTest/onebench/libai/sunjinfeng_bert_test/case1/a442869_naive/LibAI_bert_large_pretrain_graph_nl24_nah16_hs1024_FP16_actrue_DP4_MP2_PP2_zerotrue_stage2_mbs32_gbs512_acc4_2n8g/output.log)] |
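As a sanity check on the case name, here is a minimal sketch that parses the numeric fields and verifies they are mutually consistent. The field meanings are my assumption and are not stated in this issue: `DP`/`MP`/`PP` as data/model/pipeline parallel sizes, `mbs`/`gbs` as micro/global batch size, `acc` as gradient accumulation steps, and `2n8g` as 2 nodes with 8 GPUs each. The helper names `parse_case_name` and `check_case` are hypothetical.

```python
import re

CASE = ("LibAI_bert_large_pretrain_graph_nl24_nah16_hs1024_FP16_actrue_"
        "DP4_MP2_PP2_zerotrue_stage2_mbs32_gbs512_acc4_2n8g")

# Regexes for the numeric fields; the meaning of each field is assumed.
PATTERNS = {
    "dp": r"_DP(\d+)",
    "mp": r"_MP(\d+)",
    "pp": r"_PP(\d+)",
    "micro_batch": r"_mbs(\d+)",
    "global_batch": r"_gbs(\d+)",
    "acc_steps": r"_acc(\d+)",
    "nodes": r"_(\d+)n\d+g",
    "gpus_per_node": r"_\d+n(\d+)g",
}


def parse_case_name(name: str) -> dict:
    """Return the numeric fields found in a benchmark case name."""
    fields = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, name)
        if match is None:
            raise ValueError(f"field {key!r} not found in case name")
        fields[key] = int(match.group(1))
    return fields


def check_case(fields: dict) -> bool:
    """Check that parallel sizes fill all GPUs and batch sizes agree."""
    world_size = fields["nodes"] * fields["gpus_per_node"]
    parallel_ok = fields["dp"] * fields["mp"] * fields["pp"] == world_size
    # Assumed relation: global batch = micro batch * DP size * accumulation steps.
    batch_ok = (fields["micro_batch"] * fields["dp"] * fields["acc_steps"]
                == fields["global_batch"])
    return parallel_ok and batch_ok


if __name__ == "__main__":
    parsed = parse_case_name(CASE)
    print(parsed)
    print("consistent:", check_case(parsed))
```

Under these assumptions the name is self-consistent: 4 × 2 × 2 = 16 ranks matches 2 nodes × 8 GPUs, and 32 × 4 × 4 = 512 matches `gbs512`.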

LibAI_bert_large_pretrain_graph_nl24_nah16_hs1024_FP16_actrue_DP4_MP2_PP2_zerotrue_stage2_mbs32_gbs512_acc4_2n8g _100-220

Tendo33 commented 1 year ago

The GPU memory figures for node1 (server No. 26) are not exact, because other people were using that server at the same time.