Open ltm920716 opened 2 weeks ago
Your question: Hello, the MFU in the table is almost 40%, and the throughput is around 400 TFLOP/s, as shown here: https://github.com/NVIDIA/Megatron-LM?tab=readme-ov-file#training-speed-and-scalability
Does this table use the fp32 peak? The training scripts all pass --fp16 as a parameter, and if the fp16 peak is used, shouldn't the MFU be 400 / 2000 = 20%, not 40%?
I am confused, please help. Thanks!
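The arithmetic behind the question can be sketched as a quick check. The peak values below are taken from the question itself and are illustrative assumptions, not confirmed hardware specs:

```python
def mfu(achieved_tflops: float, peak_tflops: float) -> float:
    """Model FLOPs Utilization: achieved model FLOP/s divided by hardware peak FLOP/s."""
    return achieved_tflops / peak_tflops

# ~400 TFLOP/s achieved, against the 2000 TFLOP/s fp16 peak assumed in the question:
print(f"{mfu(400, 2000):.0%}")  # 20%, the figure computed in the question

# For comparison, against a hypothetical peak of 1000 TFLOP/s the same
# throughput would read as 40%:
print(f"{mfu(400, 1000):.0%}")  # 40%
```

So the discrepancy comes down entirely to which peak TFLOP/s figure the table's MFU is measured against.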