NVIDIA / Megatron-LM

Ongoing research training transformer models at scale
https://docs.nvidia.com/megatron-core/developer-guide/latest/user-guide/index.html#quick-start

When H800 is trained with FP8, the performance is not significantly improved compared to FP16, and is even worse than FP16. #860

Closed yangzhipeng1108 closed 1 week ago

yangzhipeng1108 commented 3 weeks ago

Your question: Ask a clear and concise question about Megatron-LM.

LiaoYuanF commented 1 week ago

Hello, could you tell me how to start FP8 training? Thanks.

yangzhipeng1108 commented 1 week ago
```
    --bf16 \
    --fp8-format hybrid \
    --fp8-amax-compute-algo max \
    --fp8-amax-history-len 1024 \
    --transformer-impl transformer_engine
```

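For reference, a minimal sketch of where these flags could sit in a full launch command. Only the five FP8-related flags above come from this thread; the script name, parallelism size, batch size, and everything else are illustrative placeholders, not taken from the reporter's setup:

```shell
# Hypothetical launch sketch (assumed values, adjust to your cluster):
# the FP8 flags require the Transformer Engine implementation, enabled by
# --transformer-impl transformer_engine, and are layered on top of bf16.
torchrun --nproc_per_node 8 pretrain_gpt.py \
    --tensor-model-parallel-size 1 \
    --micro-batch-size 1 \
    --bf16 \
    --fp8-format hybrid \
    --fp8-amax-compute-algo max \
    --fp8-amax-history-len 1024 \
    --transformer-impl transformer_engine
```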
LiaoYuanF commented 1 week ago
```
    --bf16 \
    --fp8-format hybrid \
    --fp8-amax-compute-algo max \
    --fp8-amax-history-len 1024 \
    --transformer-impl transformer_engine
```

Thanks. I hit the same performance degradation on an H20 GPU. Please check your Transformer Engine version; upgrading to a newer release may solve the problem.
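To check the installed Transformer Engine version before deciding whether to upgrade, a small stdlib-only sketch like the following can be used. The distribution name `transformer-engine` is an assumption; adjust it if your install registers under a different name:

```python
# Look up an installed package's version via the standard library
# (importlib.metadata, available since Python 3.8), no import of the
# package itself required.
from importlib.metadata import version, PackageNotFoundError


def pkg_version(name: str):
    """Return the installed version string of `name`, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


# "transformer-engine" is an assumed distribution name; some builds may
# register it differently.
print(pkg_version("transformer-engine") or "transformer-engine not installed")
```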