NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Support simulating FP8 on older hardware #71

Open zplizzi opened 1 year ago

zplizzi commented 1 year ago

It would be great if this library supported simulating FP8 on e.g. Ampere hardware, as you did in the FP8 whitepaper. I'm sure a lot of people are interested in seeing whether their models will work well in FP8 before investing a lot of money in H100s, especially since they're barely available yet.

I see https://github.com/IntelLabs/FP8-Emulation-Toolkit, but it's poorly documented, and it's not clear whether it implements the same tensor-scaling algorithms that you have here.
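For context, FP8 emulation on older GPUs is usually a fake-quantization pass: apply a per-tensor scale so the tensor's absolute maximum maps near the FP8 range limit, round each value to the nearest representable E4M3 number, then unscale. The sketch below is illustrative only and is not the algorithm from TransformerEngine or the Intel toolkit; `quantize_e4m3` and `fake_quantize_tensor` are hypothetical helpers, assuming the E4M3 format from the FP8 whitepaper (4 exponent bits, 3 mantissa bits, max finite value 448).

```python
import math

E4M3_MAX = 448.0  # largest finite E4M3 value in the FP8 whitepaper's format


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value (hypothetical sketch)."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    ax = min(abs(x), E4M3_MAX)                 # saturate at the max finite value
    exp = max(math.floor(math.log2(ax)), -6)   # -6 is the minimum normal exponent
    step = 2.0 ** (exp - 3)                    # 3 mantissa bits -> 8 steps per binade
    return sign * round(ax / step) * step


def fake_quantize_tensor(values):
    """Per-tensor scaling: map amax to E4M3_MAX, quantize, then unscale."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [quantize_e4m3(v * scale) / scale for v in values]
```

Running a model with weights and activations passed through such a fake-quant step (in FP32/BF16 arithmetic) gives an estimate of FP8 accuracy without Hopper hardware, though it cannot reproduce FP8 matmul kernel behavior exactly.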

alphaRGB commented 1 year ago

Have you tried integrating the FP8-Emulation-Toolkit into TransformerEngine and running a simple network?