A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
It would be great if this library supported simulating FP8 on, e.g., Ampere hardware, as you did in the FP8 whitepaper. I'm sure a lot of people would like to see whether their models work well in FP8 before investing a lot of money in H100s, especially since those are barely available yet.
I see https://github.com/IntelLabs/FP8-Emulation-Toolkit, but it's poorly documented, and it's not clear whether it implements the same tensor-scaling algorithms that you have here.
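For context, here's roughly what I mean by tensor-scaled FP8 emulation: a minimal NumPy sketch of per-tensor-scaled E4M3 quantize/dequantize (the E4M3 constants follow the NVIDIA/Arm/Intel FP8 format; this is just my illustration of the idea, not a claim about how TransformerEngine implements it):

```python
import numpy as np

E4M3_MAX = 448.0  # largest representable E4M3 magnitude

def quantize_e4m3(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Simulate FP8 E4M3 with per-tensor scaling: scale so the tensor's
    amax maps to E4M3_MAX, round to the nearest representable E4M3
    value, then dequantize. Returns (dequantized tensor, scale)."""
    amax = np.max(np.abs(x))
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    xs = np.clip(x * scale, -E4M3_MAX, E4M3_MAX)

    # Round the mantissa to 3 explicit bits: a value with binary
    # exponent E has spacing 2**(E-3); subnormals (min normal exponent
    # E = -6) have spacing 2**-9, handled by clamping the exponent.
    mant, exp = np.frexp(xs)        # xs = mant * 2**exp, 0.5 <= |mant| < 1
    exp = np.maximum(exp, -5)       # frexp exp = E + 1, so clamp at -5
    step = np.ldexp(1.0, exp - 4)   # spacing = 2**(E-3) = 2**(exp-4)
    q = np.round(xs / step) * step  # round-to-nearest-even, like hardware
    return q / scale, scale
```

Something like this (wired into linear/matmul layers, plus the delayed amax-history scaling from the whitepaper) run on Ampere would let people estimate FP8 accuracy before buying Hopper hardware.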