NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Value initialize all descriptors #910

Closed · keshavb96 closed this 3 weeks ago

keshavb96 commented 3 weeks ago

When descriptors are not value-initialized, the serialized bytes in the HLO differ between runs of the same computation because of variability in the padding bits. This has implications for compilation caching.
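
For illustration, here is a minimal C++ sketch of the underlying issue. The descriptor type, field names, and `serialize` helper are hypothetical, not Transformer Engine's actual descriptors; the point is only that a byte-wise copy of a struct includes its padding bytes, which are indeterminate unless the object is value-initialized.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical descriptor: on typical ABIs the compiler inserts 7 padding
// bytes between `dtype` and `size` to align the 64-bit member.
struct ExampleDescriptor {
  uint8_t dtype;
  uint64_t size;
};

// Byte-wise serialization: padding bytes end up in the output verbatim.
std::vector<char> serialize(const ExampleDescriptor &desc) {
  std::vector<char> bytes(sizeof(desc));
  std::memcpy(bytes.data(), &desc, sizeof(desc));
  return bytes;
}

int main() {
  ExampleDescriptor a;   // default-initialized: padding bytes are indeterminate
  a.dtype = 1;
  a.size = 1024;

  ExampleDescriptor b{}; // value-initialized: members and, in practice, padding are zeroed
  b.dtype = 1;
  b.size = 1024;

  // serialize(a) can produce different bytes across runs even though the
  // logical contents are identical, so a compilation cache keyed on those
  // bytes may miss spuriously. serialize(b) is byte-for-byte reproducible.
  auto bytes_a = serialize(a);
  auto bytes_b = serialize(b);
  (void)bytes_a;
  (void)bytes_b;
  return 0;
}
```

Value-initializing the descriptors (e.g. declaring them with `{}`) makes the serialized bytes deterministic, which is what the cache keying needs.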