A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
When descriptors are not value-initialized, their serialized form in the HLO differs between runs of the same computation, because the padding bits hold indeterminate values. This affects compilation caching: an otherwise identical computation can produce a different HLO and so miss the cache.
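The effect can be illustrated with a minimal sketch. `ExampleDescriptor` and `PackExampleDescriptor` are hypothetical names invented for this illustration, not the library's actual types or functions, and the byte-copy serialization is an assumption about how a descriptor might end up embedded in the HLO. The struct has padding between its two members on typical 64-bit ABIs; value-initializing it before filling in the fields zeroes those padding bytes, so the serialized bytes depend only on the field values.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical descriptor used only for illustration.
// On typical 64-bit ABIs, 7 padding bytes sit between `kind` and `size`.
struct ExampleDescriptor {
  std::uint8_t kind;
  std::uint64_t size;
};

// Builds a descriptor and serializes its raw object representation
// (padding bytes included) into a byte string.
std::string PackExampleDescriptor(std::uint8_t kind, std::uint64_t size) {
  // Value-initialization: for a class without a user-provided constructor
  // this zero-initializes the object, including its padding.
  // A plain `ExampleDescriptor desc;` would leave padding indeterminate.
  ExampleDescriptor desc = ExampleDescriptor();
  desc.kind = kind;
  desc.size = size;

  std::string bytes(sizeof(desc), '\0');
  std::memcpy(&bytes[0], &desc, sizeof(desc));
  return bytes;
}
```

With this pattern, two calls such as `PackExampleDescriptor(1, 4096)` are expected to yield identical byte strings, which is the property a compilation cache keyed on the serialized HLO relies on.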