A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
A port flap may cause a hang when using TE in a training job.
I noticed that the latest code added CE deadlock detection. By the way, what is a CE deadlock?
How can we avoid this kind of hang? Thanks.