A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
Firstly, I would like to express my sincere gratitude for your dedication and significant contributions to the open-source community. Your work has been instrumental and greatly appreciated.
However, while utilizing transformer_engine, I have encountered some issues that I am unable to resolve.
When I import transformer_engine before torch, it causes a runtime error.
In my code, after importing transformer_engine, the process always tears down with
transformer_engine v1.5 and below work fine.
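A minimal sketch for checking whether the failure depends on import order. It runs each order in a fresh interpreter so that a crash in one does not affect the other. The helper name `try_import_order` and the timeout value are hypothetical and only for illustration; the script makes no assumption about what the error output looks like.

```python
import subprocess
import sys


def try_import_order(modules, timeout=300):
    """Import `modules` in the given order in a fresh interpreter.

    Returns (returncode, stderr). returncode is None if the child timed out.
    """
    # Build a one-liner such as "import transformer_engine; import torch".
    code = "; ".join(f"import {name}" for name in modules)
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None, "timed out"
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Compare the failing order from this report with the reverse order.
    for order in (["transformer_engine", "torch"], ["torch", "transformer_engine"]):
        returncode, stderr = try_import_order(order)
        print(order, "-> exit code", returncode)
        if returncode != 0:
            # Show only the tail of stderr to keep the report readable.
            print(stderr.strip()[-2000:])
```

If only the first order exits with a non-zero code, that would suggest the problem is tied to import order rather than to the installation as a whole.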
My environment:
GPU: H800
torch v2.3.0
CUDA 12.4.1
cuDNN 8.9.7.29
transformer_engine release_v1.7
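For anyone comparing setups, here is a small sketch that prints the installed package versions without importing the packages themselves, which avoids triggering the error above. The distribution names `torch` and `transformer_engine` are assumptions about how the packages were installed, and the helper names are hypothetical. It does not report the CUDA or cuDNN versions.

```python
import platform
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "not installed"


def collect_versions(dist_names):
    """Map 'python' and each distribution name to a version string."""
    report = {"python": platform.python_version()}
    for name in dist_names:
        report[name] = installed_version(name)
    return report


if __name__ == "__main__":
    # Distribution names below are assumptions; adjust to the local install.
    for name, version in collect_versions(["torch", "transformer_engine"]).items():
        print(f"{name}: {version}")
```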
Thank you in advance for taking the time to read this issue and for any help you can provide. I look forward to hearing from you soon.