NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Strange behavior when importing torch after importing te. #871

Open GGGGGGXY opened 1 month ago

GGGGGGXY commented 1 month ago

Firstly, I would like to express my sincere gratitude for your dedication and significant contributions to the open-source community. Your work has been instrumental and greatly appreciated.

However, while using transformer_engine, I have run into some issues that I am unable to resolve.

When transformer_engine is imported before torch, it causes a runtime error.

[two screenshots of the runtime error; images not preserved]

In my code, after importing transformer_engine, the process always tears down with:

[screenshot of the error; image not preserved]

transformer_engine v1.5 and below work fine.

My env: H800, torch v2.3.0, CUDA 12.4.1, cuDNN 8.9.7.29, transformer_engine release_v1.7
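For anyone trying to reproduce this, here is a minimal sketch that runs the two import orders in fresh interpreter processes, so one order cannot affect the other. The helper names `run_import_order` and `compare_orders` are hypothetical and are not part of transformer_engine or torch. It assumes both packages are installed in the current environment, and it only reports the exit code and the last line of stderr, not the full traceback.

```python
import subprocess
import sys

# The two import orders under comparison. Module names are taken from the
# report above; everything else in this script is an illustrative assumption.
ORDERS = {
    "torch_first": "import torch\nimport transformer_engine\n",
    "te_first": "import transformer_engine\nimport torch\n",
}


def run_import_order(snippet, timeout=300):
    """Run `snippet` in a fresh Python process.

    Returns (returncode, last_stderr_line). The return code is None if the
    process did not finish within `timeout` seconds.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None, "timed out"
    lines = proc.stderr.strip().splitlines()
    return proc.returncode, (lines[-1] if lines else "")


def compare_orders():
    """Run every entry in ORDERS and collect the results by name."""
    return {name: run_import_order(code) for name, code in ORDERS.items()}
```

Calling `compare_orders()` and printing the result shows whether only the `te_first` order exits with a non-zero code, which would match the behavior described above.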

Thank you in advance for taking the time to read this issue and for any help you can provide. I look forward to hearing from you soon.

ptrendx commented 1 month ago

Hmm, this is strange. @pggPL Could you take a look? You should be able to use H100 as a proxy for H800.