NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0
1.61k stars · 256 forks

ERROR: Failed building wheel for transformer-engine #857

Closed · Weifan1226 closed 1 month ago

Weifan1226 commented 1 month ago

Hello mates!

I ran into a problem while installing the transformer-engine wheel.

I tried two ways to install the engine, but both failed:

1. `pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable`
2. Building from source: `git clone --branch stable --recursive https://github.com/NVIDIA/TransformerEngine.git`, then `cd TransformerEngine`, `export NVTE_FRAMEWORK=pytorch`, `git submodule update --init --recursive`, and finally `pip install .`

Hope to get your help. Thanks!


"Please use int64_t instead" — is that really the cause of the problem?

    /root/miniconda3/lib/python3.8/site-packages/torch/include/ATen/core/boxing/impl/make_boxed_from_unboxed_functor.h:203:37: error: static assertion failed: You tried to register a kernel with an unsupported integral input type. Please use int64_t instead.
      203 | static_assert(guts::false_t::value, | ^~~~~
    error: command 'gcc' failed with exit status 1

ERROR: Failed building wheel for transformer-engine


Terminal output: the build seems to succeed at first [screenshots attached].

But then it fails [screenshots attached].

ptrendx commented 1 month ago

Hi, which PyTorch version are you using?

Weifan1226 commented 1 month ago

> Hi, which PyTorch version are you using?

Hi, thanks for the response.

I just found out that TE is for Hopper, but I was using a 4090. I switched Megatron to v2.5 (on CUDA 11.1 with a V100), which does not use TE, and that fixed the problem.

Thanks

timmoon10 commented 1 month ago

I'm glad you found a solution that worked for you. I'll just comment for future reference.

TE should work on a 4090 since Lovelace has FP8 support. This is a compilation error when TE builds some PyTorch extensions, which implies a problem with the build environment rather than with the GPU. We tend to develop using fairly recent versions of PyTorch. Looking at the line numbers in your error message, it seems you are using PyTorch 2.0.0 or 2.0.1, which are over a year old.
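To make the FP8 point above concrete, here is a minimal sketch of the capability check. The `supports_fp8` helper is illustrative, not part of TE's API; the assumption (consistent with the TE documentation) is that FP8 requires compute capability 8.9 (Ada, e.g. a 4090) or higher, such as 9.0 (Hopper).

```python
def supports_fp8(major: int, minor: int) -> bool:
    """Illustrative check: FP8 needs compute capability >= 8.9
    (Ada Lovelace or Hopper). Not a TransformerEngine API."""
    return (major, minor) >= (8, 9)

# With PyTorch installed on a CUDA machine, you could query the local GPU:
#   import torch
#   major, minor = torch.cuda.get_device_capability()
#   print(supports_fp8(major, minor))   # True on a 4090 (8.9) or H100 (9.0)
```

A V100 (7.0) or A100 (8.0) would return `False` here, which matches the comment that the build error itself is an environment problem rather than a GPU limitation.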

Weifan1226 commented 1 month ago

> I'm glad you found a solution that worked for you. I'll just comment for future reference.
>
> TE should work on a 4090 since Lovelace has FP8 support. This is a compilation error when TE builds some PyTorch extensions, which implies a problem with the build environment rather than with the GPU. We tend to develop using fairly recent versions of PyTorch. Looking at the line numbers in your error message, it seems you are using PyTorch 2.0.0 or 2.0.1, which are over a year old.

Hi!

It is true that TE works on a 4090. I was using PyTorch 2.0.0 with CUDA 11.8 on Ubuntu 20.04. Today I changed to PyTorch 2.1.0 with CUDA 12.1 on Ubuntu 22.04, and TE and its extensions built perfectly! I am so happy it works.

Thanks for your help!