NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Failed to build Transformer Engine #881

Closed · zirui closed this 1 month ago

zirui commented 1 month ago

Hi, I am encountering an issue while attempting to install Transformer Engine from source.
The build fails with the following error messages:

        /code/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op.cpp:498:8:   required from here
        /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/op_registration/infer_schema.h:42:7: error: static assertion failed: INVALID TYPE: Only int64_t and bool are supported as an integral argument type
           42 |    >::value, "INVALID TYPE: Only int64_t and bool are supported as an integral argument type");

        /code/TransformerEngine/transformer_engine/pytorch/csrc/ts_fp8_op.cpp:501:8:   required from here
        /usr/local/lib/python3.10/dist-packages/torch/include/ATen/core/op_registration/infer_schema.h:42:7: error: static assertion failed: INVALID TYPE: Only int64_t and bool are supported as an integral argument type

Steps to Reproduce:

  1. git clone https://github.com/NVIDIA/TransformerEngine.git
  2. cd TransformerEngine && git submodule update --init --recursive
  3. pip install -e .

Environment

  - Python version: 3.10
  - torch version: 2.1.0
  - CUDA version: 12.1
  - Operating system: Ubuntu 22.04

timmoon10 commented 1 month ago

Thanks for the bug report. I think this is because https://github.com/NVIDIA/TransformerEngine/pull/772 added int8_t arguments to some of the PyTorch extensions, and support for int8_t in extensions was added in PyTorch 2.3.0 (see https://github.com/pytorch/pytorch/pull/119639). Can you try rebuilding with https://github.com/NVIDIA/TransformerEngine/pull/882?
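
For context, here is a minimal sketch of the failure mode (the op name and body below are made up for illustration, not TE's actual code, and widening to int64_t is the usual portable workaround rather than necessarily what #882 does). Before PyTorch 2.3.0, schema inference for custom ops only accepts int64_t and bool as integral argument types, so an int8_t parameter trips the static_assert in infer_schema.h at compile time:

    #include <cmath>
    #include <torch/library.h>
    #include <torch/torch.h>

    // Hypothetical op, for illustration only: an int8_t argument makes
    // PyTorch's schema inference fail at compile time on versions before
    // 2.3.0 ("INVALID TYPE: Only int64_t and bool are supported as an
    // integral argument type").
    torch::Tensor scale_by(torch::Tensor input, int8_t exponent) {
      return input * std::pow(2.0, static_cast<double>(exponent));
    }

    // Portable variant: take int64_t at the op boundary and narrow inside
    // the body if a smaller type is needed. This compiles on older PyTorch.
    torch::Tensor scale_by_portable(torch::Tensor input, int64_t exponent) {
      return input * std::pow(2.0, static_cast<double>(exponent));
    }

    TORCH_LIBRARY(demo, m) {
      // m.def("scale_by", &scale_by);  // static assertion failure pre-2.3.0
      m.def("scale_by_portable", &scale_by_portable);
    }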

zirui commented 1 month ago

> Thanks for the bug report. I think this is because #772 added int8_t arguments to some of the PyTorch extensions, and support for int8_t in extensions was added in PyTorch 2.3.0 (see pytorch/pytorch#119639). Can you try rebuilding with #882?

I rebuilt the code with the changes in #882, and it built successfully! Thank you for identifying the issue and providing the solution.
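
For anyone else who hits this before the fix lands in main: an unmerged PR can be checked out locally through GitHub's pull request refs (the local branch name below is arbitrary):

    cd TransformerEngine
    # GitHub exposes every PR as a read-only ref named pull/<id>/head
    git fetch origin pull/882/head:pr-882
    git checkout pr-882
    git submodule update --init --recursive
    pip install -e .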