NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0
1.66k stars, 263 forks

PIP Installation Failed #654

Status: Open. mahdip72 opened this issue 5 months ago.

mahdip72 commented 5 months ago

Hello, I want to install TE using pip: pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

But I got the following error during installation:

Collecting git+https://github.com/NVIDIA/TransformerEngine.git@stable
  Cloning https://github.com/NVIDIA/TransformerEngine.git (to revision stable) to /tmp/pip-req-build-c6l34itl
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/TransformerEngine.git /tmp/pip-req-build-c6l34itl
  Running command git checkout -b stable --track origin/stable
  Switched to a new branch 'stable'
  Branch 'stable' set up to track remote branch 'stable' from 'origin'.
  Resolved https://github.com/NVIDIA/TransformerEngine.git to commit bbafb02097e6ca1605c3c0cad84d59dbbcb6e94b
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done
Collecting pydantic (from transformer-engine==1.2.1+bbafb02)
  Downloading pydantic-2.6.0-py3-none-any.whl.metadata (81 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 81.8/81.8 kB 1.1 MB/s eta 0:00:00
Requirement already satisfied: torch in /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages (from transformer-engine==1.2.1+bbafb02) (2.1.2)
Collecting flash-attn!=2.0.9,!=2.1.0,<=2.3.3,>=1.0.6 (from transformer-engine==1.2.1+bbafb02)
  Downloading flash_attn-2.3.3.tar.gz (2.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.3/2.3 MB 11.7 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Collecting einops (from flash-attn!=2.0.9,!=2.1.0,<=2.3.3,>=1.0.6->transformer-engine==1.2.1+bbafb02)
  Using cached einops-0.7.0-py3-none-any.whl.metadata (13 kB)
Requirement already satisfied: packaging in /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.3.3,>=1.0.6->transformer-engine==1.2.1+bbafb02) (23.1)
Collecting ninja (from flash-attn!=2.0.9,!=2.1.0,<=2.3.3,>=1.0.6->transformer-engine==1.2.1+bbafb02)
  Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl.metadata (5.3 kB)
Collecting annotated-types>=0.4.0 (from pydantic->transformer-engine==1.2.1+bbafb02)
  Using cached annotated_types-0.6.0-py3-none-any.whl.metadata (12 kB)
Collecting pydantic-core==2.16.1 (from pydantic->transformer-engine==1.2.1+bbafb02)
  Downloading pydantic_core-2.16.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.5 kB)
Requirement already satisfied: typing-extensions>=4.6.1 in /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages (from pydantic->transformer-engine==1.2.1+bbafb02) (4.9.0)
Requirement already satisfied: filelock in /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages (from torch->transformer-engine==1.2.1+bbafb02) (3.13.1)
Requirement already satisfied: sympy in /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages (from torch->transformer-engine==1.2.1+bbafb02) (1.12)
Requirement already satisfied: networkx in /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages (from torch->transformer-engine==1.2.1+bbafb02) (3.1)
Requirement already satisfied: jinja2 in /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages (from torch->transformer-engine==1.2.1+bbafb02) (3.1.2)
Requirement already satisfied: fsspec in /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages (from torch->transformer-engine==1.2.1+bbafb02) (2023.12.2)
Requirement already satisfied: MarkupSafe>=2.0 in /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages (from jinja2->torch->transformer-engine==1.2.1+bbafb02) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages (from sympy->torch->transformer-engine==1.2.1+bbafb02) (1.3.0)
Downloading pydantic-2.6.0-py3-none-any.whl (394 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 394.2/394.2 kB 20.6 MB/s eta 0:00:00
Downloading pydantic_core-2.16.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.2/2.2 MB 34.0 MB/s eta 0:00:00
Using cached annotated_types-0.6.0-py3-none-any.whl (12 kB)
Using cached einops-0.7.0-py3-none-any.whl (44 kB)
Using cached ninja-1.11.1.1-py2.py3-none-manylinux1_x86_64.manylinux_2_5_x86_64.whl (307 kB)
Building wheels for collected packages: transformer-engine, flash-attn
  Building wheel for transformer-engine (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [177 lines of output]
      /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
      !!

              ********************************************************************************
              Requirements should be satisfied by a PEP 517 installer.
              If you are using pip, you can try `pip install --use-pep517`.
              ********************************************************************************

      !!
        dist.fetch_build_eggs(dist.setup_requires)
      running bdist_wheel
      /home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/torch/utils/cpp_extension.py:502: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
        warnings.warn(msg.format('we could not find ninja.'))
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-311
      creating build/lib.linux-x86_64-cpython-311/transformer_engine
      copying transformer_engine/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/common
      copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
      copying transformer_engine/common/recipe.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
      copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
      copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
      copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
      copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
      copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
      copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
      copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      running build_ext
      Building CMake extension transformer_engine
      Running command /usr/bin/cmake -S /tmp/pip-req-build-c6l34itl/transformer_engine -B /tmp/tmp7k0z17__ -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-c6l34itl/build/lib.linux-x86_64-cpython-311
      CMake Error at /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:726 (message):
        Compiling the CUDA compiler identification source file
        "CMakeCUDACompilerId.cu" failed.

        Compiler: CMAKE_CUDA_COMPILER-NOTFOUND

        Build flags:

        Id flags: -v

        The output was:

        No such file or directory

      Call Stack (most recent call first):
        /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:6 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
        /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:48 (__determine_compiler_id_test)
        /usr/share/cmake-3.22/Modules/CMakeDetermineCUDACompiler.cmake:298 (CMAKE_DETERMINE_COMPILER_ID)
        CMakeLists.txt:15 (project)

      CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
      -- Configuring incomplete, errors occurred!
      See also "/tmp/tmp7k0z17__/CMakeFiles/CMakeOutput.log".
      See also "/tmp/tmp7k0z17__/CMakeFiles/CMakeError.log".
      Traceback (most recent call last):
        File "/tmp/pip-req-build-c6l34itl/setup.py", line 353, in _build_cmake
          subprocess.run(command, cwd=build_dir, check=True)
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/subprocess.py", line 571, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['/usr/bin/cmake', '-S', '/tmp/pip-req-build-c6l34itl/transformer_engine', '-B', '/tmp/tmp7k0z17__', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-c6l34itl/build/lib.linux-x86_64-cpython-311']' returned non-zero exit status 1.

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-c6l34itl/setup.py", line 626, in <module>
          main()
        File "/tmp/pip-req-build-c6l34itl/setup.py", line 611, in main
          setuptools.setup(
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
                 ^^^^^^^^^^^^^^^^^^
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
          super().run_command(command)
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/wheel/bdist_wheel.py", line 364, in run
          self.run_command("build")
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
          super().run_command(command)
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/dist.py", line 989, in run_command
          super().run_command(command)
        File "/home/mpngf/.conda/envs/joint_training_221/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-req-build-c6l34itl/setup.py", line 383, in run
          ext._build_cmake(
        File "/tmp/pip-req-build-c6l34itl/setup.py", line 355, in _build_cmake
          raise RuntimeError(f"Error when running CMake: {e}")
      RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '-S', '/tmp/pip-req-build-c6l34itl/transformer_engine', '-B', '/tmp/tmp7k0z17__', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-c6l34itl/build/lib.linux-x86_64-cpython-311']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer-engine
  Running setup.py clean for transformer-engine
  Building wheel for flash-attn (setup.py) ... done
  Created wheel for flash-attn: filename=flash_attn-2.3.3-cp311-cp311-linux_x86_64.whl size=57059098 sha256=499cf51e6036240086c35fd7e49bec42da788d451784ba154b4c654cef4b3510
  Stored in directory: /home/mpngf/.cache/pip/wheels/b4/30/9a/5a0c57df68c4836bab04e05a1078d6c378b68c61d38be4f45b
Successfully built flash-attn
Failed to build transformer-engine
ERROR: Could not build wheels for transformer-engine, which is required to install pyproject.toml-based projects

I have A6000 Ada GPUs, and I am using a conda environment in which I installed:

PyTorch 2.1.2 + CUDA 12.1

I think the issue might be related to CUDA, but I don't know how to fix it.
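In the log above, CMake stops with "Compiler: CMAKE_CUDA_COMPILER-NOTFOUND", which means it could not locate a CUDA compiler (nvcc) at all. The reply further down fails one step later, with "Could NOT find CUDNN". A likely explanation, which is an assumption and not confirmed in this thread, is that a conda PyTorch install ships the CUDA runtime libraries but not the full CUDA Toolkit with nvcc.

The script below is a minimal, hypothetical diagnostic sketch that uses only the standard library. It reports:

- whether nvcc is on PATH;
- what the CUDA_HOME environment variable points to (a common convention, assumed here);
- whether the dynamic linker can see a cuDNN library, via ctypes.util.find_library;
- which CUDA version PyTorch was built against, if PyTorch is importable.

Note that find_library does not search the same locations as Transformer Engine's cmake/FindCUDNN.cmake, so a hit there does not guarantee that CMake will succeed.

```python
import ctypes.util
import os
import shutil


def diagnose_cuda_build_env():
    """Hypothetical helper: report which CUDA build prerequisites are visible."""
    report = {}

    # CMake reported CMAKE_CUDA_COMPILER-NOTFOUND, so first check whether nvcc is on PATH.
    report["nvcc_on_path"] = shutil.which("nvcc")

    # CUDA_HOME is a common (assumed) hint for the toolkit root, e.g. /usr/local/cuda.
    cuda_home = os.environ.get("CUDA_HOME")
    report["cuda_home"] = cuda_home
    nvcc_in_home = None
    if cuda_home:
        candidate = os.path.join(cuda_home, "bin", "nvcc")
        if os.path.isfile(candidate):
            nvcc_in_home = candidate
    report["nvcc_in_cuda_home"] = nvcc_in_home

    # The second log failed with "Could NOT find CUDNN"; ask the dynamic linker for libcudnn.
    # This search differs from CMake's, so treat the result only as a hint.
    report["cudnn_library"] = ctypes.util.find_library("cudnn")

    # The CUDA version PyTorch was built against (runtime only; this does not imply nvcc exists).
    try:
        import torch
        report["torch_cuda"] = torch.version.cuda
    except (ImportError, OSError):
        report["torch_cuda"] = None

    return report


if __name__ == "__main__":
    for key, value in diagnose_cuda_build_env().items():
        print(f"{key}: {value if value is not None else 'NOT FOUND'}")
```

If nvcc_on_path and nvcc_in_cuda_home are both reported as NOT FOUND while torch_cuda shows 12.1, that pattern is consistent with the first log: PyTorch can run on the GPU, but there is no compiler for CMake to build Transformer Engine's CUDA sources with.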

markusheimerl commented 5 months ago

I'm running into a similar problem.

pip install --use-pep517 git+https://github.com/NVIDIA/TransformerEngine.git@stable
Collecting git+https://github.com/NVIDIA/TransformerEngine.git@stable
  Cloning https://github.com/NVIDIA/TransformerEngine.git (to revision stable) to /tmp/pip-req-build-6bdrrezv
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/TransformerEngine.git /tmp/pip-req-build-6bdrrezv
  Running command git checkout -b stable --track origin/stable
  Switched to a new branch 'stable'
  Branch 'stable' set up to track remote branch 'stable' from 'origin'.
  Resolved https://github.com/NVIDIA/TransformerEngine.git to commit bbafb02097e6ca1605c3c0cad84d59dbbcb6e94b
  Running command git submodule update --init --recursive -q
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting pydantic (from transformer_engine==1.2.1+bbafb02)
  Using cached pydantic-2.6.1-py3-none-any.whl.metadata (83 kB)
Collecting annotated-types>=0.4.0 (from pydantic->transformer_engine==1.2.1+bbafb02)
  Using cached annotated_types-0.6.0-py3-none-any.whl.metadata (12 kB)
Collecting pydantic-core==2.16.2 (from pydantic->transformer_engine==1.2.1+bbafb02)
  Using cached pydantic_core-2.16.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.5 kB)
Requirement already satisfied: typing-extensions>=4.6.1 in /usr/local/lib/python3.10/dist-packages (from pydantic->transformer_engine==1.2.1+bbafb02) (4.9.0)
Using cached pydantic-2.6.1-py3-none-any.whl (394 kB)
Using cached pydantic_core-2.16.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.2 MB)
Using cached annotated_types-0.6.0-py3-none-any.whl (12 kB)
Building wheels for collected packages: transformer_engine
  Building wheel for transformer_engine (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for transformer_engine (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [171 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-cpython-310
      creating build/lib.linux-x86_64-cpython-310/transformer_engine
      copying transformer_engine/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine
      creating build/lib.linux-x86_64-cpython-310/transformer_engine/common
      copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/common
      copying transformer_engine/common/recipe.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/common
      copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/common
      creating build/lib.linux-x86_64-cpython-310/transformer_engine/jax
      copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
      copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
      copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
      copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
      copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
      copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
      copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
      copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
      copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
      creating build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
      copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
      copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
      copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
      copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
      copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
      copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
      copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
      copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
      copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
      creating build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
      creating build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
      copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
      copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
      copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
      creating build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
      copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
      copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
      copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
      creating build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
      creating build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
      creating build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
      running build_ext
      -- The CUDA compiler identification is NVIDIA 12.1.105
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.1.105")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      -- cudnn not found.
      -- cudnn_adv_infer not found.
      -- cudnn_adv_train not found.
      -- cudnn_cnn_infer not found.
      -- cudnn_cnn_train not found.
      -- cudnn_ops_infer not found.
      -- cudnn_ops_train not found.
      CMake Error at /tmp/pip-build-env-a74ikyx0/normal/local/lib/python3.10/dist-packages/cmake/data/share/cmake-3.28/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
        Could NOT find CUDNN (missing: CUDNN_LIBRARY)
      Call Stack (most recent call first):
        /tmp/pip-build-env-a74ikyx0/normal/local/lib/python3.10/dist-packages/cmake/data/share/cmake-3.28/Modules/FindPackageHandleStandardArgs.cmake:600 (_FPHSA_FAILURE_MESSAGE)
        cmake/FindCUDNN.cmake:46 (find_package_handle_standard_args)
        CMakeLists.txt:24 (find_package)

      -- Configuring incomplete, errors occurred!
      Building CMake extension transformer_engine
      Running command /tmp/pip-build-env-a74ikyx0/normal/local/lib/python3.10/dist-packages/cmake/data/bin/cmake -S /tmp/pip-req-build-6bdrrezv/transformer_engine -B /tmp/tmpscbva33l -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-6bdrrezv/build/lib.linux-x86_64-cpython-310 -GNinja
      Traceback (most recent call last):
        File "<string>", line 353, in _build_cmake
        File "/usr/lib/python3.10/subprocess.py", line 526, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['/tmp/pip-build-env-a74ikyx0/normal/local/lib/python3.10/dist-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-6bdrrezv/transformer_engine', '-B', '/tmp/tmpscbva33l', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-6bdrrezv/build/lib.linux-x86_64-cpython-310', '-GNinja']' returned non-zero exit status 1.

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/usr/local/lib/python3.10/dist-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 251, in build_wheel
          return _build_backend().build_wheel(wheel_directory, config_settings,
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 404, in build_wheel
          return self._build_with_temp_dir(
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 389, in _build_with_temp_dir
          self.run_setup()
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 480, in run_setup
          super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/build_meta.py", line 311, in run_setup
          exec(code, locals())
        File "<string>", line 626, in <module>
        File "<string>", line 611, in main
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 103, in setup
          return distutils.core.setup(**attrs)
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 963, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-a74ikyx0/normal/local/lib/python3.10/dist-packages/wheel/bdist_wheel.py", line 368, in run
          self.run_command("build")
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 963, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build.py", line 131, in run
          self.run_command(cmd_name)
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 318, in run_command
          self.distribution.run_command(command)
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/dist.py", line 963, in run_command
          super().run_command(command)
        File "/tmp/pip-build-env-a74ikyx0/overlay/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "<string>", line 383, in run
        File "<string>", line 355, in _build_cmake
      RuntimeError: Error when running CMake: Command '['/tmp/pip-build-env-a74ikyx0/normal/local/lib/python3.10/dist-packages/cmake/data/bin/cmake', '-S', '/tmp/pip-req-build-6bdrrezv/transformer_engine', '-B', '/tmp/tmpscbva33l', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-6bdrrezv/build/lib.linux-x86_64-cpython-310', '-GNinja']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer_engine
Failed to build transformer_engine
ERROR: Could not build wheels for transformer_engine, which is required to install pyproject.toml-based projects
root@7d9c74e72ac3:/usr/local/cuda/lib64# 
markusheimerl commented 5 months ago

I downloaded https://developer.nvidia.com/downloads/compute/cudnn/secure/8.9.7/local_installers/12.x/cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz/

and ran

tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz && \
  cp cudnn-linux-x86_64-8.9.7.29_cuda12-archive/lib/libcudnn* /usr/local/cuda/lib64/ && \
  cp cudnn-linux-x86_64-8.9.7.29_cuda12-archive/include/cudnn*.h /usr/local/cuda/include/ && \
  ldconfig /usr/local/cuda/lib64 && \
  cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2 && \
  pip install --use-pep517 git+https://github.com/NVIDIA/TransformerEngine.git@stable

which works.
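The `cat ... | grep CUDNN_MAJOR -A 2` step in that command only prints the version macros so you can eyeball them. As a minimal Python sketch (not part of Transformer Engine; the function names are hypothetical), the same check can be scripted by parsing `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` out of `cudnn_version.h`. The default path assumes the header was copied to `/usr/local/cuda/include` as in the command above.

```python
import re
from pathlib import Path
from typing import Tuple

# Version macros defined in cuDNN's cudnn_version.h header.
_VERSION_MACROS = ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL")


def parse_cudnn_version(header_text: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from the text of cudnn_version.h."""
    values = []
    for name in _VERSION_MACROS:
        # Matches lines such as "#define CUDNN_MAJOR 8"; the \s+ after the
        # name keeps longer macro names from matching by prefix.
        match = re.search(rf"^\s*#\s*define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            raise ValueError(f"{name} not found in header text")
        values.append(int(match.group(1)))
    major, minor, patch = values
    return major, minor, patch


def read_cudnn_version(header: str = "/usr/local/cuda/include/cudnn_version.h") -> Tuple[int, int, int]:
    """Read the header from disk; the default path is where the command above copies it."""
    return parse_cudnn_version(Path(header).read_text())
```

Calling `read_cudnn_version()` after the copy step gives a tuple that a script can compare against whatever minimum version it needs, instead of reading grep output by hand.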

Since my PyTorch already uses cuDNN, there must have been another installation, though:

root@7d9c74e72ac3:/usr/local/cuda/lib64# find / -name "libcudnn.so*" 2>/dev/null
/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib/libcudnn.so.8
root@7d9c74e72ac3:/usr/local/cuda/lib64# 

Maybe the build tool could check that location too.
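A rough Python sketch of that extra check: it looks for the `nvidia/cudnn/lib/libcudnn.so*` layout shown in the `find` output above, but only under the interpreter's site-packages directories instead of scanning all of `/`. The function name is hypothetical and this is not part of Transformer Engine's build. A directory it reports is the kind of path one might try passing as `CUDNN_PATH`; note that the `find` output lists only the versioned `libcudnn.so.8`, and whether that pip package also ships the headers the build needs is not verified here.

```python
import site
from pathlib import Path
from typing import Iterable, List, Optional


def find_wheel_cudnn_roots(search_dirs: Optional[Iterable[str]] = None) -> List[Path]:
    """Return <site-packages>/nvidia/cudnn directories that contain libcudnn.so*.

    Assumes the pip package layout seen in the find output:
    <site-packages>/nvidia/cudnn/lib/libcudnn.so.8
    """
    if search_dirs is None:
        # Global site-packages directories of the running interpreter
        # (the per-user site-packages directory is not included).
        search_dirs = site.getsitepackages()
    roots: List[Path] = []
    for base in search_dirs:
        lib_dir = Path(base) / "nvidia" / "cudnn" / "lib"
        # Accept libcudnn.so as well as versioned names like libcudnn.so.8.
        if lib_dir.is_dir() and any(lib_dir.glob("libcudnn.so*")):
            root = lib_dir.parent
            if root not in roots:
                roots.append(root)
    return roots


if __name__ == "__main__":
    for root in find_wheel_cudnn_roots():
        print(root)
```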

markusheimerl commented 5 months ago

Now execution fails because "transformerengine_extras" can't be found...

timmoon10 commented 4 months ago

@mahdip72 It looks like CMake is having trouble finding your C++ compiler and your CUDA installation. Can you try setting the CXX and CUDA_PATH environment variables?
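For anyone scripting the install, a minimal Python sketch of that suggestion: copy the current environment, set `CUDA_PATH` and `CXX`, and hand the result to pip. The helper name is hypothetical, and the defaults (`/usr/local/cuda`, and `c++` resolved on `PATH`) are assumptions based on paths that appear in the CMake output above. Exporting the two variables in the shell before running `pip install` achieves the same thing.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def build_env(
    cuda_path: str,
    cxx: Optional[str] = None,
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of the environment with CUDA_PATH and CXX set."""
    env = dict(os.environ if base is None else base)
    env["CUDA_PATH"] = cuda_path
    # Fall back to whatever `c++` resolves to on PATH (an assumption).
    compiler = cxx or shutil.which("c++")
    if compiler is None:
        raise FileNotFoundError("no C++ compiler found on PATH; pass cxx explicitly")
    env["CXX"] = compiler
    return env


# Example use (not run here):
#   import subprocess, sys
#   subprocess.run(
#       [sys.executable, "-m", "pip", "install",
#        "git+https://github.com/NVIDIA/TransformerEngine.git@stable"],
#       env=build_env("/usr/local/cuda"),
#       check=True,
#   )
```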

@markusheimerl The best way to specify a custom cuDNN install is by setting the CUDNN_PATH environment variable. As for the runtime failure, I assume it's not correctly importing transformer_engine_extensions? Can you check whether the extension's .so file was installed (it should look something like transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so)? If it was, then it should just be a matter of setting PYTHONPATH so Python can find it. Otherwise, we'll need to debug how Transformer Engine got past the build process without complaining (maybe it didn't find PyTorch and skipped building the extensions?).
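A small Python sketch of the check described above. It assumes the extension is a top-level module named `transformer_engine_extensions`, so its `.so` would sit directly in a site-packages directory; the helper names are hypothetical. If a file is found but `extension_importable()` returns `False`, the directory holding it is what would go on `PYTHONPATH`.

```python
import importlib.util
import site
from pathlib import Path
from typing import Iterable, List, Optional

# File-name pattern based on the example name in the comment above, e.g.
# transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so
_PATTERN = "transformer_engine_extensions*.so"


def find_extension_files(search_dirs: Optional[Iterable[str]] = None) -> List[Path]:
    """List compiled extension files directly inside the given directories."""
    if search_dirs is None:
        search_dirs = site.getsitepackages()
    found: List[Path] = []
    for base in search_dirs:
        base_path = Path(base)
        if base_path.is_dir():
            found.extend(sorted(base_path.glob(_PATTERN)))
    return found


def extension_importable(name: str = "transformer_engine_extensions") -> bool:
    """True if Python's import system can locate the module by name (without importing it)."""
    return importlib.util.find_spec(name) is not None
```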