NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0
1.92k stars 320 forks

Building Transformer Engine fails due to a CMake command error #1020

Open sfdeggb opened 3 months ago

sfdeggb commented 3 months ago

An error occurred when I tried to install Transformer Engine by following the official tutorial (https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html). I have already tried the suggestions from the community discussions in issues 700, 614, 383, 335, and 954. The main error is:

raise RuntimeError(f"Error when running CMake: {e}")
RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '-S', '/home/ubuntu/TransformerEngine/transformer_engine', '-B', '/home/ubuntu/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/bin/python3.1', '-DPython_INCLUDE_DIR=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/home/ubuntu/TransformerEngine/build/lib.linux-x86_64-cpython-311', '-GNinja']' returned non-zero exit status 1.
[end of output]

Details of the error are as follows:

(yuxunlian) ubuntu@ip-172-31-38-93:~$ cd TransformerEngine/
(yuxunlian) ubuntu@ip-172-31-38-93:~/TransformerEngine$ pip install .
Processing /home/ubuntu/TransformerEngine
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from transformer_engine==1.7.0+4e7caa1) (2.8.2)
Requirement already satisfied: packaging in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from transformer_engine==1.7.0+4e7caa1) (24.1)
Requirement already satisfied: torch in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from transformer_engine==1.7.0+4e7caa1) (2.3.1)
Requirement already satisfied: flash-attn!=2.0.9,!=2.1.0,<=2.5.8,>=2.0.6 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from transformer_engine==1.7.0+4e7caa1) (2.5.8)
Requirement already satisfied: einops in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.5.8,>=2.0.6->transformer_engine==1.7.0+4e7caa1) (0.8.0)
Requirement already satisfied: ninja in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.5.8,>=2.0.6->transformer_engine==1.7.0+4e7caa1) (1.11.1.1)
Requirement already satisfied: annotated-types>=0.4.0 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from pydantic->transformer_engine==1.7.0+4e7caa1) (0.7.0)
Requirement already satisfied: pydantic-core==2.20.1 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from pydantic->transformer_engine==1.7.0+4e7caa1) (2.20.1)
Requirement already satisfied: typing-extensions>=4.6.1 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from pydantic->transformer_engine==1.7.0+4e7caa1) (4.12.2)
Requirement already satisfied: filelock in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (3.15.4)
Requirement already satisfied: sympy in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (1.13.0)
Requirement already satisfied: networkx in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (3.3)
Requirement already satisfied: jinja2 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (3.1.4)
Requirement already satisfied: fsspec in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (2024.5.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (12.1.105)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (12.1.105)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (12.1.105)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (8.9.2.26)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (12.1.3.1)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (11.0.2.54)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (10.3.2.106)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (11.4.5.107)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (12.1.0.106)
Requirement already satisfied: nvidia-nccl-cu12==2.20.5 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (2.20.5)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (12.1.105)
Requirement already satisfied: triton==2.3.1 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from torch->transformer_engine==1.7.0+4e7caa1) (2.3.1)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from nvidia-cusolver-cu12==11.4.5.107->torch->transformer_engine==1.7.0+4e7caa1) (12.5.82)
Requirement already satisfied: MarkupSafe>=2.0 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from jinja2->torch->transformer_engine==1.7.0+4e7caa1) (2.1.5)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages (from sympy->torch->transformer_engine==1.7.0+4e7caa1) (1.3.0)
Building wheels for collected packages: transformer_engine
  Building wheel for transformer_engine (setup.py) ... error
  error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [161 lines of output]
  running bdist_wheel
  running build
  running build_py
  copying transformer_engine/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine
  copying transformer_engine/_version.py -> build/lib.linux-x86_64-cpython-311/transformer_engine
  copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/pytorch/cpu_offload.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/graph.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
  copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
  copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/rmsnorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/common/recipe/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common/recipe
  running build_ext
  Building CMake extension transformer_engine
  Running command /usr/bin/cmake -S /home/ubuntu/TransformerEngine/transformer_engine -B /home/ubuntu/TransformerEngine/build/cmake -DPython_EXECUTABLE=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/bin/python3.1 -DPython_INCLUDE_DIR=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/include/python3.11 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/ubuntu/TransformerEngine/build/lib.linux-x86_64-cpython-311 -GNinja
  CMake Error at /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:726 (message):
    Compiling the CUDA compiler identification source file
    "CMakeCUDACompilerId.cu" failed.

    Compiler: /usr/local/cuda-11.7/bin/nvcc

    Build flags:

    Id flags:
    --keep;--keep-dir;tmp;-gencode=arch=compute_70,code=sm_70;-gencode=arch=compute_80,code=sm_80;-gencode=arch=compute_89,code=sm_89;-gencode=arch=compute_90,code=sm_90
    -v

    The output was:

    1

    nvcc fatal : Unsupported gpu architecture 'compute_89'

  Call Stack (most recent call first):
    /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:6 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
    /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:48 (__determine_compiler_id_test)
    /usr/share/cmake-3.22/Modules/CMakeDetermineCUDACompiler.cmake:298 (CMAKE_DETERMINE_COMPILER_ID)
    CMakeLists.txt:15 (project)

  CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
  -- Configuring incomplete, errors occurred!
  See also "/home/ubuntu/TransformerEngine/build/cmake/CMakeFiles/CMakeOutput.log".
  See also "/home/ubuntu/TransformerEngine/build/cmake/CMakeFiles/CMakeError.log".
  Traceback (most recent call last):
    File "/home/ubuntu/TransformerEngine/setup.py", line 337, in _build_cmake
      subprocess.run(command, cwd=build_dir, check=True)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/subprocess.py", line 569, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/usr/bin/cmake', '-S', '/home/ubuntu/TransformerEngine/transformer_engine', '-B', '/home/ubuntu/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/bin/python3.1', '-DPython_INCLUDE_DIR=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/home/ubuntu/TransformerEngine/build/lib.linux-x86_64-cpython-311', '-GNinja']' returned non-zero exit status 1.

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/home/ubuntu/TransformerEngine/setup.py", line 618, in <module>
      main()
    File "/home/ubuntu/TransformerEngine/setup.py", line 603, in main
      setuptools.setup(
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
      return run_commands(dist)
             ^^^^^^^^^^^^^^^^^^
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
      dist.run_commands()
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 970, in run_commands
      self.run_command(cmd)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/dist.py", line 974, in run_command
      super().run_command(command)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
      cmd_obj.run()
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/command/bdist_wheel.py", line 373, in run
      self.run_command("build")
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/dist.py", line 974, in run_command
      super().run_command(command)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
      cmd_obj.run()
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 135, in run
      self.run_command(cmd_name)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/dist.py", line 974, in run_command
      super().run_command(command)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
      cmd_obj.run()
    File "/home/ubuntu/TransformerEngine/setup.py", line 369, in run
      ext._build_cmake(
    File "/home/ubuntu/TransformerEngine/setup.py", line 339, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '-S', '/home/ubuntu/TransformerEngine/transformer_engine', '-B', '/home/ubuntu/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/bin/python3.1', '-DPython_INCLUDE_DIR=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/home/ubuntu/TransformerEngine/build/lib.linux-x86_64-cpython-311', '-GNinja']' returned non-zero exit status 1.
  [end of output]
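
The decisive line in the output above appears to be `nvcc fatal : Unsupported gpu architecture 'compute_89'`, raised by the nvcc under /usr/local/cuda-11.7. As far as I know, nvcc accepts compute_89 and compute_90 only from CUDA 11.8 onward, so an 11.7 toolkit would reject the requested flags. Below is a minimal Python sketch of that check. The `diagnose` helper, the `MIN_CUDA_FOR_ARCH` table, and reading the toolkit version from the install path are assumptions made for illustration; none of them is part of Transformer Engine or CMake.

```python
import re

# Assumed table: first CUDA toolkit release whose nvcc accepts each target.
# It covers only the architectures that appear in the log above.
MIN_CUDA_FOR_ARCH = {
    "compute_70": (9, 0),
    "compute_80": (11, 0),
    "compute_89": (11, 8),
    "compute_90": (11, 8),
}


def diagnose(log: str):
    """Return (toolkit_version, rejected_arch, too_new_archs) parsed from a CMake log."""
    # Assumption: the toolkit version can be read from a path like .../cuda-11.7/bin/nvcc.
    ver = re.search(r"Compiler:\s*\S*cuda-(\d+)\.(\d+)/bin/nvcc", log)
    toolkit = (int(ver.group(1)), int(ver.group(2))) if ver else None

    # The architecture nvcc actually complained about.
    bad = re.search(r"Unsupported gpu architecture '(compute_\d+)'", log)
    rejected = bad.group(1) if bad else None

    # Every architecture requested through -gencode flags.
    requested = sorted(set(re.findall(r"arch=(compute_\d+)", log)))
    too_new = [
        arch for arch in requested
        if toolkit is not None and MIN_CUDA_FOR_ARCH.get(arch, (0, 0)) > toolkit
    ]
    return toolkit, rejected, too_new


# Excerpt copied from the CMake output in this issue.
LOG = """
Compiler: /usr/local/cuda-11.7/bin/nvcc
Id flags:
--keep;--keep-dir;tmp;-gencode=arch=compute_70,code=sm_70;-gencode=arch=compute_80,code=sm_80;-gencode=arch=compute_89,code=sm_89;-gencode=arch=compute_90,code=sm_90
nvcc fatal : Unsupported gpu architecture 'compute_89'
"""

print(diagnose(LOG))
# prints ((11, 7), 'compute_89', ['compute_89', 'compute_90'])
```

If this reading is right, the later `CMAKE_CXX_COMPILER not set` message is a follow-on error rather than a separate problem. Pointing the build at a CUDA 11.8 or newer toolkit looks like the likely direction, though nothing in this thread confirms it yet.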

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Failed to build transformer_engine
Installing collected packages: transformer_engine
  Running setup.py install for transformer_engine ... error
  error: subprocess-exited-with-error

× Running setup.py install for transformer_engine did not run successfully.
│ exit code: 1
╰─> [189 lines of output]
  running install
  /home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
  !!

          ********************************************************************************
          Please avoid running ``setup.py`` directly.
          Instead, use pypa/build, pypa/installer or other
          standards-based tools.

          See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
          ********************************************************************************

  !!
    self.initialize_options()
  running build
  running build_py
  creating build/lib.linux-x86_64-cpython-311
  creating build/lib.linux-x86_64-cpython-311/transformer_engine
  copying transformer_engine/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine
  copying transformer_engine/_version.py -> build/lib.linux-x86_64-cpython-311/transformer_engine
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/cpu_offload.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/graph.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/common
  copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
  copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/rmsnorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
  creating build/lib.linux-x86_64-cpython-311/transformer_engine/common/recipe
  copying transformer_engine/common/recipe/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common/recipe
  running build_ext
  Building CMake extension transformer_engine
  Running command /usr/bin/cmake -S /home/ubuntu/TransformerEngine/transformer_engine -B /home/ubuntu/TransformerEngine/build/cmake -DPython_EXECUTABLE=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/bin/python3.1 -DPython_INCLUDE_DIR=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/include/python3.11 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/home/ubuntu/TransformerEngine/build/lib.linux-x86_64-cpython-311 -GNinja
  CMake Error at /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:726 (message):
    Compiling the CUDA compiler identification source file
    "CMakeCUDACompilerId.cu" failed.

    Compiler: /usr/local/cuda-11.7/bin/nvcc

    Build flags:

    Id flags:
    --keep;--keep-dir;tmp;-gencode=arch=compute_70,code=sm_70;-gencode=arch=compute_80,code=sm_80;-gencode=arch=compute_89,code=sm_89;-gencode=arch=compute_90,code=sm_90
    -v

    The output was:

    1

    nvcc fatal : Unsupported gpu architecture 'compute_89'

  Call Stack (most recent call first):
    /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:6 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
    /usr/share/cmake-3.22/Modules/CMakeDetermineCompilerId.cmake:48 (__determine_compiler_id_test)
    /usr/share/cmake-3.22/Modules/CMakeDetermineCUDACompiler.cmake:298 (CMAKE_DETERMINE_COMPILER_ID)
    CMakeLists.txt:15 (project)

  CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage
  -- Configuring incomplete, errors occurred!
  See also "/home/ubuntu/TransformerEngine/build/cmake/CMakeFiles/CMakeOutput.log".
  See also "/home/ubuntu/TransformerEngine/build/cmake/CMakeFiles/CMakeError.log".
  Traceback (most recent call last):
    File "/home/ubuntu/TransformerEngine/setup.py", line 337, in _build_cmake
      subprocess.run(command, cwd=build_dir, check=True)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/subprocess.py", line 569, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/usr/bin/cmake', '-S', '/home/ubuntu/TransformerEngine/transformer_engine', '-B', '/home/ubuntu/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/bin/python3.1', '-DPython_INCLUDE_DIR=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/home/ubuntu/TransformerEngine/build/lib.linux-x86_64-cpython-311', '-GNinja']' returned non-zero exit status 1.

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/home/ubuntu/TransformerEngine/setup.py", line 618, in <module>
      main()
    File "/home/ubuntu/TransformerEngine/setup.py", line 603, in main
      setuptools.setup(
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 184, in setup
      return run_commands(dist)
             ^^^^^^^^^^^^^^^^^^
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
      dist.run_commands()
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 970, in run_commands
      self.run_command(cmd)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/dist.py", line 974, in run_command
      super().run_command(command)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
      cmd_obj.run()
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/command/install.py", line 81, in run
      return super().run()
             ^^^^^^^^^^^^^
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/command/install.py", line 694, in run
      self.run_command('build')
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/dist.py", line 974, in run_command
      super().run_command(command)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
      cmd_obj.run()
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 135, in run
      self.run_command(cmd_name)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/dist.py", line 974, in run_command
      super().run_command(command)
    File "/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 989, in run_command
      cmd_obj.run()
    File "/home/ubuntu/TransformerEngine/setup.py", line 369, in run
      ext._build_cmake(
    File "/home/ubuntu/TransformerEngine/setup.py", line 339, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/usr/bin/cmake', '-S', '/home/ubuntu/TransformerEngine/transformer_engine', '-B', '/home/ubuntu/TransformerEngine/build/cmake', '-DPython_EXECUTABLE=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/bin/python3.1', '-DPython_INCLUDE_DIR=/home/ubuntu/train/aconconda/acondada/envs/yuxunlian/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/home/ubuntu/TransformerEngine/build/lib.linux-x86_64-cpython-311', '-GNinja']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> transformer_engine

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Looking forward to a solution!
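
For anyone reading the log above: the decisive line appears to be `nvcc fatal : Unsupported gpu architecture 'compute_89'`. The build passes `-gencode` flags for compute_70, compute_80, compute_89 and compute_90 to the compiler at /usr/local/cuda-11.7/bin/nvcc, and nvcc accepts compute_89 and compute_90 only from CUDA 11.8 onward, so the 11.7 toolkit is a likely cause. The later `CMAKE_CXX_COMPILER not set` message looks like a follow-on error from the same failed configure step. Below is a minimal Python sketch of that compatibility check. The version table and the helper names (`MIN_CUDA_FOR_ARCH`, `parse_nvcc_release`, `unsupported_archs`) are illustrative assumptions and not part of TransformerEngine, and the release string in the example is a sample, not output taken from this machine.

```python
import re

# Assumed lookup table: the earliest CUDA toolkit release (major, minor) whose
# nvcc accepts each compute capability. Only the four architectures from the
# failing -gencode flags are listed.
MIN_CUDA_FOR_ARCH = {
    70: (9, 0),    # Volta
    80: (11, 0),   # Ampere
    89: (11, 8),   # Ada
    90: (11, 8),   # Hopper
}


def parse_nvcc_release(version_text):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no 'release X.Y' string found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def unsupported_archs(cuda_release, archs):
    """Return the requested architectures that this CUDA release is too old for."""
    return [arch for arch in archs if cuda_release < MIN_CUDA_FOR_ARCH[arch]]


# Sample release line in the format nvcc prints (hypothetical, for illustration).
sample = "Cuda compilation tools, release 11.7, V11.7.99"
release = parse_nvcc_release(sample)
print(release, unsupported_archs(release, [70, 80, 89, 90]))
```

With the sample 11.7 string, the sketch flags 89 and 90 as unsupported; with an 11.8 release the list comes back empty. nvcc stops at the first architecture it rejects, which would explain why the log mentions only compute_89. If the real toolchain matches, pointing the build at a CUDA 11.8 or newer toolkit is probably the first thing to try.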

timmoon10 commented 3 months ago