NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Building wheel error during installation #978

Open Drzhishi opened 6 days ago

Drzhishi commented 6 days ago

I manually downloaded flash-attn, then ran 'pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable' to install Transformer Engine. The installation failed with the error 'Building wheel for transformer_engine (setup.py) ... error'.

Environment: torch 2.2, CUDA 11.8

(tuling) xx@DESKTOP-UA3C67F:~/ChatTTS$ pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting git+https://github.com/NVIDIA/TransformerEngine.git@stable
Cloning https://github.com/NVIDIA/TransformerEngine.git (to revision stable) to /tmp/pip-req-build-9lezr884
Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/TransformerEngine.git /tmp/pip-req-build-9lezr884
Running command git checkout -b stable --track origin/stable
Switched to a new branch 'stable'
Branch 'stable' set up to track remote branch 'stable' from 'origin'.
Resolved https://github.com/NVIDIA/TransformerEngine.git to commit c81733f1032a56a817b594c8971a738108ded7d0
Running command git submodule update --init --recursive -q
Preparing metadata (setup.py) ... done
Requirement already satisfied: pydantic in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from transformer_engine==1.6.0+c81733f) (2.7.4)
Requirement already satisfied: torch in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from transformer_engine==1.6.0+c81733f) (2.2.2)
Requirement already satisfied: flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from transformer_engine==1.6.0+c81733f) (2.4.2)
Requirement already satisfied: einops in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer_engine==1.6.0+c81733f) (0.8.0)
Requirement already satisfied: packaging in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer_engine==1.6.0+c81733f) (24.1)
Requirement already satisfied: ninja in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.4.2,>=2.0.6->transformer_engine==1.6.0+c81733f) (1.11.1.1)
Requirement already satisfied: annotated-types>=0.4.0 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from pydantic->transformer_engine==1.6.0+c81733f) (0.7.0)
Requirement already satisfied: pydantic-core==2.18.4 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from pydantic->transformer_engine==1.6.0+c81733f) (2.18.4)
Requirement already satisfied: typing-extensions>=4.6.1 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from pydantic->transformer_engine==1.6.0+c81733f) (4.11.0)
Requirement already satisfied: filelock in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from torch->transformer_engine==1.6.0+c81733f) (3.13.1)
Requirement already satisfied: sympy in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from torch->transformer_engine==1.6.0+c81733f) (1.12)
Requirement already satisfied: networkx in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from torch->transformer_engine==1.6.0+c81733f) (3.2.1)
Requirement already satisfied: jinja2 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from torch->transformer_engine==1.6.0+c81733f) (3.1.4)
Requirement already satisfied: fsspec in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from torch->transformer_engine==1.6.0+c81733f) (2024.6.1)
Requirement already satisfied: MarkupSafe>=2.0 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from jinja2->torch->transformer_engine==1.6.0+c81733f) (2.1.3)
Requirement already satisfied: mpmath>=0.19 in /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages (from sympy->torch->transformer_engine==1.6.0+c81733f) (1.3.0)
Building wheels for collected packages: transformer_engine
Building wheel for transformer_engine (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [242 lines of output]
  Could not determine CUDA Toolkit version
  /home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/__init__.py:81: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
  !!

          ********************************************************************************
          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try `pip install --use-pep517`.
          ********************************************************************************

  !!
    dist.fetch_build_eggs(dist.setup_requires)
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-310
  creating build/lib.linux-x86_64-cpython-310/transformer_engine
  copying transformer_engine/_version.py -> build/lib.linux-x86_64-cpython-310/transformer_engine
  copying transformer_engine/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine
  creating build/lib.linux-x86_64-cpython-310/transformer_engine/jax
  copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
  copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
  copying transformer_engine/jax/cpp_extensions.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
  copying transformer_engine/jax/fused_attn.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
  copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
  copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
  copying transformer_engine/jax/mlp.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
  copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
  copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax
  creating build/lib.linux-x86_64-cpython-310/transformer_engine/common
  copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/common
  copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/common
  creating build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/cpu_offload.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/graph.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch
  creating build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
  copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
  copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
  copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
  copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
  copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
  copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
  copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
  copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
  copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle
  creating build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
  copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/flax
  creating build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
  copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/jax/praxis
  creating build/lib.linux-x86_64-cpython-310/transformer_engine/common/recipe
  copying transformer_engine/common/recipe/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/common/recipe
  creating build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
  copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/module
  creating build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
  copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/pytorch/cpp_extensions
  creating build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/rmsnorm.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-cpython-310/transformer_engine/paddle/layer
  running build_ext
  Building CMake extension transformer_engine
  Running command /tmp/pip-req-build-9lezr884/.eggs/cmake-3.29.6-py3.10-linux-x86_64.egg/cmake/data/bin/cmake -S /tmp/pip-req-build-9lezr884/transformer_engine -B /tmp/pip-req-build-9lezr884/build/cmake -DPython_EXECUTABLE=/home/cx/anaconda3/envs/tuling/bin/python -DPython_INCLUDE_DIR=/home/cx/anaconda3/envs/tuling/include/python3.10 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-9lezr884/build/lib.linux-x86_64-cpython-310 -GNinja
  -- The CUDA compiler identification is NVIDIA 11.8.89
  -- The CXX compiler identification is GNU 11.4.0
  -- Detecting CUDA compiler ABI info
  -- Detecting CUDA compiler ABI info - done
  -- Check for working CUDA compiler: /usr/local/cuda-11.8/bin/nvcc - skipped
  -- Detecting CUDA compile features
  -- Detecting CUDA compile features - done
  -- Detecting CXX compiler ABI info
  -- Detecting CXX compiler ABI info - done
  -- Check for working CXX compiler: /usr/bin/c++ - skipped
  -- Detecting CXX compile features
  -- Detecting CXX compile features - done
  -- Found CUDAToolkit: /usr/local/cuda-11.8/targets/x86_64-linux/include (found version "11.8.89")
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
  -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
  -- Found Threads: TRUE
  -- cudnn found at /usr/local/cuda-11.8/lib64/libcudnn.so.
  CMake Warning (dev) at /tmp/pip-req-build-9lezr884/.eggs/cmake-3.29.6-py3.10-linux-x86_64.egg/cmake/data/share/cmake-3.29/Modules/FindPackageHandleStandardArgs.cmake:438 (message):
    The package name passed to `find_package_handle_standard_args` (LIBRARY)
    does not match the name of the calling package (CUDNN).  This can lead to
    problems in calling code that expects `find_package` result variables
    (e.g., `_FOUND`) to follow a certain pattern.
  Call Stack (most recent call first):
    cmake/FindCUDNN.cmake:44 (find_package_handle_standard_args)
    CMakeLists.txt:24 (find_package)
  This warning is for project developers.  Use -Wno-dev to suppress it.

  -- Found LIBRARY: /usr/local/cuda-11.8/targets/x86_64-linux/include
  -- cuDNN: /usr/local/cuda-11.8/lib64/libcudnn.so
  -- cuDNN: /usr/local/cuda-11.8/targets/x86_64-linux/include
  -- cudnn_adv_infer found at /usr/local/cuda-11.8/lib64/libcudnn_adv_infer.so.
  -- cudnn_adv_train found at /usr/local/cuda-11.8/lib64/libcudnn_adv_train.so.
  -- cudnn_cnn_infer found at /usr/local/cuda-11.8/lib64/libcudnn_cnn_infer.so.
  -- cudnn_cnn_train found at /usr/local/cuda-11.8/lib64/libcudnn_cnn_train.so.
  -- cudnn_ops_infer found at /usr/local/cuda-11.8/lib64/libcudnn_ops_infer.so.
  -- cudnn_ops_train found at /usr/local/cuda-11.8/lib64/libcudnn_ops_train.so.
  -- Found Python: /home/cx/anaconda3/envs/tuling/bin/python (found version "3.10.14") found components: Interpreter Development.Module
  -- JAX support: OFF
  -- Configuring done (9.9s)
  -- Generating done (0.0s)
  -- Build files have been written to: /tmp/pip-req-build-9lezr884/build/cmake
  Running command /tmp/pip-req-build-9lezr884/.eggs/cmake-3.29.6-py3.10-linux-x86_64.egg/cmake/data/bin/cmake --build /tmp/pip-req-build-9lezr884/build/cmake
  [1/32] Building CXX object common/CMakeFiles/transformer_engine.dir/transformer_engine.cpp.o
  [2/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/gemm/cublaslt_gemm.cu.o
  /tmp/pip-req-build-9lezr884/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used

  /tmp/pip-req-build-9lezr884/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used

  /tmp/pip-req-build-9lezr884/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used

  /tmp/pip-req-build-9lezr884/transformer_engine/common/gemm/cublaslt_gemm.cu(73): warning #550-D: variable "counter" was set but never used

  [3/32] Building CXX object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_api.cpp.o
  [4/32] Building CXX object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn.cpp.o
  [5/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose.cu.o
  [6/32] Building CXX object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_api.cpp.o
  [7/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/transpose_fusion.cu.o
  [8/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/swiglu.cu.o
  [9/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/relu.cu.o
  [10/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/util/cast.cu.o
  [11/32] Building CXX object common/CMakeFiles/transformer_engine.dir/util/cuda_driver.cpp.o
  [12/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_bwd_semi_cuda_kernel.cu.o
  [13/32] Building CXX object common/CMakeFiles/transformer_engine.dir/util/cuda_runtime.cpp.o
  [14/32] Building CXX object common/CMakeFiles/transformer_engine.dir/util/rtc.cpp.o
  [15/32] Building CXX object common/CMakeFiles/transformer_engine.dir/util/system.cpp.o
  [16/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/activation/gelu.cu.o
  [17/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_rope/fused_rope.cu.o
  [18/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/recipe/delayed_scaling.cu.o
  [19/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose.cu.o
  [20/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_bwd_semi_cuda_kernel.cu.o
  [21/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/multi_cast_transpose.cu.o
  [22/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_aligned_causal_masked_softmax.cu.o
  [23/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_masked_softmax.cu.o
  [24/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/rmsnorm/rmsnorm_fwd_cuda_kernel.cu.o
  [25/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_softmax/scaled_upper_triang_masked_softmax.cu.o
  [26/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/transpose/cast_transpose_fusion.cu.o
  [27/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
  FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
  /usr/local/cuda-11.8/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-9lezr884/transformer_engine -I/tmp/pip-req-build-9lezr884/transformer_engine/common/include -I/tmp/pip-req-build-9lezr884/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-9lezr884/build/cmake/common/string_headers -isystem /usr/local/cuda-11.8/targets/x86_64-linux/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o.d -x cu -c /tmp/pip-req-build-9lezr884/transformer_engine/common/fused_attn/fused_attn_fp8.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_fp8.cu.o
  Killed
  [28/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
  FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
  /usr/local/cuda-11.8/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-9lezr884/transformer_engine -I/tmp/pip-req-build-9lezr884/transformer_engine/common/include -I/tmp/pip-req-build-9lezr884/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-9lezr884/build/cmake/common/string_headers -isystem /usr/local/cuda-11.8/targets/x86_64-linux/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o.d -x cu -c /tmp/pip-req-build-9lezr884/transformer_engine/common/fused_attn/fused_attn_f16_arbitrary_seqlen.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_arbitrary_seqlen.cu.o
  Killed
  Killed
  Killed
  [29/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o
  FAILED: common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o
  /usr/local/cuda-11.8/bin/nvcc -forward-unknown-to-host-compiler -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-9lezr884/transformer_engine -I/tmp/pip-req-build-9lezr884/transformer_engine/common/include -I/tmp/pip-req-build-9lezr884/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-9lezr884/build/cmake/common/string_headers -isystem /usr/local/cuda-11.8/targets/x86_64-linux/include --threads 4 --expt-relaxed-constexpr -O3 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC -MD -MT common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o -MF common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o.d -x cu -c /tmp/pip-req-build-9lezr884/transformer_engine/common/fused_attn/fused_attn_f16_max512_seqlen.cu -o common/CMakeFiles/transformer_engine.dir/fused_attn/fused_attn_f16_max512_seqlen.cu.o
  Killed
  Killed
  [30/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/layer_norm/ln_fwd_cuda_kernel.cu.o
  [31/32] Building CUDA object common/CMakeFiles/transformer_engine.dir/fused_attn/utils.cu.o
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/tmp/pip-req-build-9lezr884/setup.py", line 336, in _build_cmake
      subprocess.run(command, cwd=build_dir, check=True)
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/subprocess.py", line 526, in run
      raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['/tmp/pip-req-build-9lezr884/.eggs/cmake-3.29.6-py3.10-linux-x86_64.egg/cmake/data/bin/cmake', '--build', '/tmp/pip-req-build-9lezr884/build/cmake']' returned non-zero exit status 1.

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/tmp/pip-req-build-9lezr884/setup.py", line 617, in <module>
      main()
    File "/tmp/pip-req-build-9lezr884/setup.py", line 602, in main
      setuptools.setup(
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/__init__.py", line 104, in setup
      return distutils.core.setup(**attrs)
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 184, in setup
      return run_commands(dist)
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 200, in run_commands
      dist.run_commands()
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 368, in run
      self.run_command("build")
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/command/build.py", line 132, in run
      self.run_command(cmd_name)
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
      self.distribution.run_command(command)
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/dist.py", line 967, in run_command
      super().run_command(command)
    File "/home/cx/anaconda3/envs/tuling/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-req-build-9lezr884/setup.py", line 368, in run
      ext._build_cmake(
    File "/tmp/pip-req-build-9lezr884/setup.py", line 338, in _build_cmake
      raise RuntimeError(f"Error when running CMake: {e}")
  RuntimeError: Error when running CMake: Command '['/tmp/pip-req-build-9lezr884/.eggs/cmake-3.29.6-py3.10-linux-x86_64.egg/cmake/data/bin/cmake', '--build', '/tmp/pip-req-build-9lezr884/build/cmake']' returned non-zero exit status 1.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for transformer_engine
Running setup.py clean for transformer_engine
Failed to build transformer_engine
ERROR: Could not build wheels for transformer_engine, which is required to install pyproject.toml-based projects

How can I install it successfully, and is the failure related to CMake? I would be very grateful for a detailed answer.

timmoon10 commented 6 days ago

We use Ninja to parallelize the build process and I suspect it's overwhelming your system resources. We're thinking about ways to handle this more gracefully, but for now can you try running with CMAKE_BUILD_PARALLEL_LEVEL=1 in your environment? You may also want to see https://github.com/NVIDIA/TransformerEngine/issues/976#issuecomment-2195745927.
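
For reference, the suggestion above could be applied roughly as follows. This is a minimal sketch, not an official install procedure. It assumes a bash-like shell and the same pip command used in the original report. The function name `install_te_serial` is hypothetical and only there for illustration. The repeated "Killed" lines in the log are consistent with compiler processes being terminated when the machine runs short of memory, although the log alone does not confirm that.

```shell
# Hypothetical helper; the name is illustrative and not part of Transformer Engine.
install_te_serial() {
    # CMAKE_BUILD_PARALLEL_LEVEL=1 asks `cmake --build` to run one compile job
    # at a time, trading a longer build for a lower peak memory footprint.
    CMAKE_BUILD_PARALLEL_LEVEL=1 \
        pip install "git+https://github.com/NVIDIA/TransformerEngine.git@stable"
}
```

Calling `install_te_serial` in the activated environment starts the build. The variable is set only for that one pip invocation, so other CMake builds in the same shell session keep their default parallelism.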