NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

Failed to build transformer-engine #1270

Open jaefan11 opened 1 month ago

jaefan11 commented 1 month ago

pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable

Thanks in advance.

python: 3.9 pytorch: 2.1.0 cuda: 12.1 gcc: 7.5.0

  FAILED: CMakeFiles/transformer_engine.dir/util/cuda_driver.cpp.o
  /usr/local/bin/c++ -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-hfgbdcj5/transformer_engine/common/.. -I/tmp/pip-req-build-hfgbdcj5/transformer_engine/common/include -I/tmp/pip-req-build-hfgbdcj5/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-hfgbdcj5/build/cmake/string_headers -isystem /usr/local/cuda-12.1/targets/x86_64-linux/include -Wl,--version-script=/tmp/pip-req-build-hfgbdcj5/transformer_engine/common/libtransformer_engine.version -O3 -DNDEBUG -std=gnu++1z -fPIC -MD -MT CMakeFiles/transformer_engine.dir/util/cuda_driver.cpp.o -MF CMakeFiles/transformer_engine.dir/util/cuda_driver.cpp.o.d -o CMakeFiles/transformer_engine.dir/util/cuda_driver.cpp.o -c /tmp/pip-req-build-hfgbdcj5/transformer_engine/common/util/cuda_driver.cpp
  /tmp/pip-req-build-hfgbdcj5/transformer_engine/common/util/cuda_driver.cpp:9:10: fatal error: filesystem: No such file or directory
   #include <filesystem>
            ^~~~~~~~~~~~
  compilation terminated.
  [3/36] /usr/local/bin/c++ -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/tmp/pip-req-build-hfgbdcj5/transformer_engine/common/.. -I/tmp/pip-req-build-hfgbdcj5/transformer_engine/common/include -I/tmp/pip-req-build-hfgbdcj5/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/tmp/pip-req-build-hfgbdcj5/build/cmake/string_headers -isystem /usr/local/cuda-12.1/targets/x86_64-linux/include -Wl,--version-script=/tmp/pip-req-build-hfgbdcj5/transformer_engine/common/libtransformer_engine.version -O3 -DNDEBUG -std=gnu++1z -fPIC -MD -MT CMakeFiles/transformer_engine.dir/util/cuda_runtime.cpp.o -MF CMakeFiles/transformer_engine.dir/util/cuda_runtime.cpp.o.d -o CMakeFiles/transformer_engine.dir/util/cuda_runtime.cpp.o -c /tmp/pip-req-build-hfgbdcj5/transformer_engine/common/util/cuda_runtime.cpp
  FAILED: CMakeFiles/transformer_engine.dir/util/cuda_runtime.cpp.o

timmoon10 commented 1 month ago

Try updating your compiler. TE requires C++17, and GCC only added the <filesystem> header in GCC 8.1 (see GCC's C++17 support), so GCC 7.5.0 is too old. We currently use GCC 13.2.0 in our internal builds.
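
As a quick pre-flight check before running pip install, the version requirement in the comment above can be sketched in Python. This is only an illustration, not part of TE's build system: the helper names (`parse_version`, `has_filesystem_header`, `local_gcc_version`) are hypothetical, the 8.1 threshold is taken from the comment above, and the check assumes the `c++` on PATH is GCC (another compiler would report an unrelated version number, or reject the flags).

```python
import re
import shutil
import subprocess

# Minimum GCC release that ships <filesystem>, per the comment above.
MIN_GCC = (8, 1)


def parse_version(text):
    """Extract the first dotted version number in text as a tuple of ints."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


def has_filesystem_header(version_text, minimum=MIN_GCC):
    """Return True if the GCC version string is at least the minimum."""
    return parse_version(version_text) >= minimum


def local_gcc_version(compiler="c++"):
    """Ask the compiler on PATH for its version; None if it is not found.

    Assumes a GCC-style driver that understands -dumpfullversion.
    """
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run(
        [compiler, "-dumpfullversion", "-dumpversion"],
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() or None
```

With the versions reported in this thread, `has_filesystem_header("7.5.0")` is false while `has_filesystem_header("11.4.0")` and `has_filesystem_header("13.2.0")` are true. To test the local toolchain, pass the result of `local_gcc_version()` to `has_filesystem_header` after checking that it is not `None`.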

s-smits commented 4 weeks ago

python: 3.11 pytorch: 2.4.1 cuda: 12.4 cudnn: 9 gcc: 11.4.0

docker: pytorch/pytorch:2.4.1-cuda12.4-cudnn9-devel RTX4090

Collecting git+https://github.com/NVIDIA/TransformerEngine.git@stable
  Cloning https://github.com/NVIDIA/TransformerEngine.git (to revision stable) to /tmp/pip-req-build-s96o7cy6
  Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA/TransformerEngine.git /tmp/pip-req-build-s96o7cy6
  Running command git checkout -b stable --track origin/stable
  Switched to a new branch 'stable'
  Branch 'stable' set up to track remote branch 'stable' from 'origin'.
  Resolved https://github.com/NVIDIA/TransformerEngine.git to commit c27ee60ec746210bcea4ec33958dbbff06706506
  Running command git submodule update --init --recursive -q
  Preparing metadata (setup.py) ... done
Collecting pydantic (from transformer_engine==1.11.0+c27ee60)
  Using cached pydantic-2.9.2-py3-none-any.whl.metadata (149 kB)
Collecting flash-attn!=2.0.9,!=2.1.0,<=2.6.3,>=2.0.6 (from transformer_engine==1.11.0+c27ee60)
  Downloading flash_attn-2.6.3.tar.gz (2.6 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.6/2.6 MB 4.9 MB/s eta 0:00:00a 0:00:01
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in /opt/conda/lib/python3.11/site-packages (from transformer_engine==1.11.0+c27ee60) (2.4.1+cu124)
Collecting importlib-metadata>=1.0 (from transformer_engine==1.11.0+c27ee60)
  Using cached importlib_metadata-8.5.0-py3-none-any.whl.metadata (4.8 kB)
Requirement already satisfied: packaging in /opt/conda/lib/python3.11/site-packages (from transformer_engine==1.11.0+c27ee60) (24.1)
Requirement already satisfied: einops in /opt/conda/lib/python3.11/site-packages (from flash-attn!=2.0.9,!=2.1.0,<=2.6.3,>=2.0.6->transformer_engine==1.11.0+c27ee60) (0.8.0)
Requirement already satisfied: zipp>=3.20 in /opt/conda/lib/python3.11/site-packages (from importlib-metadata>=1.0->transformer_engine==1.11.0+c27ee60) (3.20.1)
Collecting annotated-types>=0.6.0 (from pydantic->transformer_engine==1.11.0+c27ee60)
  Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.23.4 (from pydantic->transformer_engine==1.11.0+c27ee60)
  Using cached pydantic_core-2.23.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Requirement already satisfied: typing-extensions>=4.6.1 in /opt/conda/lib/python3.11/site-packages (from pydantic->transformer_engine==1.11.0+c27ee60) (4.12.2)
Requirement already satisfied: filelock in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (3.15.4)
Requirement already satisfied: sympy in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (1.13.2)
Requirement already satisfied: networkx in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (3.3)
Requirement already satisfied: jinja2 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (3.1.4)
Requirement already satisfied: fsspec in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (2024.9.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.4.99 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (12.4.99)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.4.99 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (12.4.99)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.4.99 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (12.4.99)
Requirement already satisfied: nvidia-cudnn-cu12==9.1.0.70 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (9.1.0.70)
Requirement already satisfied: nvidia-cublas-cu12==12.4.2.65 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (12.4.2.65)
Requirement already satisfied: nvidia-cufft-cu12==11.2.0.44 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (11.2.0.44)
Requirement already satisfied: nvidia-curand-cu12==10.3.5.119 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (10.3.5.119)
Requirement already satisfied: nvidia-cusolver-cu12==11.6.0.99 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (11.6.0.99)
Requirement already satisfied: nvidia-cusparse-cu12==12.3.0.142 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (12.3.0.142)
Requirement already satisfied: nvidia-nccl-cu12==2.20.5 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (2.20.5)
Requirement already satisfied: nvidia-nvtx-cu12==12.4.99 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (12.4.99)
Requirement already satisfied: nvidia-nvjitlink-cu12==12.4.99 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (12.4.99)
Requirement already satisfied: triton==3.0.0 in /opt/conda/lib/python3.11/site-packages (from torch->transformer_engine==1.11.0+c27ee60) (3.0.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/lib/python3.11/site-packages (from jinja2->torch->transformer_engine==1.11.0+c27ee60) (2.1.5)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/lib/python3.11/site-packages (from sympy->torch->transformer_engine==1.11.0+c27ee60) (1.3.0)
Using cached importlib_metadata-8.5.0-py3-none-any.whl (26 kB)
Using cached pydantic-2.9.2-py3-none-any.whl (434 kB)
Using cached pydantic_core-2.23.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Using cached annotated_types-0.7.0-py3-none-any.whl (13 kB)
Building wheels for collected packages: transformer_engine, flash-attn
  Building wheel for transformer_engine (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [425 lines of output]
      WARNING: Skipping transformer_engine_cu12 as it is not installed.
      WARNING: Skipping transformer_engine_torch as it is not installed.
      WARNING: Skipping transformer_engine_paddle as it is not installed.
      WARNING: Skipping transformer_engine_jax as it is not installed.
      WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.
      /opt/conda/lib/python3.11/site-packages/setuptools/__init__.py:94: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
      !!

              ********************************************************************************
              Requirements should be satisfied by a PEP 517 installer.
              If you are using pip, you can try `pip install --use-pep517`.
              ********************************************************************************

      !!
        dist.fetch_build_eggs(dist.setup_requires)
      running bdist_wheel
      running build
      running build_py
      creating build/lib.linux-x86_64-cpython-311/transformer_engine
      copying transformer_engine/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/common
      copying transformer_engine/common/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
      copying transformer_engine/common/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/dot.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/setup.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/sharding.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      copying transformer_engine/jax/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/profile.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/recompute.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/setup.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/cpp_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      copying transformer_engine/paddle/fp8_buffer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/constants.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/export.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/float8_tensor.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/fp8.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/numerics_debug.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/cpu_offload.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/distributed.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/graph.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/jit.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/permutation.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/setup.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/te_onnx_extensions.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      copying transformer_engine/pytorch/utils.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/common/recipe
      copying transformer_engine/common/recipe/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/common/recipe
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      copying transformer_engine/jax/cpp_extensions/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      copying transformer_engine/jax/cpp_extensions/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      copying transformer_engine/jax/cpp_extensions/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      copying transformer_engine/jax/cpp_extensions/activation.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      copying transformer_engine/jax/cpp_extensions/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      copying transformer_engine/jax/cpp_extensions/custom_call.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      copying transformer_engine/jax/cpp_extensions/misc.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      copying transformer_engine/jax/cpp_extensions/normalization.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      copying transformer_engine/jax/cpp_extensions/quantization.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      copying transformer_engine/jax/cpp_extensions/transpose.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/cpp_extensions
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
      copying transformer_engine/jax/flax/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
      copying transformer_engine/jax/flax/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
      copying transformer_engine/jax/flax/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/flax
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
      copying transformer_engine/jax/praxis/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
      copying transformer_engine/jax/praxis/module.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
      copying transformer_engine/jax/praxis/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/jax/praxis
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/rmsnorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/softmax.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/transformer.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/attention.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      copying transformer_engine/paddle/layer/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/paddle/layer
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/activation.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/cast.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/normalization.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/padding.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/transpose.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/_common.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/fused_attn.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      copying transformer_engine/pytorch/cpp_extensions/gemm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/cpp_extensions
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/_common.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/base.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/fp8_padding.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/fp8_unpadding.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/grouped_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/layernorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/layernorm_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/layernorm_mlp.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      copying transformer_engine/pytorch/module/rmsnorm.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/module
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops
      copying transformer_engine/pytorch/ops/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops
      copying transformer_engine/pytorch/ops/_common.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops
      copying transformer_engine/pytorch/ops/fuser.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops
      copying transformer_engine/pytorch/ops/linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops
      copying transformer_engine/pytorch/ops/op.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops
      copying transformer_engine/pytorch/ops/sequential.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/optimizers
      copying transformer_engine/pytorch/optimizers/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/optimizers
      copying transformer_engine/pytorch/optimizers/multi_tensor_apply.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/optimizers
      copying transformer_engine/pytorch/optimizers/fused_adam.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/optimizers
      copying transformer_engine/pytorch/optimizers/fused_sgd.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/optimizers
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/tensor
      copying transformer_engine/pytorch/tensor/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/tensor
      copying transformer_engine/pytorch/tensor/float8_tensor.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/tensor
      copying transformer_engine/pytorch/tensor/quantized_tensor.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/tensor
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      copying transformer_engine/pytorch/ops/basic/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      copying transformer_engine/pytorch/ops/basic/add_in_place.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      copying transformer_engine/pytorch/ops/basic/all_gather.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      copying transformer_engine/pytorch/ops/basic/all_reduce.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      copying transformer_engine/pytorch/ops/basic/identity.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      copying transformer_engine/pytorch/ops/basic/make_extra_output.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      copying transformer_engine/pytorch/ops/basic/reduce_scatter.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      copying transformer_engine/pytorch/ops/basic/reshape.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      copying transformer_engine/pytorch/ops/basic/basic_linear.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      copying transformer_engine/pytorch/ops/basic/bias.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/basic
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/fused
      copying transformer_engine/pytorch/ops/fused/__init__.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/fused
      copying transformer_engine/pytorch/ops/fused/backward_linear_add.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/fused
      copying transformer_engine/pytorch/ops/fused/forward_linear_bias_activation.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/fused
      copying transformer_engine/pytorch/ops/fused/forward_linear_bias_add.py -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/ops/fused
      running egg_info
      creating transformer_engine.egg-info
      writing transformer_engine.egg-info/PKG-INFO
      writing dependency_links to transformer_engine.egg-info/dependency_links.txt
      writing requirements to transformer_engine.egg-info/requires.txt
      writing top-level names to transformer_engine.egg-info/top_level.txt
      writing manifest file 'transformer_engine.egg-info/SOURCES.txt'
      reading manifest file 'transformer_engine.egg-info/SOURCES.txt'
      adding license file 'LICENSE'
      writing manifest file 'transformer_engine.egg-info/SOURCES.txt'
      /opt/conda/lib/python3.11/site-packages/setuptools/command/build_py.py:220: _Warning: Package 'transformer_engine.pytorch.csrc' is absent from the `packages` configuration.
      !!

              ********************************************************************************
              ############################
              # Package would be ignored #
              ############################
              Python recognizes 'transformer_engine.pytorch.csrc' as an importable package[^1],
              but it is absent from setuptools' `packages` configuration.

              This leads to an ambiguous overall configuration. If you want to distribute this
              package, please make sure that 'transformer_engine.pytorch.csrc' is explicitly added
              to the `packages` configuration field.

              Alternatively, you can also rely on setuptools' discovery methods
              (for example by using `find_namespace_packages(...)`/`find_namespace:`
              instead of `find_packages(...)`/`find:`).

              You can read more about "package discovery" on setuptools documentation page:

              - https://setuptools.pypa.io/en/latest/userguide/package_discovery.html

              If you don't want 'transformer_engine.pytorch.csrc' to be distributed and are
              already explicitly excluding 'transformer_engine.pytorch.csrc' via
              `find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
              you can try to use `exclude_package_data`, or `include-package-data=False` in
              combination with a more fine grained `package-data` configuration.

              You can read more about "package data files" on setuptools documentation page:

              - https://setuptools.pypa.io/en/latest/userguide/datafiles.html

              [^1]: For Python, any directory (with suitable naming) can be imported,
                    even if it does not contain any `.py` files.
                    On the other hand, currently there is no concept of package data
                    directory, all directories are treated like packages.
              ********************************************************************************

      !!
        check.warn(importable)
      /opt/conda/lib/python3.11/site-packages/setuptools/command/build_py.py:220: _Warning: Package 'transformer_engine.pytorch.csrc.extensions' is absent from the `packages` configuration.
      !!

              ********************************************************************************
              ############################
              # Package would be ignored #
              ############################
              Python recognizes 'transformer_engine.pytorch.csrc.extensions' as an importable package[^1],
              but it is absent from setuptools' `packages` configuration.

              This leads to an ambiguous overall configuration. If you want to distribute this
              package, please make sure that 'transformer_engine.pytorch.csrc.extensions' is explicitly added
              to the `packages` configuration field.

              Alternatively, you can also rely on setuptools' discovery methods
              (for example by using `find_namespace_packages(...)`/`find_namespace:`
              instead of `find_packages(...)`/`find:`).

              You can read more about "package discovery" on setuptools documentation page:

              - https://setuptools.pypa.io/en/latest/userguide/package_discovery.html

              If you don't want 'transformer_engine.pytorch.csrc.extensions' to be distributed and are
              already explicitly excluding 'transformer_engine.pytorch.csrc.extensions' via
              `find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
              you can try to use `exclude_package_data`, or `include-package-data=False` in
              combination with a more fine grained `package-data` configuration.

              You can read more about "package data files" on setuptools documentation page:

              - https://setuptools.pypa.io/en/latest/userguide/datafiles.html

              [^1]: For Python, any directory (with suitable naming) can be imported,
                    even if it does not contain any `.py` files.
                    On the other hand, currently there is no concept of package data
                    directory, all directories are treated like packages.
              ********************************************************************************

      !!
        check.warn(importable)
      /opt/conda/lib/python3.11/site-packages/setuptools/command/build_py.py:220: _Warning: Package 'transformer_engine.pytorch.csrc.extensions.multi_tensor' is absent from the `packages` configuration.
      !!

              ********************************************************************************
              ############################
              # Package would be ignored #
              ############################
              Python recognizes 'transformer_engine.pytorch.csrc.extensions.multi_tensor' as an importable package[^1],
              but it is absent from setuptools' `packages` configuration.

              This leads to an ambiguous overall configuration. If you want to distribute this
              package, please make sure that 'transformer_engine.pytorch.csrc.extensions.multi_tensor' is explicitly added
              to the `packages` configuration field.

              Alternatively, you can also rely on setuptools' discovery methods
              (for example by using `find_namespace_packages(...)`/`find_namespace:`
              instead of `find_packages(...)`/`find:`).

              You can read more about "package discovery" on setuptools documentation page:

              - https://setuptools.pypa.io/en/latest/userguide/package_discovery.html

              If you don't want 'transformer_engine.pytorch.csrc.extensions.multi_tensor' to be distributed and are
              already explicitly excluding 'transformer_engine.pytorch.csrc.extensions.multi_tensor' via
              `find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
              you can try to use `exclude_package_data`, or `include-package-data=False` in
              combination with a more fine grained `package-data` configuration.

              You can read more about "package data files" on setuptools documentation page:

              - https://setuptools.pypa.io/en/latest/userguide/datafiles.html

              [^1]: For Python, any directory (with suitable naming) can be imported,
                    even if it does not contain any `.py` files.
                    On the other hand, currently there is no concept of package data
                    directory, all directories are treated like packages.
              ********************************************************************************

      !!
        check.warn(importable)
      /opt/conda/lib/python3.11/site-packages/setuptools/command/build_py.py:220: _Warning: Package 'transformer_engine.pytorch.csrc.userbuffers' is absent from the `packages` configuration.
      !!

              ********************************************************************************
              ############################
              # Package would be ignored #
              ############################
              Python recognizes 'transformer_engine.pytorch.csrc.userbuffers' as an importable package[^1],
              but it is absent from setuptools' `packages` configuration.

              This leads to an ambiguous overall configuration. If you want to distribute this
              package, please make sure that 'transformer_engine.pytorch.csrc.userbuffers' is explicitly added
              to the `packages` configuration field.

              Alternatively, you can also rely on setuptools' discovery methods
              (for example by using `find_namespace_packages(...)`/`find_namespace:`
              instead of `find_packages(...)`/`find:`).

              You can read more about "package discovery" on setuptools documentation page:

              - https://setuptools.pypa.io/en/latest/userguide/package_discovery.html

              If you don't want 'transformer_engine.pytorch.csrc.userbuffers' to be distributed and are
              already explicitly excluding 'transformer_engine.pytorch.csrc.userbuffers' via
              `find_namespace_packages(...)/find_namespace` or `find_packages(...)/find`,
              you can try to use `exclude_package_data`, or `include-package-data=False` in
              combination with a more fine grained `package-data` configuration.

              You can read more about "package data files" on setuptools documentation page:

              - https://setuptools.pypa.io/en/latest/userguide/datafiles.html

              [^1]: For Python, any directory (with suitable naming) can be imported,
                    even if it does not contain any `.py` files.
                    On the other hand, currently there is no concept of package data
                    directory, all directories are treated like packages.
              ********************************************************************************

      !!
        check.warn(importable)
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc
      copying transformer_engine/pytorch/csrc/common.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc
      copying transformer_engine/pytorch/csrc/ts_fp8_op.cpp -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/activation.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/apply_rope.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/attention.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/cast.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/gemm.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/misc.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/normalization.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/padding.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/permutation.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/pybind.cpp -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/recipe.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/softmax.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      copying transformer_engine/pytorch/csrc/extensions/transpose.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions/multi_tensor
      copying transformer_engine/pytorch/csrc/extensions/multi_tensor/multi_tensor_adam.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions/multi_tensor
      copying transformer_engine/pytorch/csrc/extensions/multi_tensor/multi_tensor_l2norm_kernel.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions/multi_tensor
      copying transformer_engine/pytorch/csrc/extensions/multi_tensor/multi_tensor_scale_kernel.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions/multi_tensor
      copying transformer_engine/pytorch/csrc/extensions/multi_tensor/multi_tensor_sgd_kernel.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/extensions/multi_tensor
      creating build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/userbuffers
      copying transformer_engine/pytorch/csrc/userbuffers/ipcsocket.cc -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/userbuffers
      copying transformer_engine/pytorch/csrc/userbuffers/userbuffers-host.cpp -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/userbuffers
      copying transformer_engine/pytorch/csrc/userbuffers/userbuffers.cu -> build/lib.linux-x86_64-cpython-311/transformer_engine/pytorch/csrc/userbuffers
      running build_ext
      Building CMake extension transformer_engine
      Running command /opt/conda/bin/cmake -S /tmp/pip-req-build-s96o7cy6/transformer_engine/common -B /tmp/pip-req-build-s96o7cy6/build/cmake -DPython_EXECUTABLE=/opt/conda/bin/python3.11 -DPython_INCLUDE_DIR=/opt/conda/include/python3.11 -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-s96o7cy6/build/lib.linux-x86_64-cpython-311 -DCMAKE_CUDA_ARCHITECTURES=70;80;89;90 -Dpybind11_DIR=/tmp/pip-req-build-s96o7cy6/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11 -GNinja
      -- The CUDA compiler identification is NVIDIA 12.4.131
      -- The CXX compiler identification is GNU 11.4.0
      -- Detecting CUDA compiler ABI info
      -- Detecting CUDA compiler ABI info - done
      -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
      -- Detecting CUDA compile features
      -- Detecting CUDA compile features - done
      -- Detecting CXX compiler ABI info
      -- Detecting CXX compiler ABI info - done
      -- Check for working CXX compiler: /usr/bin/c++ - skipped
      -- Detecting CXX compile features
      -- Detecting CXX compile features - done
      -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.4.131")
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
      -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
      -- Found Threads: TRUE
      CMake Error at /tmp/pip-req-build-s96o7cy6/3rdparty/cudnn-frontend/cmake/cuDNN.cmake:3 (find_path):
        Could not find CUDNN_INCLUDE_DIR using the following files: cudnn.h
      Call Stack (most recent call first):
        CMakeLists.txt:40 (include)

      -- Configuring incomplete, errors occurred!
      Traceback (most recent call last):
        File "/tmp/pip-req-build-s96o7cy6/build_tools/build_ext.py", line 89, in _build_cmake
          subprocess.run(command, cwd=build_dir, check=True)
        File "/opt/conda/lib/python3.11/subprocess.py", line 571, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['/opt/conda/bin/cmake', '-S', '/tmp/pip-req-build-s96o7cy6/transformer_engine/common', '-B', '/tmp/pip-req-build-s96o7cy6/build/cmake', '-DPython_EXECUTABLE=/opt/conda/bin/python3.11', '-DPython_INCLUDE_DIR=/opt/conda/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-s96o7cy6/build/lib.linux-x86_64-cpython-311', '-DCMAKE_CUDA_ARCHITECTURES=70;80;89;90', '-Dpybind11_DIR=/tmp/pip-req-build-s96o7cy6/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-req-build-s96o7cy6/setup.py", line 174, in <module>
          setuptools.setup(
        File "/opt/conda/lib/python3.11/site-packages/setuptools/__init__.py", line 117, in setup
          return distutils.core.setup(**attrs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 183, in setup
          return run_commands(dist)
                 ^^^^^^^^^^^^^^^^^^
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/core.py", line 199, in run_commands
          dist.run_commands()
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 954, in run_commands
          self.run_command(cmd)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 999, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-req-build-s96o7cy6/setup.py", line 53, in run
          super().run()
        File "/opt/conda/lib/python3.11/site-packages/wheel/_bdist_wheel.py", line 378, in run
          self.run_command("build")
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 999, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/cmd.py", line 316, in run_command
          self.distribution.run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/dist.py", line 999, in run_command
          super().run_command(command)
        File "/opt/conda/lib/python3.11/site-packages/setuptools/_distutils/dist.py", line 973, in run_command
          cmd_obj.run()
        File "/tmp/pip-req-build-s96o7cy6/build_tools/build_ext.py", line 119, in run
          ext._build_cmake(
        File "/tmp/pip-req-build-s96o7cy6/build_tools/build_ext.py", line 91, in _build_cmake
          raise RuntimeError(f"Error when running CMake: {e}")
      RuntimeError: Error when running CMake: Command '['/opt/conda/bin/cmake', '-S', '/tmp/pip-req-build-s96o7cy6/transformer_engine/common', '-B', '/tmp/pip-req-build-s96o7cy6/build/cmake', '-DPython_EXECUTABLE=/opt/conda/bin/python3.11', '-DPython_INCLUDE_DIR=/opt/conda/include/python3.11', '-DCMAKE_BUILD_TYPE=Release', '-DCMAKE_INSTALL_PREFIX=/tmp/pip-req-build-s96o7cy6/build/lib.linux-x86_64-cpython-311', '-DCMAKE_CUDA_ARCHITECTURES=70;80;89;90', '-Dpybind11_DIR=/tmp/pip-req-build-s96o7cy6/.eggs/pybind11-2.13.6-py3.11.egg/pybind11/share/cmake/pybind11', '-GNinja']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for transformer_engine
  Running setup.py clean for transformer_engine
  Building wheel for flash-attn (setup.py) ... done
  Created wheel for flash-attn: filename=flash_attn-2.6.3-cp311-cp311-linux_x86_64.whl size=187328293 sha256=25405479af3f6865c873ee3bbdfadcfccea9055355de78e7cd6b93170e9d4377
  Stored in directory: /root/.cache/pip/wheels/e3/ef/b1/7889928ffa2dea61032e61480db4e4c20d00a9d9e28cd4f55a
Successfully built flash-attn
Failed to build transformer_engine
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (transformer_engine)

timmoon10 commented 5 days ago

@s-smits CMake can't find cuDNN. Please set CUDNN_PATH in the environment. See https://github.com/NVIDIA/TransformerEngine/issues/355#issuecomment-2394353816 for more guidance on common build errors.
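
For anyone unsure which directory to use for CUDNN_PATH: the CMake error in the log is a failed search for `cudnn.h`, so the value should most likely be the cuDNN root, i.e. the directory whose `include/` subfolder holds that header. Below is a minimal sketch that checks a few candidate directories and prints an `export` line for the first match. The helper names (`find_cudnn_root`, `default_candidates`) and the candidate list are illustrative assumptions, not part of TransformerEngine or its build scripts.

```python
import os
import sysconfig
from pathlib import Path
from typing import Iterable, List, Optional


def find_cudnn_root(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first directory that contains include/cudnn.h, else None."""
    for root in candidates:
        if (root / "include" / "cudnn.h").is_file():
            return root
    return None


def default_candidates() -> List[Path]:
    """Assumed search locations; adjust for your system (not an official list)."""
    roots = []
    # An already-set CUDNN_PATH is checked first so it can be validated.
    if os.environ.get("CUDNN_PATH"):
        roots.append(Path(os.environ["CUDNN_PATH"]))
    # Assumption: cuDNN was installed as a pip wheel into this environment.
    roots.append(Path(sysconfig.get_paths()["platlib"]) / "nvidia" / "cudnn")
    # Assumption: cuDNN headers were copied into the CUDA toolkit or /usr.
    roots.append(Path("/usr/local/cuda"))
    roots.append(Path("/usr"))
    return roots


if __name__ == "__main__":
    root = find_cudnn_root(default_candidates())
    if root is None:
        print("cudnn.h not found in any candidate directory; install cuDNN first")
    else:
        print(f"export CUDNN_PATH={root}")
```

If the script prints an `export` line, run that line in the shell and then retry the `pip install` command in the same shell session. If it finds nothing, cuDNN is probably not installed in any of the assumed locations.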