Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

error: command '/opt/conda/bin/nvcc' failed with exit code 255 #298

Open Martion-z opened 1 year ago

Martion-z commented 1 year ago

I'm hitting this issue. My nvcc version is 11.7 and gcc is 11.1.

Building wheel for flash-attn (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [115 lines of output]

  torch.__version__  = 1.13.1+cu117

  fatal: not a git repository (or any of the parent directories): .git
  running bdist_wheel
  /opt/conda/lib/python3.8/site-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
    warnings.warn(msg.format('we could not find ninja.'))
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-38
  creating build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/attention_kernl.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/flash_attn_triton_single_query.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/flash_attn_triton_tmp.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/flash_attn_triton_tmp_og.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/flash_attn_triton_varlen.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  copying flash_attn/rotary.py -> build/lib.linux-x86_64-cpython-38/flash_attn
  creating build/lib.linux-x86_64-cpython-38/flash_attn/layers
  copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/layers
  copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-38/flash_attn/layers
  copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-38/flash_attn/layers
  creating build/lib.linux-x86_64-cpython-38/flash_attn/losses
  copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/losses
  copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-38/flash_attn/losses
  copying flash_attn/losses/cross_entropy_apex.py -> build/lib.linux-x86_64-cpython-38/flash_attn/losses
  copying flash_attn/losses/cross_entropy_parallel.py -> build/lib.linux-x86_64-cpython-38/flash_attn/losses
  creating build/lib.linux-x86_64-cpython-38/flash_attn/models
  copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
  copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
  copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
  copying flash_attn/models/gpt_j.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
  copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
  copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
  copying flash_attn/models/llama.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
  copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
  copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-38/flash_attn/models
  creating build/lib.linux-x86_64-cpython-38/flash_attn/modules
  copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/modules
  copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-38/flash_attn/modules
  copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-38/flash_attn/modules
  copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-38/flash_attn/modules
  copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-38/flash_attn/modules
  creating build/lib.linux-x86_64-cpython-38/flash_attn/ops
  copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
  copying flash_attn/ops/activations.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
  copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
  copying flash_attn/ops/gelu_activation.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
  copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
  copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-38/flash_attn/ops
  creating build/lib.linux-x86_64-cpython-38/flash_attn/triton
  copying flash_attn/triton/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/triton
  copying flash_attn/triton/fused_attention.py -> build/lib.linux-x86_64-cpython-38/flash_attn/triton
  creating build/lib.linux-x86_64-cpython-38/flash_attn/utils
  copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-38/flash_attn/utils
  copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-38/flash_attn/utils
  copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-38/flash_attn/utils
  copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-38/flash_attn/utils
  copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-38/flash_attn/utils
  running build_ext
  building 'flash_attn_cuda' extension
  creating build/temp.linux-x86_64-cpython-38
  creating build/temp.linux-x86_64-cpython-38/csrc
  creating build/temp.linux-x86_64-cpython-38/csrc/flash_attn
  creating build/temp.linux-x86_64-cpython-38/csrc/flash_attn/src
  gcc -pthread -B /opt/conda/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/cutlass/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/opt/conda/include -I/opt/conda/include/python3.8 -c csrc/flash_attn/fmha_api.cpp -o build/temp.linux-x86_64-cpython-38/csrc/flash_attn/fmha_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  cc1plus: warning: command-line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  In file included from /tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha.h:42,
                   from csrc/flash_attn/fmha_api.cpp:33:
  /tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha_utils.h: In function ‘void set_alpha(uint32_t&, float, Data_type)’:
  /tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha_utils.h:63:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     63 |         alpha = reinterpret_cast<const uint32_t &>( h2 );
        |                                                     ^~
  /tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha_utils.h:68:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     68 |         alpha = reinterpret_cast<const uint32_t &>( h2 );
        |                                                     ^~
  /tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha_utils.h:70:53: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
     70 |         alpha = reinterpret_cast<const uint32_t &>( norm );
        |                                                     ^~~~
  csrc/flash_attn/fmha_api.cpp: In function ‘void set_params_fprop(FMHA_fprop_params&, size_t, size_t, size_t, size_t, size_t, at::Tensor, at::Tensor, at::Tensor, at::Tensor, void*, void*, void*, void*, void*, float, float, bool, int)’:
  csrc/flash_attn/fmha_api.cpp:64:11: warning: ‘void* memset(void*, int, size_t)’ clearing an object of non-trivial type ‘struct FMHA_fprop_params’; use assignment or value-initialization instead [-Wclass-memaccess]
     64 |     memset(&params, 0, sizeof(params));
        |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
  In file included from csrc/flash_attn/fmha_api.cpp:33:
  /tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src/fmha.h:75:8: note: ‘struct FMHA_fprop_params’ declared here
     75 | struct FMHA_fprop_params : public Qkv_params {
        |        ^~~~~~~~~~~~~~~~~
  csrc/flash_attn/fmha_api.cpp:60:15: warning: unused variable ‘acc_type’ [-Wunused-variable]
     60 |     Data_type acc_type = DATA_TYPE_FP32;
        |               ^~~~~~~~
  csrc/flash_attn/fmha_api.cpp: In function ‘std::vector<at::Tensor> mha_fwd(const at::Tensor&, const at::Tensor&, const at::Tensor&, at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float, bool, bool, bool, int, c10::optional<at::Generator>)’:
  csrc/flash_attn/fmha_api.cpp:208:10: warning: unused variable ‘is_sm80’ [-Wunused-variable]
    208 |     bool is_sm80 = dprops->major == 8 && dprops->minor == 0;
        |          ^~~~~~~
  csrc/flash_attn/fmha_api.cpp: In function ‘std::vector<at::Tensor> mha_fwd_block(const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, const at::Tensor&, int, int, float, float, bool, bool, c10::optional<at::Generator>)’:
  csrc/flash_attn/fmha_api.cpp:533:10: warning: unused variable ‘is_sm80’ [-Wunused-variable]
    533 |     bool is_sm80 = dprops->major == 8 && dprops->minor == 0;
        |          ^~~~~~~
  /opt/conda/bin/nvcc -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/src -I/tmp/pip-install-tcki_6d6/flash-attn_22991b13d7124b4babf3995fda5fae0b/csrc/flash_attn/cutlass/include -I/opt/conda/lib/python3.8/site-packages/torch/include -I/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/lib/python3.8/site-packages/torch/include/THC -I/opt/conda/include -I/opt/conda/include/python3.8 -c csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu -o build/temp.linux-x86_64-cpython-38/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  Command-line error: invalid option: --orig_src_path_name

  1 catastrophic error detected in this compilation.
  Compilation terminated.
  error: command '/opt/conda/bin/nvcc' failed with exit code 255
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

tridao commented 1 year ago

Can you try with gcc 10?

Martion-z commented 1 year ago

[screenshot] I tried with gcc 10 and got the same issue.

tridao commented 1 year ago

What's your nvcc version?

Martion-z commented 1 year ago

> What's your nvcc version?

root@train-733924-worker-0:/usr/bin# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Tue_May__3_18:49:52_PDT_2022
Cuda compilation tools, release 11.7, V11.7.64
Build cuda_11.7.r11.7/compiler.31294372_0

tridao commented 1 year ago

That all looks reasonable; I have no idea why it fails. We recommend the PyTorch container from Nvidia, which has all the required tools to install FlashAttention.
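
For reference, a minimal sketch of that route, assuming Docker with the NVIDIA container runtime is available (the image tag below is illustrative; pick one whose CUDA version matches your driver):

  # Start an NGC PyTorch container and build flash-attn inside it.
  docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.05-py3
  # Inside the container the full CUDA toolkit and ninja are already set up:
  pip install flash-attn --no-build-isolation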

14cnewman commented 1 year ago

Did you ever figure this out? I'm getting the same error.

whitelok commented 11 months ago

> What's your nvcc version?
>
> root@train-733924-worker-0:/usr/bin# nvcc -V
> nvcc: NVIDIA (R) Cuda compiler driver
> Copyright (c) 2005-2022 NVIDIA Corporation
> Built on Tue_May__3_18:49:52_PDT_2022
> Cuda compilation tools, release 11.7, V11.7.64
> Build cuda_11.7.r11.7/compiler.31294372_0

You could run /opt/conda/bin/nvcc --version to check which CUDA compiler the build is actually using; it may differ from the nvcc on your PATH.
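
A hedged sketch of that check, assuming the system toolkit lives under /usr/local/cuda (adjust the path to your install):

  # Compare the nvcc on PATH with the one the failing build invoked:
  which nvcc && nvcc --version
  /opt/conda/bin/nvcc --version

  # If the two disagree, point the extension build at the full system toolkit;
  # CUDA_HOME is honored by PyTorch's torch.utils.cpp_extension:
  export CUDA_HOME=/usr/local/cuda
  export PATH="$CUDA_HOME/bin:$PATH"
  pip install ninja            # avoids the slow distutils fallback seen in the log
  pip install flash-attn --no-build-isolation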