Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects #304

Open taffy-miao opened 1 year ago

taffy-miao commented 1 year ago

Hi @tridao, I've run into a problem. My CUDA version is 11.7 (nvcc -V reports release 11.7, V11.7.64), and my ~/.bashrc contains:

  export CUDA_HOME="/usr/local/cuda"
  export PATH=/usr/local/cuda-11.7/bin:$PATH
  export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
  export CPATH=/usr/local/cuda-11.7/targets/x86_64-linux/include:$CPATH

But when I run pip install flash-attn==1.0.1, it still fails with many errors.
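
A quick sanity check (a minimal sketch; the paths assume the CUDA 11.7 layout above) is to verify that the nvcc on PATH and $CUDA_HOME refer to the same, complete toolkit, and that the cuda_bf16.h header the build later complains about is actually installed:

  # Which toolkit is the nvcc on PATH coming from?
  which nvcc
  nvcc --version

  # Does CUDA_HOME point at a full toolkit install?
  echo "$CUDA_HOME"
  readlink -f /usr/local/cuda   # if this is a symlink, it should resolve to cuda-11.7

  # The header the compiler fails to find ships with the toolkit
  ls -l "$CUDA_HOME/include/cuda_bf16.h" /usr/local/cuda-11.7/include/cuda_bf16.h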

taffy-miao commented 1 year ago

The errors are below:

  Collecting flash-attn==1.0.1
    Using cached flash_attn-1.0.1.tar.gz (1.9 MB)
    Preparing metadata (setup.py) ... done
  Requirement already satisfied: torch in ./miniconda3/envs/scgpt/lib/python3.7/site-packages (from flash-attn==1.0.1) (1.13.0+cu117)
  Collecting einops (from flash-attn==1.0.1)
    Using cached einops-0.6.1-py3-none-any.whl (42 kB)
  Requirement already satisfied: typing-extensions in ./miniconda3/envs/scgpt/lib/python3.7/site-packages (from torch->flash-attn==1.0.1) (4.7.1)
  Building wheels for collected packages: flash-attn
    Building wheel for flash-attn (setup.py) ... error
    error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [348 lines of output]

  torch.__version__  = 1.13.0+cu117

  fatal: not a git repository (or any of the parent directories): .git
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-cpython-37
  creating build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/flash_attn_triton_varlen.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/flash_attn_triton_tmp_og.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/attention_kernl.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/flash_attn_triton_tmp.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/rotary.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  copying flash_attn/flash_attn_triton_single_query.py -> build/lib.linux-x86_64-cpython-37/flash_attn
  creating build/lib.linux-x86_64-cpython-37/flash_attn/losses
  copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/losses
  copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-37/flash_attn/losses
  copying flash_attn/losses/cross_entropy_parallel.py -> build/lib.linux-x86_64-cpython-37/flash_attn/losses
  copying flash_attn/losses/cross_entropy_apex.py -> build/lib.linux-x86_64-cpython-37/flash_attn/losses
  creating build/lib.linux-x86_64-cpython-37/flash_attn/layers
  copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/layers
  copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-37/flash_attn/layers
  copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-37/flash_attn/layers
  creating build/lib.linux-x86_64-cpython-37/flash_attn/ops
  copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/ops
  copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-37/flash_attn/ops
  copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-37/flash_attn/ops
  copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-37/flash_attn/ops
  copying flash_attn/ops/gelu_activation.py -> build/lib.linux-x86_64-cpython-37/flash_attn/ops
  creating build/lib.linux-x86_64-cpython-37/flash_attn/modules
  copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/modules
  copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-37/flash_attn/modules
  copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-37/flash_attn/modules
  copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-37/flash_attn/modules
  copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-37/flash_attn/modules
  creating build/lib.linux-x86_64-cpython-37/flash_attn/models
  copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
  copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
  copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
  copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
  copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
  copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
  copying flash_attn/models/gpt_j.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
  copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
  creating build/lib.linux-x86_64-cpython-37/flash_attn/triton
  copying flash_attn/triton/fused_attention.py -> build/lib.linux-x86_64-cpython-37/flash_attn/triton
  copying flash_attn/triton/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/triton
  creating build/lib.linux-x86_64-cpython-37/flash_attn/utils
  copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-37/flash_attn/utils
  copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-37/flash_attn/utils
  copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/utils
  copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-37/flash_attn/utils
  copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-37/flash_attn/utils
  running build_ext
  building 'flash_attn_cuda' extension
  creating /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37
  creating /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc
  creating /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn
  creating /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src
  Emitting ninja build file /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/build.ninja...
  Compiling objects...
  Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
  [1/9] /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim128.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim128.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim128.o
  /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim128.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim128.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim128.cu:5:
  /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim128.cu:5:
  /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim128.cu:5:
  /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  [2/9] /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim32.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim32.o
  /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim32.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim32.cu:5:
  /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim32.cu:5:
  /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim32.cu:5:
  /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  [3/9] /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim64.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim64.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim64.o
  /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim64.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim64.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim64.cu:5:
  /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim64.cu:5:
  /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim64.cu:5:
  /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  [4/9] /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim32.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim32.o
  /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim32.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim32.cu:5:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim32.cu:5:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim32.cu:5:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  [5/9] /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.o
  /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu:28:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu:28:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu:28:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  [6/9] /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim64.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim64.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim64.o
  /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim64.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim64.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim64.cu:5:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim64.cu:5:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim64.cu:5:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  [7/9] /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o
  /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:4:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:4:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:4:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  [8/9] /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim128.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim128.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim128.o
  /usr/local/cuda/bin/nvcc  -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim128.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim128.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim128.cu:5:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim128.cu:5:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim128.cu:5:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
   #include <cuda_bf16.h>
            ^~~~~~~~~~~~~
  compilation terminated.
  [9/9] c++ -MMD -MF /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/fmha_api.o.d -pthread -B /public/home/fyg/miniconda3/envs/scgpt/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/fmha_api.cpp -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/fmha_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/fmha_api.o
  c++ -MMD -MF /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/fmha_api.o.d -pthread -B /public/home/fyg/miniconda3/envs/scgpt/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/fmha_api.cpp -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/fmha_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
  cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
  In file included from /usr/local/cuda/include/cublas_v2.h:65,
                   from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
                   from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/fmha_api.cpp:30:
  /usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
     76 | #include <cuda_bf16.h>
        |          ^~~~~~~~~~~~~
  compilation terminated.
  ninja: build stopped: subcommand failed.
  Traceback (most recent call last):
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1906, in _run_ninja_build
      env=env)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/subprocess.py", line 512, in run
      output=stdout, stderr=stderr)
  subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

  The above exception was the direct cause of the following exception:

  Traceback (most recent call last):
    File "<string>", line 36, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/setup.py", line 185, in <module>
      "einops",
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/__init__.py", line 107, in setup
      return distutils.core.setup(**attrs)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/dist.py", line 1234, in run_command
      super().run_command(command)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 343, in run
      self.run_command("build")
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/dist.py", line 1234, in run_command
      super().run_command(command)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/command/build.py", line 131, in run
      self.run_command(cmd_name)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
      self.distribution.run_command(command)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/dist.py", line 1234, in run_command
      super().run_command(command)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 84, in run
      _build_ext.run(self)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
      self.build_extensions()
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
      build_ext.build_extensions(self)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
      self._build_extensions_serial()
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
      self.build_extension(ext)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
      _build_ext.build_extension(self, ext)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 555, in build_extension
      depends=ext.depends,
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 668, in unix_wrap_ninja_compile
      with_cuda=with_cuda)
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1578, in _write_ninja_file_and_compile_objects
      error_prefix='Error compiling objects for extension')
    File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
      raise RuntimeError(message) from e
  RuntimeError: Error compiling objects for extension
  [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for flash-attn
  Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

tridao commented 1 year ago

This is probably something to do with setting the right location of $CUDA_HOME so the install script can find the CUDA header files it includes. I don't know how your local setup is configured. We recommend the PyTorch container from Nvidia, which has all the required tools to install FlashAttention.
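
For reference, a sketch of acting on this advice (the container tag and exact flags here are assumptions, not taken from this thread): either point CUDA_HOME at the full versioned toolkit before rebuilding, or build inside the NVIDIA PyTorch container.

  # Option 1: point CUDA_HOME at the complete 11.7 toolkit and rebuild
  export CUDA_HOME=/usr/local/cuda-11.7
  export PATH=$CUDA_HOME/bin:$PATH
  export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
  pip install flash-attn==1.0.1 --no-cache-dir

  # Option 2: build inside the NVIDIA PyTorch container (tag is illustrative)
  docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.05-py3
  # then, inside the container:
  pip install flash-attn==1.0.1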

ivallesp commented 1 year ago

I tried with the recommended PyTorch container and hit the exact same issue:

venv/lib/python3.8/site-packages/torch/include/c10/util/BFloat16.h:11:10: fatal error: cuda_bf16.h: No such file or directory

bio-punk commented 9 months ago

Delete the whole environment and reinstall it; I suspect there is something stale in there. Check whether there is a leftover flash-attention under site-packages or lib, or check whether ~/.local is polluted.
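
A minimal sketch of that cleanup check (the paths are assumptions based on the conda environment shown in the log above):

  # Look for stale flash-attn copies in the active environment
  pip show flash-attn
  ls ~/miniconda3/envs/scgpt/lib/python3.7/site-packages | grep -i flash

  # Per-user installs under ~/.local can shadow the environment
  ls ~/.local/lib/python3.7/site-packages 2>/dev/null | grep -i flash

  # If anything stale turns up, remove it and reinstall
  pip uninstall -y flash-attn
  pip install flash-attn==1.0.1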