taffy-miao opened 1 year ago
The errors are below:

Collecting flash-attn==1.0.1
  Using cached flash_attn-1.0.1.tar.gz (1.9 MB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch in ./miniconda3/envs/scgpt/lib/python3.7/site-packages (from flash-attn==1.0.1) (1.13.0+cu117)
Collecting einops (from flash-attn==1.0.1)
  Using cached einops-0.6.1-py3-none-any.whl (42 kB)
Requirement already satisfied: typing-extensions in ./miniconda3/envs/scgpt/lib/python3.7/site-packages (from torch->flash-attn==1.0.1) (4.7.1)
Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py) ... error
  error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [348 lines of output]
torch.__version__ = 1.13.0+cu117
fatal: not a git repository (or any of the parent directories): .git
running bdist_wheel
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-37
creating build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/flash_attn_triton_og.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/flash_attn_triton_varlen.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/flash_attn_interface.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/fused_softmax.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/flash_attn_triton_tmp_og.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/flash_blocksparse_attention.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/flash_attn_triton.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/attention_kernl.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/bert_padding.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/flash_attn_triton_tmp.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/rotary.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/flash_attention.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/flash_blocksparse_attn_interface.py -> build/lib.linux-x86_64-cpython-37/flash_attn
copying flash_attn/flash_attn_triton_single_query.py -> build/lib.linux-x86_64-cpython-37/flash_attn
creating build/lib.linux-x86_64-cpython-37/flash_attn/losses
copying flash_attn/losses/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/losses
copying flash_attn/losses/cross_entropy.py -> build/lib.linux-x86_64-cpython-37/flash_attn/losses
copying flash_attn/losses/cross_entropy_parallel.py -> build/lib.linux-x86_64-cpython-37/flash_attn/losses
copying flash_attn/losses/cross_entropy_apex.py -> build/lib.linux-x86_64-cpython-37/flash_attn/losses
creating build/lib.linux-x86_64-cpython-37/flash_attn/layers
copying flash_attn/layers/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/layers
copying flash_attn/layers/patch_embed.py -> build/lib.linux-x86_64-cpython-37/flash_attn/layers
copying flash_attn/layers/rotary.py -> build/lib.linux-x86_64-cpython-37/flash_attn/layers
creating build/lib.linux-x86_64-cpython-37/flash_attn/ops
copying flash_attn/ops/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/ops
copying flash_attn/ops/layer_norm.py -> build/lib.linux-x86_64-cpython-37/flash_attn/ops
copying flash_attn/ops/fused_dense.py -> build/lib.linux-x86_64-cpython-37/flash_attn/ops
copying flash_attn/ops/rms_norm.py -> build/lib.linux-x86_64-cpython-37/flash_attn/ops
copying flash_attn/ops/gelu_activation.py -> build/lib.linux-x86_64-cpython-37/flash_attn/ops
creating build/lib.linux-x86_64-cpython-37/flash_attn/modules
copying flash_attn/modules/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/modules
copying flash_attn/modules/block.py -> build/lib.linux-x86_64-cpython-37/flash_attn/modules
copying flash_attn/modules/mha.py -> build/lib.linux-x86_64-cpython-37/flash_attn/modules
copying flash_attn/modules/embedding.py -> build/lib.linux-x86_64-cpython-37/flash_attn/modules
copying flash_attn/modules/mlp.py -> build/lib.linux-x86_64-cpython-37/flash_attn/modules
creating build/lib.linux-x86_64-cpython-37/flash_attn/models
copying flash_attn/models/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
copying flash_attn/models/vit.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
copying flash_attn/models/gptj.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
copying flash_attn/models/gpt_neox.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
copying flash_attn/models/gpt.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
copying flash_attn/models/opt.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
copying flash_attn/models/gpt_j.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
copying flash_attn/models/bert.py -> build/lib.linux-x86_64-cpython-37/flash_attn/models
creating build/lib.linux-x86_64-cpython-37/flash_attn/triton
copying flash_attn/triton/fused_attention.py -> build/lib.linux-x86_64-cpython-37/flash_attn/triton
copying flash_attn/triton/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/triton
creating build/lib.linux-x86_64-cpython-37/flash_attn/utils
copying flash_attn/utils/distributed.py -> build/lib.linux-x86_64-cpython-37/flash_attn/utils
copying flash_attn/utils/benchmark.py -> build/lib.linux-x86_64-cpython-37/flash_attn/utils
copying flash_attn/utils/__init__.py -> build/lib.linux-x86_64-cpython-37/flash_attn/utils
copying flash_attn/utils/generation.py -> build/lib.linux-x86_64-cpython-37/flash_attn/utils
copying flash_attn/utils/pretrained.py -> build/lib.linux-x86_64-cpython-37/flash_attn/utils
running build_ext
building 'flash_attn_cuda' extension
creating /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37
creating /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc
creating /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn
creating /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src
Emitting ninja build file /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/9] /usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim128.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim128.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim128.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim128.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim128.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim128.cu:5:
/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim128.cu:5:
/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim128.cu:5:
/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
[2/9] /usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim32.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim32.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim32.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim32.cu:5:
/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim32.cu:5:
/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim32.cu:5:
/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
[3/9] /usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim64.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim64.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim64.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim64.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_fwd_hdim64.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim64.cu:5:
/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim64.cu:5:
/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_hdim64.cu:5:
/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_fwd_launch_template.h:8:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
[4/9] /usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim32.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim32.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim32.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim32.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim32.cu:5:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim32.cu:5:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim32.cu:5:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
[5/9] /usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu:28:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu:28:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu:28:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
[6/9] /usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim64.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim64.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim64.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim64.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim64.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim64.cu:5:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim64.cu:5:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim64.cu:5:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
[7/9] /usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:4:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:4:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_block_dgrad_fp16_kernel_loop.sm80.cu:4:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
[8/9] /usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim128.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim128.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim128.o
/usr/local/cuda/bin/nvcc -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim128.cu -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/src/fmha_bwd_hdim128.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim128.cu:5:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim128.cu:5:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha.h:39,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_launch_template.h:6,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src/fmha_bwd_hdim128.cu:5:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
#include <cuda_bf16.h>
^~~~~~~~~~~~~
compilation terminated.
[9/9] c++ -MMD -MF /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/fmha_api.o.d -pthread -B /public/home/fyg/miniconda3/envs/scgpt/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/fmha_api.cpp -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/fmha_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/fmha_api.o
c++ -MMD -MF /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/fmha_api.o.d -pthread -B /public/home/fyg/miniconda3/envs/scgpt/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/src -I/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/cutlass/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/TH -I/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/public/home/fyg/miniconda3/envs/scgpt/include/python3.7m -c -c /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/fmha_api.cpp -o /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/build/temp.linux-x86_64-cpython-37/csrc/flash_attn/fmha_api.o -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /usr/local/cuda/include/cublas_v2.h:65,
from /public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/include/ATen/cuda/CUDAContext.h:7,
from /tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/csrc/flash_attn/fmha_api.cpp:30:
/usr/local/cuda/include/cublas_api.h:76:10: fatal error: cuda_bf16.h: No such file or directory
76 | #include <cuda_bf16.h>
| ^~~~~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1906, in _run_ninja_build
env=env)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/subprocess.py", line 512, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<string>", line 36, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-ywwm9on5/flash-attn_baa773d093a047598c593020a9e777d2/setup.py", line 185, in <module>
"einops",
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/__init__.py", line 107, in setup
return distutils.core.setup(**attrs)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/wheel/bdist_wheel.py", line 343, in run
self.run_command("build")
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/command/build.py", line 131, in run
self.run_command(cmd_name)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/cmd.py", line 318, in run_command
self.distribution.run_command(command)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/dist.py", line 1234, in run_command
super().run_command(command)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 84, in run
_build_ext.run(self)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 345, in run
self.build_extensions()
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 843, in build_extensions
build_ext.build_extensions(self)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 467, in build_extensions
self._build_extensions_serial()
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 493, in _build_extensions_serial
self.build_extension(ext)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 246, in build_extension
_build_ext.build_extension(self, ext)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/setuptools/_distutils/command/build_ext.py", line 555, in build_extension
depends=ext.depends,
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 668, in unix_wrap_ninja_compile
with_cuda=with_cuda)
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1578, in _write_ninja_file_and_compile_objects
error_prefix='Error compiling objects for extension')
File "/public/home/fyg/miniconda3/envs/scgpt/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1916, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
Probably something to do with setting the right location of $CUDA_HOME so the install script can find the included header files. I don't know how it works with your local setup. We recommend the PyTorch container from Nvidia, which has all the required tools to install FlashAttention.
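A minimal sketch of that check, assuming a toolkit installed at /usr/local/cuda-11.7 (the path is an example; substitute your own installation):

```shell
# Point the build at a concrete CUDA toolkit and verify that the header
# the compiler is failing to find actually exists there.
export CUDA_HOME=/usr/local/cuda-11.7
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:$LD_LIBRARY_PATH"

# cuda_bf16.h is the header cublas_api.h is trying to include above.
if [ -f "$CUDA_HOME/include/cuda_bf16.h" ]; then
    echo "cuda_bf16.h found"
else
    echo "cuda_bf16.h MISSING - the flash-attn build will fail"
fi
```

If the header is missing, the toolkit under $CUDA_HOME is incomplete or older than the one nvcc reports, which matches the error in the log.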
I tried with the recommended PyTorch container and ran into the exact same issue:
venv/lib/python3.8/site-packages/torch/include/c10/util/BFloat16.h:11:10: fatal error: cuda_bf16.h: No such file or directory
Delete the whole environment and reinstall it; my guess is there is something stale in it. Check whether there is a leftover flash-attention under site-packages or lib, or check whether ~/.local is dirty.
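A sketch of that search for leftover installs (directory names are what a default pip layout would use; adjust for your environment):

```shell
# Look for stale flash-attn copies that could shadow a fresh build:
# the active environment's site-packages, plus ~/.local.
SITE_PKGS=$(python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])")
echo "searching $SITE_PKGS and $HOME/.local ..."
find "$SITE_PKGS" -maxdepth 1 -name 'flash_attn*' 2>/dev/null
find "$HOME/.local" -name 'flash_attn*' 2>/dev/null || true
```

Anything this prints is a candidate for removal before reinstalling.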
Hi @tridao, I've run into a question. My CUDA version is 11.7, nvcc -V reports release 11.7, V11.7.64, and my .bashrc contains:
export CUDA_HOME="/usr/local/cuda"
export PATH=/usr/local/cuda-11.7/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export CPATH=/usr/local/cuda-11.7/targets/x86_64-linux/include:$CPATH
but when I run
pip install flash-attn==1.0.1
it still produces many errors.
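One quick probe worth trying here (a sketch, using the host C++ compiler as a stand-in for what the extension build does; the temp file name is arbitrary): check whether cuda_bf16.h is actually visible through CPATH.

```shell
# Ask the preprocessor to resolve the failing include through the
# current CPATH; this mirrors the #include that breaks in the log.
printf '#include <cuda_bf16.h>\n' > /tmp/bf16_probe.cc
if g++ -E /tmp/bf16_probe.cc -o /dev/null 2>/dev/null; then
    echo "header visible"
else
    echo "header NOT visible - check CPATH / CUDA_HOME"
fi
```

Note that your CUDA_HOME points at /usr/local/cuda while PATH and CPATH point at /usr/local/cuda-11.7; if those are not the same toolkit (e.g. the symlink targets a different version), the build can pick up a toolkit whose include directory lacks cuda_bf16.h.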