Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

How can I install a feasible Flash-Attention version on my Turing GPU? #1132

Open · eileen2003-w opened this issue 2 months ago

eileen2003-w commented 2 months ago

I have read the documentation and found that I have to install flash-attn 1.x to fit my Turing GPU, so I got the source package from GitHub: https://github.com/Dao-AILab/flash-attention/releases?page=6. I then downloaded version 1.0.2 and installed it with `python setup.py install`, but I ran into a problem that I don't know how to solve.

My environment:

- GPU: NVIDIA TITAN RTX
- CUDA version: 11.7
- torch version: 2.0.1+cu117
- gcc version: gcc (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
- ninja version: 1.11.1.1

The error information:

    FAILED: /home/tony/Downloads/flash-attention-1.0.2/build/temp.linux-x86_64-cpython-39/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.o
    /usr/local/cuda-11.7/bin/nvcc -I/home/tony/Downloads/flash-attention-1.0.2/csrc/flash_attn -I/home/tony/Downloads/flash-attention-1.0.2/csrc/flash_attn/src -I/home/tony/Downloads/flash-attention-1.0.2/csrc/flash_attn/cutlass/include -I/home/tony/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/include -I/home/tony/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/include/torch/csrc/api/include -I/home/tony/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/include/TH -I/home/tony/anaconda3/envs/intern_clean/lib/python3.9/site-packages/torch/include/THC -I/usr/local/cuda-11.7/include -I/home/tony/anaconda3/envs/intern_clean/include/python3.9 -c -c /home/tony/Downloads/flash-attention-1.0.2/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.cu -o /home/tony/Downloads/flash-attention-1.0.2/build/temp.linux-x86_64-cpython-39/csrc/flash_attn/src/fmha_block_fprop_fp16_kernel.sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math --ptxas-options=-v -lineinfo -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=flash_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0
    gcc: fatal error: cannot execute ‘cc1plus’: execvp: No such file or directory
    compilation terminated.
    nvcc fatal : Failed to preprocess host compiler properties.
    ninja: build stopped: subcommand failed.
    ...
    RuntimeError: Error compiling objects for extension
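For reference, the TITAN RTX is a Turing card (compute capability 7.5), which corresponds to the `-gencode arch=compute_75,code=sm_75` flag visible in the log above. A minimal sketch, assuming a working PyTorch + CUDA install, to confirm the device capability and the CUDA version that torch was built with before attempting the source build:

```python
# Sketch: confirm the GPU is Turing (sm_75) and check which CUDA version
# PyTorch was built against, before building flash-attn 1.x from source.
import torch

major, minor = torch.cuda.get_device_capability(0)  # TITAN RTX -> (7, 5)
print(f"Device: {torch.cuda.get_device_name(0)}, compute capability sm_{major}{minor}")
print(f"PyTorch built with CUDA: {torch.version.cuda}")  # should match the local toolkit, e.g. 11.7

assert (major, minor) >= (7, 5), "flash-attn 1.x targets Turing (sm_75) or newer"
```

If the capability and CUDA versions line up (as they appear to here), the failure below is a host-compiler problem rather than a GPU/toolkit mismatch.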

foreverlms commented 2 months ago

This is clearly a compilation issue; you can search Google for `gcc: fatal error: cannot execute ‘cc1plus’: execvp` to find some answers to try.
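For reference, `cannot execute ‘cc1plus’` usually means GCC's C++ frontend (the `g++` package matching the installed `gcc`) is missing or comes from a different version, rather than anything specific to flash-attn. A minimal diagnostic sketch, assuming a Linux system, that checks whether `g++` is present and prints both compiler versions so a mismatch is easy to spot:

```python
# Sketch: check that g++ is installed and that its version matches gcc,
# since a missing or mismatched g++ is a common cause of the cc1plus error.
import shutil
import subprocess

for tool in ("gcc", "g++"):
    path = shutil.which(tool)
    if path is None:
        print(f"{tool}: not found -- install the {tool} package for your distro")
    else:
        version = subprocess.run([tool, "--version"], capture_output=True, text=True).stdout.splitlines()[0]
        print(f"{tool}: {path} ({version})")
```

If `g++` is missing or reports a different major version than `gcc` (here gcc 12), installing the matching `g++` package and re-running `python setup.py install` is the usual remedy.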

eileen2003-w commented 2 months ago

Thank you for your answer~