Dao-AILab / flash-attention

Fast and memory-efficient exact attention

build fails on cuda 12.2 system #692

Open jdgh000 opened 10 months ago

jdgh000 commented 10 months ago

I am seeing the following:

python3 setup.py develop 

torch.__version__  = 2.2.0a0+gitbbd5b93

running develop
running egg_info
writing flash_attn.egg-info/PKG-INFO
writing dependency_links to flash_attn.egg-info/dependency_links.txt
writing requirements to flash_attn.egg-info/requires.txt
writing top-level names to flash_attn.egg-info/top_level.txt
reading manifest file 'flash_attn.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.cu' under directory 'flash_attn'
warning: no files found matching '*.h' under directory 'flash_attn'
warning: no files found matching '*.cuh' under directory 'flash_attn'
warning: no files found matching '*.cpp' under directory 'flash_attn'
warning: no files found matching '*.hpp' under directory 'flash_attn'
adding license file 'LICENSE'
adding license file 'AUTHORS'
writing manifest file 'flash_attn.egg-info/SOURCES.txt'
running build_ext
building 'flash_attn_2_cuda' extension
Emitting ninja build file /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/49] /usr/local/cuda-12.2/bin/nvcc --generate-dependencies-with-compile --dependency-output /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.o.d -I/root/gg/git/flash-attention/csrc/flash_attn -I/root/gg/git/flash-attention/csrc/flash_attn/src -I/root/gg/git/flash-attention/csrc/cutlass/include -I/root/gg/git/pytorch/torch/include -I/root/gg/git/pytorch/torch/include/torch/csrc/api/include -I/root/gg/git/pytorch/torch/include/TH -I/root/gg/git/pytorch/torch/include/THC -I/usr/local/cuda-12.2/include -I/usr/include/python3.9 -c -c /root/gg/git/flash-attention/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.cu -o /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/c
....
....

[8/49] /usr/local/cuda-12.2/bin/nvcc --generate-dependencies-with-compile --dependency-output /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o.d -I/root/gg/git/flash-attention/csrc/flash_attn -I/root/gg/git/flash-attention/csrc/flash_attn/src -I/root/gg/git/flash-attention/csrc/cutlass/include -I/root/gg/git/pytorch/torch/include -I/root/gg/git/pytorch/torch/include/torch/csrc/api/include -I/root/gg/git/pytorch/torch/include/TH -I/root/gg/git/pytorch/torch/include/THC -I/usr/local/cuda-12.2/include -I/usr/include/python3.9 -c -c /root/gg/git/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu -o /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1
FAILED: /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o
/usr/local/cuda-12.2/bin/nvcc --generate-dependencies-with-compile --dependency-output /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o.d -I/root/gg/git/flash-attention/csrc/flash_attn -I/root/gg/git/flash-attention/csrc/flash_attn/src -I/root/gg/git/flash-attention/csrc/cutlass/include -I/root/gg/git/pytorch/torch/include -I/root/gg/git/pytorch/torch/include/torch/csrc/api/include -I/root/gg/git/pytorch/torch/include/TH -I/root/gg/git/pytorch/torch/include/THC -I/usr/local/cuda-12.2/include -I/usr/include/python3.9 -c -c /root/gg/git/flash-attention/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.cu -o /root/gg/git/flash-attention/build/temp.linux-x86_64-3.9/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=1
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 2102, in _run_ninja_build
    subprocess.run(
  File "/usr/lib64/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/gg/git/flash-attention/setup.py", line 288, in <module>
    setup(
  File "/usr/lib/python3.9/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib64/python3.9/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib64/python3.9/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/usr/lib64/python3.9/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/lib/python3.9/site-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/usr/lib/python3.9/site-packages/setuptools/command/develop.py", line 136, in install_for_development
    self.run_command('build_ext')
  File "/usr/lib64/python3.9/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/usr/lib64/python3.9/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/usr/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 873, in build_extensions
    build_ext.build_extensions(self)
  File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 449, in build_extensions
    self._build_extensions_serial()
  File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 474, in _build_extensions_serial
    self.build_extension(ext)
  File "/usr/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 196, in build_extension
    _build_ext.build_extension(self, ext)
  File "/usr/lib64/python3.9/distutils/command/build_ext.py", line 529, in build_extension
    objects = self.compiler.compile(sources,
  File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 686, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/root/gg/git/pytorch/torch/utils/cpp_extension.py", line 2118, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[root@guyen-MS-7B22 git]# cd flash-attention/
[root@guyen-MS-7B22 flash-attention]# git remote -v
origin  https://github.com/Dao-AILab/flash-attention.git (fetch)
origin  https://github.com/Dao-AILab/flash-attention.git (push)
[root@guyen-MS-7B22 flash-attention]# cat /etc/os-release ; uname -r
NAME="CentOS Stream"
VERSION="9"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="9"
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream 9"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:centos:centos:9"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
5.19.0-38-generic
tridao commented 10 months ago

We have prebuilt CUDA wheels that will be downloaded if you install with pip install flash-attn --no-build-isolation. Then you wouldn't have to compile things yourself.
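For reference, a minimal install-and-verify sequence along those lines (the import check at the end is just an illustration):

pip install flash-attn --no-build-isolation
python -c "import flash_attn; print(flash_attn.__version__)"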

ggghamd commented 10 months ago

Yeah, I saw that. However, can you help with the build issue? My environment specifically requires building it manually. Is there a stable release branch where the build is also reliable?

tridao commented 10 months ago

Environments are so different that it's hard to know, and I'm not an expert on compiling or building. There's no obvious error message in your log pointing to a specific line.

I use NVIDIA's PyTorch Docker image, which has all the libraries and compilers ready.

You can try limiting MAX_JOBS=4 as mentioned in the README in case it failed because of OOM.
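As a concrete sketch of both suggestions (the container tag below is illustrative, not a requirement):

docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.10-py3
# inside the container, from the flash-attention checkout:
MAX_JOBS=4 python3 setup.py develop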

jdgh000 commented 10 months ago

MAX_JOBS=4 failed with a similar error; I don't believe it is OOM.
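For what it's worth, one generic way to confirm or rule out OOM after a failed build (not specific to flash-attention) is to check the kernel log and free memory right after the failure:

dmesg | grep -i -E 'out of memory|killed process'
free -h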

tridao commented 10 months ago

Yeah, then I don't know how to fix it.

jdgh000 commented 10 months ago

Hmm, is there a way you can forward this to someone who can? If no one here can help, where else can I get help?

PKR-808 commented 7 months ago

I am getting the same error with H100 GPUs. I have tried all the different installation methods, and right now I am trying with a fresh conda environment. Still, I get this error (truncated):

      _run_ninja_build(
    File "/miniconda3/envs/pytorch_cuda/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 2112, in _run_ninja_build
      raise RuntimeError(message) from e
  RuntimeError: Error compiling objects for extension
  [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for flash-attn
  Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

@tridao any idea?
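One generic way to surface the actual compiler error that pip's summary hides (a sketch; the log filename is arbitrary) is to rerun the build verbosely and capture the output:

pip install flash-attn --no-build-isolation --no-cache-dir -v 2>&1 | tee flash_attn_build.log
grep -n -i error flash_attn_build.log | head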

jdgh000 commented 6 months ago

As of today, the build starts OK but takes forever. Any idea?

/usr/local/cuda-12.3/bin/nvcc -I/root/extdir/gg/git/flash-attention/csrc/flash_attn -I/root/extdir/gg/git/flash-attention/csrc/flash_attn/src -I/root/extdir/gg/git/flash-attention/csrc/cutlass/include -I/miniconda3/lib/python3.11/site-packages/torch/include -I/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda-12.3/include -I/miniconda3/include/python3.11 -c csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.cu -o build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0

/usr/local/cuda-12.3/bin/nvcc -I/root/extdir/gg/git/flash-attention/csrc/flash_attn -I/root/extdir/gg/git/flash-attention/csrc/flash_attn/src -I/root/extdir/gg/git/flash-attention/csrc/cutlass/include -I/miniconda3/lib/python3.11/site-packages/torch/include -I/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda-12.3/include -I/miniconda3/include/python3.11 -c csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.cu -o build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
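If the build is merely slow rather than failing, one thing worth checking (per the README's build notes) is that ninja is installed and working; the commands above lack ninja's [k/N] progress markers, which may mean the build fell back to a slow serial compile. A sketch, assuming a source checkout:

ninja --version; echo $?    # should print a version and then 0
pip install ninja           # only if the check above fails
MAX_JOBS=4 python3 setup.py develop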