jdgh000 opened this issue 10 months ago
We have prebuilt CUDA wheels that will be downloaded if you install with pip install flash-attn --no-build-isolation. Then you wouldn't have to compile things yourself.
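For anyone landing here, that path looks like this (a minimal sketch; the version check is only there because the prebuilt wheel has to match your Python, PyTorch, and CUDA versions):

```bash
# Sketch: print the PyTorch/CUDA combination the prebuilt wheel must match,
# then let pip fetch a matching wheel instead of compiling from source
python -c "import torch; print(torch.__version__, torch.version.cuda)"
pip install flash-attn --no-build-isolation
```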
Yeah, I saw it. However, can you help with the build issues? My environment specifically requires building it manually... Is there a stable release branch where the build is also reliable?
Environments are so different it's hard to know, and I'm not an expert on compiling or building. There was no obvious error message pointing to a line in your log.
I use NVIDIA's PyTorch Docker image, which has all the libraries and compilers ready.
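For anyone reproducing that setup, the container route looks roughly like this (the image is NVIDIA's NGC PyTorch image, but the tag 24.01-py3 is just an example; pick whichever release matches the PyTorch/CUDA you need, and note it requires the NVIDIA Container Toolkit on the host):

```bash
# Sketch: start NVIDIA's NGC PyTorch container with GPU access
# (tag 24.01-py3 is an example, not a recommendation)
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:24.01-py3
```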
You can try limiting the build with MAX_JOBS=4, as mentioned in the README, in case it failed because of OOM.
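Concretely, that suggestion from the README looks like this (MAX_JOBS caps how many compile jobs ninja runs in parallel, which caps peak memory):

```bash
# Cap parallel compile jobs so the build uses less peak RAM (per the README)
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```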
MAX_JOBS=4 failed with a similar error; I don't believe it is OOM.
Yeah, then I don't know how to fix it.
Hmm, is there a way you can forward this to someone who can? If no one here can help, where else can I get help?
I am getting the same error with H100 GPUs. I have tried all the different installation methods, and right now I am trying with a fresh conda environment. Still, I get this error (truncated):
```
    _run_ninja_build(
  File "/miniconda3/envs/pytorch_cuda/lib/python3.12/site-packages/torch/utils/cpp_extension.py", line 2112, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
```
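Since pip only echoes the tail of the subprocess output, the first real nvcc/g++ error is usually much further up. One way to capture everything for inspection (standard pip flags, nothing flash-attn specific; build.log is just an arbitrary file name):

```bash
# Sketch: save the full verbose build log; the actual compiler error appears
# well above pip's "Failed building wheel" summary
pip install flash-attn --no-build-isolation --no-cache-dir -v 2>&1 | tee build.log
```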
@tridao any idea?
As of today, the build starts OK but takes forever. Any idea? I am seeing the following:
```
/usr/local/cuda-12.3/bin/nvcc -I/root/extdir/gg/git/flash-attention/csrc/flash_attn -I/root/extdir/gg/git/flash-attention/csrc/flash_attn/src -I/root/extdir/gg/git/flash-attention/csrc/cutlass/include -I/miniconda3/lib/python3.11/site-packages/torch/include -I/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda-12.3/include -I/miniconda3/include/python3.11 -c csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.cu -o build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim128_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0

/usr/local/cuda-12.3/bin/nvcc -I/root/extdir/gg/git/flash-attention/csrc/flash_attn -I/root/extdir/gg/git/flash-attention/csrc/flash_attn/src -I/root/extdir/gg/git/flash-attention/csrc/cutlass/include -I/miniconda3/lib/python3.11/site-packages/torch/include -I/miniconda3/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -I/miniconda3/lib/python3.11/site-packages/torch/include/TH -I/miniconda3/lib/python3.11/site-packages/torch/include/THC -I/usr/local/cuda-12.3/include -I/miniconda3/include/python3.11 -c csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.cu -o build/temp.linux-x86_64-cpython-311/csrc/flash_attn/src/flash_fwd_split_hdim160_bf16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options '-fPIC' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
```
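Note that each .cu file in that log is compiled for both sm_80 and sm_90 (the two -gencode flags), and the source tree contains dozens of such kernel files, so long build times are expected even when nothing is wrong. As a sanity check (assuming a CUDA-enabled PyTorch is installed), you can confirm which architecture your own GPU needs:

```bash
# Sketch: print GPU 0's compute capability; an H100 reports (9, 0), i.e. sm_90
python -c "import torch; print(torch.cuda.get_device_capability(0))"
```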