MilesQLi opened this issue 1 year ago
Same issue for me.
Workaround: install the previous version pip install flash_attn==1.0.5
I am seeing the same problem on every flash_attn version. I am using CUDA 12.1 on the new G2 VM instance from GCP: https://cloud.google.com/compute/docs/accelerator-optimized-machines#g2-vms. The underlying GPU is the Nvidia L4, which uses Ada.
Workaround: install the previous version pip install flash_attn==1.0.5
This might work in some scenarios but not all.
Can you try python -m pip install flash-attn? It's possible that pip and python -m pip refer to different environments.
Getting the dependencies right for all setups is hard. We had torch in the dependencies in 1.0.5, but for some users it would download a new version of torch instead of using the existing one. So for 1.0.6 we leave torch out of the dependencies.
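A quick way to check whether pip and python -m pip point at the same environment (just a sanity-check sketch, not specific to flash-attn):
which python
which pip
python -m pip --version   # shows which Python this pip is bound to
pip --version             # should report the same path; if not, prefer python -m pip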
Getting the same issue. I also tried python -m pip install flash-attn as you suggested, with the same failure.
same problem here.
I don't know of a solution that works for all setups; happy to hear suggestions.
We recommend the PyTorch container from Nvidia, which has all the required tools to install FlashAttention.
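For example (the image tag below is only an illustration; pick whichever recent tag you need from NGC):
docker pull nvcr.io/nvidia/pytorch:23.05-py3
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.05-py3
# inside the container torch, nvcc and ninja are already present, so:
pip install flash-attn --no-build-isolation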
I believe this is an incompatibility issue with the CUDA 12.1 version of torch. Using the following torch version solves my problem:
torch==2.0.0+cu117
@smeyerhot I use the exact version, but it doesn't work. See the screenshot.
@MilesQLi
I believe this is an incompatibility issue with the CUDA 12.1 version of torch. Using the following torch version solves my problem:
torch==2.0.0+cu117
Sorry! This didn't fix things... apologies on the false hope.
@smeyerhot No problem. Thanks a lot anyway!
same problem
same problem
pip install flash-attn==1.0.5 might help. I am using torch 1.13 and cuda 12.0.
I had the same issue with pip. Workaround was to compile from source, worked as a charm
In [1]: import flash_attn
In [2]: import torch
In [3]: torch.__version__
Out[3]: '2.0.1+cu117'
In [4]: flash_attn.__version__
Out[4]: '1.0.6'
I also had the same issue, but my system needs CUDA 12.1 (2x Nvidia L4), so using torch cu117 is not an option.
Compiling from source is also my workaround and it works like a charm.
My system uses Fedora Server.
I compiled it myself using a docker container and I still get this when executing
RuntimeError: Expected q_dtype == torch::kFloat16 || ((is_sm8x || is_sm90) && q_dtype == torch::kBFloat16) to be true
Try pip install flash-attn --no-build-isolation; it fixed my problem (pip docs).
To fix this problem, maybe adding a torch dependency into pyproject.toml can help.
@xwyzsn Unfortunately this only worked on my windows system, not linux. But I feel we're making progress.
to fix this problem, maybe adding torch dependency into pyproject.toml can help
We had torch in the dependency in 1.0.5, but for some users it would download a new version of torch instead of using the existing one. I'm not really an expert in Python packaging, so it's possible I'm doing sth wrong.
@xwyzsn Unfortunately this only worked on my windows system, not linux. But I feel we're making progress.
Hi, actually I am using Linux. It also worked well. I assume you may have missed some other package needed to build it on your Linux system.
From the pip docs on --no-build-isolation: "...Build dependencies specified by PEP 518 must be already installed if this option is used."
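In practice that means the build requirements have to be present before the flash-attn build starts; a minimal sketch (the exact torch build you want may differ):
pip install torch              # or the specific CUDA build you need
pip install packaging ninja wheel
pip install flash-attn --no-build-isolation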
Same problem for me. I solved it by checking my device and torch CUDA versions.
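One way to compare the CUDA version torch was built with against the toolkit that will compile the extension (a diagnostic sketch, assuming nvcc is on PATH):
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))"
nvcc --version   # the major version should match torch.version.cuda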
Try pip install flash-attn --no-build-isolation; it fixed my problem (pip docs). To fix this problem, maybe adding a torch dependency into pyproject.toml can help.
This fixed the torch problem, but now I got another error. Might be related to something else though.
435 | function(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:435:145: note: ‘_ArgTypes’
/usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
530 | operator=(_Functor&& __f)
| ^
/usr/include/c++/11/bits/std_function.h:530:146: note: ‘_ArgTypes’
error: command '/usr/bin/nvcc' failed with exit code 255
@xwyzsn ninja was removed, then torch was removed, then ninja was re-added. Next logical step is to re-add torch. right??? 😄
Same issue on Kubuntu 20 with torch 2.0.1 and CUDA 11.8, Python 3.9 / 3.10, and flash-attn versions 0.2.8 / 1.0.4 / 1.0.5 / 1.0.6 / 1.0.7, with and without the --no-build-isolation flag.
Thanks to the previous answers, I can install it successfully. Here is my experience (environment: torch 2.0.0 + CUDA 11.7 on Ubuntu):
First error: ModuleNotFoundError: No module named 'torch', so I install with pip install flash-attn --no-build-isolation.
Next error: ModuleNotFoundError: No module named 'packaging', so I install this package with pip install packaging.
Next error: RuntimeError: The current installed version of g++ (4.8.5) is less than the minimum required version by CUDA 11.7 (6.0.0). Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).
Workaround: install the previous version pip install flash_attn==1.0.5
How do I tackle this?
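For the g++-too-old error above, one hedged fix is to install a newer compiler and point the build at it; the package names below are illustrative (Ubuntu-style), and any g++ in the >=6, <12 range should satisfy the check:
sudo apt-get install gcc-10 g++-10
CC=gcc-10 CXX=g++-10 pip install flash-attn --no-build-isolation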
I had the same issue with pip. Workaround was to compile from source, worked as a charm
In [1]: import flash_attn
In [2]: import torch
In [3]: torch.__version__
Out[3]: '2.0.1+cu117'
In [4]: flash_attn.__version__
Out[4]: '1.0.6'
I also had the same issue, but my system needs CUDA 12.1 (2x Nvidia L4), so using torch cu117 is not an option.
Compiling from source is also my workaround and it works like a charm.
My system uses Fedora Server.
What was your solution for cuda 12 and L4 gpu?
I got the same issue here. I was only able to build from source (clone the repo, then run python setup.py install). pip install git+https://github.com/HazyResearch/flash-attention also gives me the same error. I'm using torch==1.12.1+cu113.
Try pip install flash-attn --no-build-isolation ... This fixed the torch problem, but now I got another error: /usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’ ... error: command '/usr/bin/nvcc' failed with exit code 255
I had the same issue. Could you solve this?
@Martion-z This looks the same as #172 and #225. Some versions of CUDA don't like gcc 11. Downgrading to gcc 10 might work.
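If gcc 10 is installed alongside gcc 11, the build can be pointed at it without changing the system default (a sketch; the same pattern appears later in this thread):
CC=gcc-10 CXX=g++-10 pip install flash-attn --no-build-isolation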
conda install -c conda-forge cudatoolkit-dev
pip install flash_attn==1.0.5
This worked for me.
pip install flash-attn==1.0.5
Thanks! This solves the above error. But there's still a new one: The detected CUDA version (12.1) mismatches the version that was used to compile.
Problem solved. I installed the nightly version of pytorch, and then install flash-attn with the no-build-isolation option.
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
# Successfully installed torch-2.1.0.dev20230710+cu121
pip install flash-attn --no-build-isolation
# Successfully built flash-attn
# Installing collected packages: ninja, flash-attn
# Successfully installed flash-attn-1.0.8 ninja-1.11.1
Note the wheel building process takes a long time. Don't kill it and just wait.
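If the build takes very long or exhausts RAM, capping the number of parallel compile jobs can help (MAX_JOBS is honored by the ninja-based PyTorch extension build that flash-attn uses):
MAX_JOBS=4 pip install flash-attn --no-build-isolation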
There are a lot of things that can go wrong when installing this package. Next I'm going to share a recipe that should work right now using conda.
Some remarks:
- Don't use pip to install any cuda or pytorch libraries (this includes any package that might reference them before having them installed via conda).
- Make sure you don't have several versions of CUDA installed; if that is your case, the installation might fail.
- Try to install ninja from your distribution's package manager. The process might work without it, but ninja does a much better job detecting the environment.
- This works with the current torch version and CUDA 11.8; no warranties that it will keep working for future versions.
- If you couldn't install it before, I strongly recommend starting from a fresh conda environment.
Create a new environment if you don't already have one.
conda create -n flash_attn python=3.10.11
conda activate flash_attn
These are the required packages with their required versions.
conda install -c conda-forge gcc=11.3
conda install -c conda-forge gxx=11.3
conda install cuda -c nvidia/label/cuda-11.8.0
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install packaging
pip install flash_attn --no-build-isolation
Thanks for your kind and detailed recipe! But I still run into this problem...
Collecting flash_attn
Using cached flash_attn-1.0.9.tar.gz (1.8 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [21 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-57_5gf64/flash-attn_939101cdc95c431a947f582f325cfb21/setup.py", line 111, in <module>
_, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
File "/tmp/pip-install-57_5gf64/flash-attn_939101cdc95c431a947f582f325cfb21/setup.py", line 26, in get_cuda_bare_metal_version
raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
File "/home/adseadmin/miniconda3/envs/lmtest0/lib/python3.10/subprocess.py", line 421, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/home/adseadmin/miniconda3/envs/lmtest0/lib/python3.10/subprocess.py", line 503, in run
with Popen(*popenargs, **kwargs) as process:
File "/home/adseadmin/miniconda3/envs/lmtest0/lib/python3.10/subprocess.py", line 971, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/home/adseadmin/miniconda3/envs/lmtest0/lib/python3.10/subprocess.py", line 1863, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda-11.5/bin/nvcc'
torch.__version__ = 2.0.1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
@jinghan23 See point two, you have several versions of CUDA installed and the installation is failing because of that.
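A quick way to see which toolkits are installed and which one the build will pick up (a diagnostic sketch; paths are the typical Linux defaults):
ls -d /usr/local/cuda*          # e.g. cuda-11.5 and cuda-11.8 both present
which nvcc && nvcc --version    # the toolkit currently on PATH
echo $CUDA_HOME                 # the build typically resolves CUDA_HOME from this variable or from the nvcc on PATH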
The solution I've found working: CUDA 11.7 -> follow the instructions here. Example:
$ wget -d https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb
$ sudo dpkg -i cuda-keyring_1.0-1_all.deb
$ sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/3bf863cc.pub
$ add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/ /"
$ add-apt-repository contrib
$ sudo apt-get update
$ sudo apt-get -y install cuda-11-7
Install g++-10 and gcc-10, then install flash-attn like this: CXX=g++-10 CC=gcc-10 LD=g++-10 pip3 install flash-attn==v1.0.3.post0
I've not tried other versions of flash-attn, but I think it should work too.
The workaround that worked for me was to downgrade the cuda runtime version. My driver version is still 12.2 but the runtime version is now 11.7. It is also much faster than installing with no build isolation.
conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit
pip3 install flash-attn==1.0.5
flash-attn is a very problematic library: you MUST install flash-attn after torch has been installed.
Solution: Install PyTorch first, then install FA2.
Example:
WRONG:
$ pip install torch flash-attn
CORRECT:
$ pip install torch
$ pip install flash-attn
(source)
I've tried adding torch as a build dependency, not just a runtime dependency, in my pyproject.toml. It doubled the install time, but ultimately did not work.
My workaround now is to have it as an optional dependency:
[project.optional-dependencies]
# Flash attention cannot be installed alongside normal dependencies,
# since it requires torch during build time. Install with
# pip install '.[flash-attn]'
# after installing everything else first.
flash-attn = [
"flash-attn>=2.5.7"
]
And then do two pip installs: one without [flash-attn] and then one with [flash-attn].
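Concretely, that looks something like this (the project path is a placeholder; the extra name matches the pyproject.toml snippet above):
pip install .                  # first pass: everything except flash-attn, which installs torch
pip install '.[flash-attn]'    # second pass: flash-attn now builds against the already-installed torch
Depending on the flash-attn version, adding --no-build-isolation to the second command may still be needed.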
It's also worth noting that flash-attn is extremely picky when it comes to the pip and wheel versions. With the following build requirements:
[build-system]
requires = ["setuptools>=69.0.0", "wheel>=0.43.0"]
and a pip install --upgrade pip before everything, it works. Without that, I get strange build errors due to missing wheel or packaging.
pip install flash-attn==1.0.5
might help. I am using torch 1.13 and cuda 12.0.
after I do this, I get this error:
TypeError: MHA.__init__() got an unexpected keyword argument 'num_heads_kv'
We had torch in the dependency in 1.0.5, but for some users it would download a new version of torch instead of using the existing one. I'm not really an expert in Python packaging, so it's possible I'm doing sth wrong.
You are right that omitting such a slow-to-install (compilation-requiring) and popular dependency like torch from your requirements is the best practice (judging from the wrappers around "classic" ML algos such as xgboost or lightgbm).
The problem here is that your installer tries to import torch, which is not a good idea, because it fails unless developers/maintainers can guarantee the expected installation sequence (first torch, then flash-attn), which really should not be expected from batch installation processes or in new environments. This assumption routinely fails in the Dockerfiles of GPU-enabled Docker containers, because there we install GPU-enabled packages such as torch as the last ones, precisely because some of their wrappers still (despite all the educational efforts :) contain CPU-only versions in their requirements, and we need to exchange the wrong CPU-only torch (tensorflow, xgboost, ...) for the correct GPU-enabled version. We do this at the very end of the installation process, after all their wrappers and reverse dependencies have already been installed.
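In a container build the same ordering constraint applies; a minimal shell sketch of an order that works (the index URL and versions are placeholders for whatever CUDA build you target):
# torch (GPU build) must already be importable before flash-attn compiles
pip install torch --index-url https://download.pytorch.org/whl/cu121
pip install packaging ninja
pip install flash-attn --no-build-isolation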
Make sure you're using the latest versions of pip, wheel and setuptools. Then it's fine.
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip wheel setuptools
python -m pip install flash-attn
Installing collected packages: mpmath, typing-extensions, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, einops, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, flash-attn
Successfully installed MarkupSafe-2.1.5 einops-0.8.0 filelock-3.15.4 flash-attn-2.6.1 fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.3 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.5.82 nvidia-nvtx-cu12-12.1.105 sympy-1.13.1 torch-2.3.1 triton-2.3.1 typing-extensions-4.12.2
Does anyone know how we can use the current flash_attn version on a T4 GPU? I am using a model that uses flash-attn and I am unable to load it on the T4 machine.
anyone having problems with macbookpro m3?
Following sequence works:
python -m venv ~/myvenv
source ~/myvenv/bin/activate
pip install torch==2.3.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install wheel
pip install flash-attn --no-build-isolation
When I run pip install flash-attn, it says that. But obviously, it is wrong. See screenshot.