Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

No Module Named 'torch' #246

Open MilesQLi opened 1 year ago

MilesQLi commented 1 year ago

When I run pip install flash-attn, it reports ModuleNotFoundError: No module named 'torch', even though torch is installed in my environment. See the screenshot.

ulysses500 commented 1 year ago

Same issue for me.

ulysses500 commented 1 year ago

Workaround: install the previous version pip install flash_attn==1.0.5

smeyerhot commented 1 year ago

I am seeing the same problem on every flash_attn version. I am using CUDA 12.1 on the new G2 VM instance from GCP: https://cloud.google.com/compute/docs/accelerator-optimized-machines#g2-vms. The underlying GPU is the NVIDIA L4, which uses the Ada architecture.

smeyerhot commented 1 year ago

Workaround: install the previous version pip install flash_attn==1.0.5

This might work in some scenarios but not all.

tridao commented 1 year ago

Can you try python -m pip install flash-attn? It's possible that pip and python -m pip refer to different environments.

Getting the dependencies right for all setups is hard. We had torch as a dependency in 1.0.5, but for some users it would download a new version of torch instead of using the existing one. So for 1.0.6 we left torch out of the dependencies.
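
If you are not sure whether pip and python -m pip point at the same environment, a quick sanity check (plain shell commands, nothing flash-attn specific) is:

# These should all report the same interpreter / site-packages location
which python
which pip
python -m pip --version
pip --version
# Confirm torch is visible from that interpreter before installing flash-attn
python -c "import torch; print(torch.__version__)"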

official-elinas commented 1 year ago

Getting the same issue. I also tried python -m pip install flash-attn as you suggested with the same failure.

nrailg commented 1 year ago

same problem here.

tridao commented 1 year ago

I don't know of a solution that works for all setups; happy to hear suggestions.

We recommend the PyTorch container from NVIDIA, which has all the required tools to install FlashAttention.
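
For reference, a minimal sketch of the container route (the image tag below is only an example; pick the NGC PyTorch release that matches your driver):

# Start an NGC PyTorch container (tag is an example, not a requirement)
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:23.05-py3
# Inside the container, torch, CUDA and nvcc are already set up, so:
pip install flash-attn --no-build-isolation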

smeyerhot commented 1 year ago

I believe this is an incompatibility issue with the CUDA 12.1 build of torch.

Using the following torch version solves my problem.

torch==2.0.0+cu117

MilesQLi commented 1 year ago

@smeyerhot I use that exact version, but it doesn't work. See the screenshot.


smeyerhot commented 1 year ago

@MilesQLi

I believe this is an incompatibility issue with the CUDA 12.1 build of torch.

Using the following torch version solves my problem.

torch==2.0.0+cu117

Sorry! This didn't fix things... apologies for the false hope.

MilesQLi commented 1 year ago

@smeyerhot No problem. Thanks a lot anyway!

jzsbioinfo commented 1 year ago

same problem

quant-cracker commented 1 year ago

same problem

leucocyte123 commented 1 year ago

pip install flash-attn==1.0.5 might help. I am using torch 1.13 and cuda 12.0.

Maykeye commented 1 year ago

I had the same issue with pip. The workaround was to compile from source, which worked like a charm.

In [1]: import flash_attn

In [2]: import torch

In [3]: torch.__version__
Out[3]: '2.0.1+cu117'

In [4]: flash_attn.__version__
Out[4]: '1.0.6'

Evan-aja commented 1 year ago

I had the same issue with pip. The workaround was to compile from source, which worked like a charm.

In [1]: import flash_attn

In [2]: import torch

In [3]: torch.__version__
Out[3]: '2.0.1+cu117'

In [4]: flash_attn.__version__
Out[4]: '1.0.6'

I also had the same issue, but my system needs CUDA 12.1 (2x NVIDIA L4), so using the torch cu117 build is not an option.

Compiling from source is also my workaround, and it works like a charm.

My system runs Fedora Server.

official-elinas commented 1 year ago

I compiled it myself in a Docker container, and at runtime I still get: RuntimeError: Expected q_dtype == torch::kFloat16 || ((is_sm8x || is_sm90) && q_dtype == torch::kBFloat16) to be true

xwyzsn commented 1 year ago

Trying pip install flash-attn --no-build-isolation fixed my problem (see the pip docs on build isolation).
To fix this properly, maybe adding a torch dependency to pyproject.toml could help.

official-elinas commented 1 year ago

@xwyzsn Unfortunately this only worked on my windows system, not linux. But I feel we're making progress.

tridao commented 1 year ago

To fix this properly, maybe adding a torch dependency to pyproject.toml could help.

We had torch as a dependency in 1.0.5, but for some users it would download a new version of torch instead of using the existing one. I'm not really an expert in Python packaging, so it's possible I'm doing something wrong.

xwyzsn commented 1 year ago

@xwyzsn Unfortunately this only worked on my windows system, not linux. But I feel we're making progress.

Hi, I am actually using Linux, and it worked well there too. I assume you may have missed some other package needed to build this on your Linux system.

From the pip docs on --no-build-isolation: "Build dependencies specified by PEP 518 must be already installed if this option is used."
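
In other words, with --no-build-isolation you are responsible for the build-time dependencies yourself. A sketch of the full sequence, based on the packages this thread reports the build needing:

# The build reuses your current environment, so install build requirements first
pip install --upgrade pip setuptools wheel
pip install packaging ninja
pip install torch        # must be importable before flash-attn's setup.py runs
pip install flash-attn --no-build-isolation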

bansky-cl commented 1 year ago

Same problem for me; I solved it by checking my device's CUDA version against my torch CUDA version.

Wraken commented 1 year ago

Trying pip install flash-attn --no-build-isolation fixed my problem (see the pip docs on build isolation). To fix this properly, maybe adding a torch dependency to pyproject.toml could help.

This fixed the torch problem, but now I get another error. It might be related to something else, though.

        435 |         function(_Functor&& __f)
            |                                                                                                                                                 ^
      /usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
      /usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
        530 |         operator=(_Functor&& __f)
            |                                                                                                                                                  ^
      /usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
      error: command '/usr/bin/nvcc' failed with exit code 255
vchiley commented 1 year ago

@xwyzsn ninja was removed, then torch was removed, then ninja was re-added. The next logical step is to re-add torch, right??? 😄

CallShaul commented 1 year ago

Same issue on Kubuntu 20 with torch 2.0.1, CUDA 11.8, Python 3.9 / 3.10, and flash-attn versions 0.2.8 / 1.0.4 / 1.0.5 / 1.0.6 / 1.0.7, with and without the --no-build-isolation flag.

BaohaoLiao commented 1 year ago

Thanks to the previous answers, I was able to install it successfully. Here is my experience (environment: torch 2.0.0 + CUDA 11.7 on Ubuntu); the whole sequence is collected in the sketch after this list:

  1. I hit ModuleNotFoundError: No module named 'torch', so I installed with pip install flash-attn --no-build-isolation.
  2. That raised another error, ModuleNotFoundError: No module named 'packaging', so I installed that package with pip install packaging.
  3. Re-running the installation, yet another error appeared: RuntimeError: The current installed version of g++ (4.8.5) is less than the minimum required version by CUDA 11.7 (6.0.0). Please make sure to use an adequate version of g++ (>=6.0.0, <12.0).
  4. I switched to a newer g++ (9.0), and it finally worked.
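
Putting those steps together, a rough sketch of the sequence (the exact g++ version just has to fall in the range your CUDA release accepts):

# Consolidated from the steps above (CUDA 11.7 wants g++ >= 6.0 and < 12)
pip install packaging
g++ --version        # if too old, install a newer g++ from your distro or conda-forge
pip install flash-attn --no-build-isolation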

Talkvibes commented 1 year ago

Workaround: install the previous version pip install flash_attn==1.0.5


How do I tackle this?

smeyerhot commented 1 year ago

I had the same issue with pip. The workaround was to compile from source, which worked like a charm.


In [1]: import flash_attn

In [2]: import torch

In [3]: torch.__version__

Out[3]: '2.0.1+cu117'

In [4]: flash_attn.__version__

Out[4]: '1.0.6'

I also had the same issue, but my system needs CUDA 12.1 (2x NVIDIA L4), so using the torch cu117 build is not an option.

Compiling from source is also my workaround, and it works like a charm.

My system runs Fedora Server.

What was your solution for CUDA 12 and the L4 GPU?

Edresson commented 1 year ago

I got the same issue. I was only able to build from source (clone the repo, then run python setup.py install). pip install git+https://github.com/HazyResearch/flash-attention also gives me the same error. I'm using torch==1.12.1+cu113.

Martion-z commented 1 year ago

Trying pip install flash-attn --no-build-isolation fixed my problem (see the pip docs on build isolation). To fix this properly, maybe adding a torch dependency to pyproject.toml could help.

This fixed the torch problem, but now I get another error. It might be related to something else, though.

        435 |         function(_Functor&& __f)
            |                                                                                                                                                 ^
      /usr/include/c++/11/bits/std_function.h:435:145: note:         ‘_ArgTypes’
      /usr/include/c++/11/bits/std_function.h:530:146: error: parameter packs not expanded with ‘...’:
        530 |         operator=(_Functor&& __f)
            |                                                                                                                                                  ^
      /usr/include/c++/11/bits/std_function.h:530:146: note:         ‘_ArgTypes’
      error: command '/usr/bin/nvcc' failed with exit code 255

I had the same issue. Were you able to solve it?

tridao commented 1 year ago

@Martion-z This looks the same as #172 and #225. Some CUDA versions don't like gcc 11. Downgrading to gcc 10 might work.
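
If you don't want to change the system default compiler, one option is to point the build at gcc/g++ 10 explicitly. A sketch, assuming gcc-10 and g++-10 are installed and that the build picks up the standard CC/CXX environment variables:

# Build with gcc/g++ 10 instead of the system default compiler
export CC=gcc-10
export CXX=g++-10
pip install flash-attn --no-build-isolation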

shahules786 commented 1 year ago
conda install -c conda-forge cudatoolkit-dev
pip install flash_attn==1.0.5

This worked for me.

Crysflair commented 1 year ago

pip install flash-attn==1.0.5

Thanks! This solves the above error, but a new one occurs: The detected CUDA version (12.1) mismatches the version that was used to compile.

Crysflair commented 1 year ago

pip install flash-attn==1.0.5

Thanks! This solves the above error, but a new one occurs: The detected CUDA version (12.1) mismatches the version that was used to compile.

Problem solved. I installed the nightly version of pytorch, and then install flash-attn with the no-build-isolation option.

pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
# Successfully installed torch-2.1.0.dev20230710+cu121
pip install flash-attn --no-build-isolation
# Successfully built flash-attn
# Installing collected packages: ninja, flash-attn
# Successfully installed flash-attn-1.0.8 ninja-1.11.1

Note that building the wheel takes a long time; don't kill it, just wait.

Nan-Do commented 1 year ago

There are a lot of things that can go wrong when installing this package. Next I'm going to share a recipe that should work right now using conda.

Some remarks:

  • Don't use pip to install any CUDA or PyTorch libraries (this includes any package that might pull them in before they have been installed via conda).
  • Make sure you don't have several versions of CUDA installed; if you do, the installation might fail.
  • Try to install ninja from your distribution's package manager. The process might work without it, but ninja does a much better job of detecting the environment.
  • This works with the current torch version and CUDA 11.8; no warranty that it will keep working for future versions.
  • If you couldn't install it before, I strongly recommend starting from a fresh conda environment.

Create a new environment if you don't already have one.

conda create -n flash_attn python=3.10.11
conda activate flash_attn

These are the required packages with their required versions.

conda install -c conda-forge gcc=11.3
conda install -c conda-forge gxx=11.3
conda install cuda -c nvidia/label/cuda-11.8.0
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install packaging
pip install flash_attn --no-build-isolation

jinghan23 commented 1 year ago

There are a lot of things that can go wrong when installing this package. Next I'm going to share a recipe that should work right now using conda.

Some remarks:

  • Don't use pip to install any CUDA or PyTorch libraries (this includes any package that might pull them in before they have been installed via conda).
  • Make sure you don't have several versions of CUDA installed; if you do, the installation might fail.
  • Try to install ninja from your distribution's package manager. The process might work without it, but ninja does a much better job of detecting the environment.
  • This works with the current torch version and CUDA 11.8; no warranty that it will keep working for future versions.
  • If you couldn't install it before, I strongly recommend starting from a fresh conda environment.

Create a new environment if you don't already have one.

conda create -n flash_attn python=3.10.11
conda activate flash_attn

These are the required packages with their required versions.

conda install -c conda-forge gcc=11.3
conda install -c conda-forge gxx=11.3
conda install cuda -c nvidia/label/cuda-11.8.0
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
pip install packaging
pip install flash_attn --no-build-isolation

Thanks for your kind and detailed recipe! But I still hit this problem...

Collecting flash_attn                                                                                                                                                
  Using cached flash_attn-1.0.9.tar.gz (1.8 MB)                                                                                                                      
  Preparing metadata (setup.py) ... error                                                                                                                            
  error: subprocess-exited-with-error                                                                                                                                

  × python setup.py egg_info did not run successfully.                                                                                                               
  │ exit code: 1                                                                                                                                                     
  ╰─> [21 lines of output]                                                                                                                                           
      Traceback (most recent call last):                                                                                                                             
        File "<string>", line 2, in <module>                                                                                                                         
        File "<pip-setuptools-caller>", line 34, in <module>                                                                                                         
        File "/tmp/pip-install-57_5gf64/flash-attn_939101cdc95c431a947f582f325cfb21/setup.py", line 111, in <module>                                                 
          _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)                                                                                             
        File "/tmp/pip-install-57_5gf64/flash-attn_939101cdc95c431a947f582f325cfb21/setup.py", line 26, in get_cuda_bare_metal_version                               
          raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)                                                              
        File "/home/adseadmin/miniconda3/envs/lmtest0/lib/python3.10/subprocess.py", line 421, in check_output                                                       
          return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,                                                                                           
        File "/home/adseadmin/miniconda3/envs/lmtest0/lib/python3.10/subprocess.py", line 503, in run                                                                
          with Popen(*popenargs, **kwargs) as process:                                                                                                               
        File "/home/adseadmin/miniconda3/envs/lmtest0/lib/python3.10/subprocess.py", line 971, in __init__                                                           
          self._execute_child(args, executable, preexec_fn, close_fds,                                                                                               
        File "/home/adseadmin/miniconda3/envs/lmtest0/lib/python3.10/subprocess.py", line 1863, in _execute_child                                                    
          raise child_exception_type(errno_num, err_msg, err_filename)                                                                                               
      FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda-11.5/bin/nvcc'                                                                        

      torch.__version__  = 2.0.1                                                                                                                                     

      [end of output]                                                                                                                                                

  note: This error originates from a subprocess, and is likely not a problem with pip.                                                                               
error: metadata-generation-failed                                                                                                                                    

× Encountered error while generating package metadata.                                                                                                               
╰─> See above for output.                                                                                                                                            

note: This is an issue with the package mentioned above, not pip.                                                                                                    
hint: See above for details. 

Nan-Do commented 1 year ago

@jinghan23 See point two: you have several versions of CUDA installed, and the installation is failing because of that.
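
The traceback above shows setup.py resolving CUDA_HOME to /usr/local/cuda-11.5, which has no nvcc. A sketch of how to check which toolkit the build will pick up and, if needed, point it at the conda environment's CUDA instead (assuming you installed the CUDA toolkit into the environment as in the recipe above):

# See which CUDA toolkit the build would use
which nvcc
nvcc -V
echo $CUDA_HOME
# If CUDA_HOME points at a stale system install, aim it at the conda env
export CUDA_HOME=$CONDA_PREFIX
pip install flash_attn --no-build-isolation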

Wraken commented 1 year ago

The solution I've found working:

karandua2016 commented 1 year ago

The workaround that worked for me was to downgrade the CUDA runtime version. My driver version is still 12.2, but the runtime version is now 11.7. It is also much faster than installing with --no-build-isolation.

conda install -c "nvidia/label/cuda-11.7.0" cuda-toolkit
pip3 install flash-attn==1.0.5

HadiAskari commented 9 months ago

flash-attn is a very problematic library

xylcbd commented 8 months ago

You MUST install flash-attn after torch has been installed.

fakerybakery commented 7 months ago

Solution: Install PyTorch first, then install FA2.

Example:

WRONG:

$ pip install torch flash-attn

CORRECT:

$ pip install torch
$ pip install flash-attn

(source)

phoerious commented 6 months ago

I've tried adding torch as a build dependency, not just a runtime dependency, in my pyproject.toml. It doubled the install time but ultimately didn't work.

My workaround now is to have it as an optional dependency:

[project.optional-dependencies]
# Flash attention cannot be installed alongside normal dependencies,
# since it requires torch during build time. Install with
#     pip install '.[flash-attn]'
# after installing everything else first.
flash-attn = [
    "flash-attn>=2.5.7"
]

Then do two pip installs: one without [flash-attn] and then one with [flash-attn], as shown below.
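
Concretely, the two installs would look something like this (using the optional dependency from the pyproject.toml above):

# First everything except flash-attn ...
pip install .
# ... then the flash-attn extra, now that torch is importable at build time
pip install '.[flash-attn]'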

It's also worth noting that flash-attn is extremely picky when it comes to the pip and wheel versions. With the following build requirements:

[build-system]
requires = ["setuptools>=69.0.0", "wheel>=0.43.0"]

and a pip install --upgrade pip before everything, it works. Without that, I get strange build errors due to missing wheel or packaging.

cclough commented 6 months ago

pip install flash-attn==1.0.5 might help. I am using torch 1.13 and cuda 12.0.

After doing this, I get this error:

TypeError: MHA.__init__() got an unexpected keyword argument 'num_heads_kv'

mirekphd commented 5 months ago

We had torch as a dependency in 1.0.5, but for some users it would download a new version of torch instead of using the existing one. I'm not really an expert in Python packaging, so it's possible I'm doing something wrong.

You are right that omitting such a slow-to-install (compilation-requiring) and popular dependency as torch from your requirements is best practice (judging from the wrappers around "classic" ML algorithms such as xgboost or lightgbm).

The problem here is that your installer tries to import torch, which is not a good idea: it fails unless developers/maintainers can guarantee the expected installation sequence (first torch, then flash-attn), which really should not be expected from batch installation processes or fresh environments. This assumption routinely fails in Dockerfiles for GPU-enabled containers. There we install GPU-enabled packages such as torch last, precisely because some of their wrappers still (despite all the educational efforts :) list CPU-only versions in their requirements, and we need to swap the wrong CPU-only torch (or tensorflow, xgboost, ...) for the correct GPU-enabled version. We do that at the very end of the installation process, after all of the wrappers and reverse dependencies have already been installed.
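
In Dockerfile terms this boils down to ordering the pip layers so that the GPU-enabled torch is already present before flash-attn builds. A rough sketch (the cu121 index URL is just the usual PyTorch wheel index; adjust for your CUDA version):

# Everything that doesn't need the GPU build of torch goes first
pip install -r requirements.txt
# Then the GPU-enabled torch build
pip install torch --index-url https://download.pytorch.org/whl/cu121
# And only then flash-attn, which imports torch in its setup.py
pip install flash-attn --no-build-isolation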

amirsoofi commented 3 months ago

Make sure you're using the latest versions of pip, wheel, and setuptools. Then it's fine.

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip wheel setuptools
python -m pip install flash-attn
Installing collected packages: mpmath, typing-extensions, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, networkx, MarkupSafe, fsspec, filelock, einops, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch, flash-attn
Successfully installed MarkupSafe-2.1.5 einops-0.8.0 filelock-3.15.4 flash-attn-2.6.1 fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.3 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.5.82 nvidia-nvtx-cu12-12.1.105 sympy-1.13.1 torch-2.3.1 triton-2.3.1 typing-extensions-4.12.2

atnafuatx commented 3 months ago

Does anyone know how we can use the current flash_attn version on a T4 GPU? I am using a model that uses flash-attn, and I am unable to load it on the T4 machine.

ManuelSokolov commented 3 months ago

Anyone having problems with a MacBook Pro M3?

riturajj-cerebras commented 2 months ago

The following sequence works:

python -m venv ~/myvenv
source ~/myvenv/bin/activate
pip install torch==2.3.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install wheel
pip install flash-attn --no-build-isolation