Dao-AILab / flash-attention

Fast and memory-efficient exact attention

flash-attn successfully installed but Flash Attention 2 is not available #957

Open YanxinLu opened 1 month ago

YanxinLu commented 1 month ago

cuda: 11.7
torch: 2.0.1
python: 3.10.9
release: flash_attn-2.3.5+cu117torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

File "/home/.conda/envs/venv310/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained 
    return model_class.from_pretrained(
  File "/home/.conda/envs/venv310/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3233, in from_pretrained
    config = cls._check_and_enable_flash_attn_2(config, torch_dtype=torch_dtype, device_map=device_map)
  File "/home/.conda/envs/venv310/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1273, in _check_and_enable_flash_attn_2
    raise ImportError(
ImportError: Flash Attention 2 is not available. Please refer to the documentation of https://github.com/Dao-AILab/flash-attention for installing it. Make sure to have at least the version 2.1.0
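
Since the error is raised by transformers' own availability check rather than by flash-attn itself, it can help to query that check directly. A minimal sketch, assuming a transformers version that ships the is_flash_attn_2_available helper (the getattr guards against older releases that don't):

import transformers

print("transformers:", transformers.__version__)

# The helper typically returns False when flash_attn cannot be imported, reports a
# version older than 2.1.0, or no CUDA device is visible, so each case is worth ruling out.
checker = getattr(transformers.utils, "is_flash_attn_2_available", None)
if checker is None:
    print("this transformers version has no is_flash_attn_2_available helper")
else:
    print("Flash Attention 2 available according to transformers:", checker())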

Installing flash-attn with pip install flash-attn --no-build-isolation failed with OSError: CUDA_HOME environment variable is not set, even though I set CUDA_HOME and can see the variable with echo.
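
For the CUDA_HOME failure during the source build, one thing worth confirming is what the Python process actually sees; a common culprit (an assumption here, not something the error proves) is setting the variable without exporting it, so the pip subprocess never inherits it. A quick check, assuming the build resolves CUDA the same way torch.utils.cpp_extension does:

# Illustrative check: compare the exported CUDA_HOME with what torch resolves.
import os
from torch.utils import cpp_extension

print("CUDA_HOME from the environment:", os.environ.get("CUDA_HOME"))  # None means it was not exported
print("CUDA_HOME resolved by torch:", cpp_extension.CUDA_HOME)          # may fall back to /usr/local/cuda

If the first line prints None, exporting the variable in the same shell (or prefixing the install command, e.g. CUDA_HOME=/usr/local/cuda-11.7 pip install flash-attn --no-build-isolation, with the path adjusted to your setup) may resolve it.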

I then installed flash-attn successfully from the prebuilt release wheel.

pip show flash-attn
Name: flash-attn
Version: 2.3.5
Summary: Flash Attention: Fast and Memory-Efficient Exact Attention
Home-page: https://github.com/Dao-AILab/flash-attention
Author: Tri Dao
Author-email: trid@cs.stanford.edu
License:
Location: /scratch/users/yanxinl4/my.conda.dir/envs/venv310/lib/python3.10/site-packages
Requires: einops, ninja, packaging, torch
Required-by:
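
Since the wheel name encodes cu117, torch2.0, cxx11abiFALSE and cp310, a mismatch between those tags and the running environment is a frequent reason the package installs cleanly but then fails at import time, which is exactly what transformers' check trips over. A rough comparison sketch (the mismatch is an assumption to verify, not a confirmed diagnosis):

# Compare the running environment against the wheel tags:
# flash_attn-2.3.5+cu117torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
import sys
import torch

print("python:", sys.version_info[:3])                 # wheel expects cp310
print("torch:", torch.__version__)                     # wheel expects 2.0.x
print("torch CUDA version:", torch.version.cuda)       # wheel expects 11.7
print("cxx11 abi:", torch._C._GLIBCXX_USE_CXX11_ABI)   # wheel expects False

try:
    import flash_attn
    print("flash_attn", flash_attn.__version__, "imports cleanly")
except ImportError as exc:
    # A typical mismatch symptom is an "undefined symbol: ..." error here.
    print("flash_attn fails to import:", exc)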
luisegehaijing commented 2 weeks ago

Hi, have you solved the problem yet? I successfully installed flash-attn through pip install flash-attn --no-build-isolation, but I also got the "not available" error.
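
Once import flash_attn succeeds, whether transformers actually picks it up can be verified when loading the model. A hedged sketch with a placeholder checkpoint name; depending on the transformers version, the flag is either use_flash_attention_2=True (older releases, matching the traceback above) or attn_implementation="flash_attention_2" (newer releases):

# Hypothetical example: the model name and dtype are placeholders, adjust to your case.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder checkpoint
    torch_dtype=torch.float16,
    use_flash_attention_2=True,   # or attn_implementation="flash_attention_2" on newer transformers
)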