Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

page not found in setup.py #979

Open kishida opened 5 months ago

kishida commented 5 months ago

In setup.py, urllib.request.urlretrieve(wheel_url, wheel_filename) tries to download a prebuilt wheel, but the URL (which starts with https://github.com/Dao-AILab/flash-attention/releases/download/) is not found, and this then causes a further error. https://github.com/Dao-AILab/flash-attention/blob/main/setup.py#L271
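
(For reference, a minimal sketch of the download-then-fallback behavior around the referenced line in setup.py; the helper name `fetch_prebuilt_wheel` and the tag/wheel naming here are illustrative, not the repository's actual code.)

```python
# Sketch of the prebuilt-wheel logic: setup.py first tries to fetch a wheel
# from the GitHub releases page and only builds from source if that fails.
import urllib.error
import urllib.request

BASE_WHEEL_URL = (
    "https://github.com/Dao-AILab/flash-attention/releases/download/"
    "{tag_name}/{wheel_name}"
)

def fetch_prebuilt_wheel(tag_name: str, wheel_name: str) -> bool:
    """Return True if a prebuilt wheel was downloaded, False if we must build."""
    wheel_url = BASE_WHEEL_URL.format(tag_name=tag_name, wheel_name=wheel_name)
    try:
        # This is the call that fails when no wheel matches the local
        # Python/CUDA/torch/platform combination: GitHub returns "Not Found".
        urllib.request.urlretrieve(wheel_url, wheel_name)
        return True
    except (urllib.error.HTTPError, urllib.error.URLError):
        # No matching prebuilt wheel; setup.py then compiles the CUDA
        # extension locally instead (slow, and needs nvcc plus MSVC on Windows).
        print(f"{wheel_url} not found; falling back to a source build.")
        return False
```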

tridao commented 5 months ago

Why is it not found?

kishida commented 5 months ago

I'm building on Windows with CUDA 11.8 or 12.1 in the Developer Command Prompt for VS 2022. Sorry, though: to reproduce the error I tried installing again in a new venv with CUDA 11.8, and this time the installation succeeded, although it doesn't seem to work correctly.
Is a cache left somewhere? Loading Phi3 vision produces the error below. (With 12.1 it works fine.)

  File "D:\dev\llm\cu118\Lib\site-packages\transformers\modeling_utils.py", line 1571, in _check_and_enable_flash_attn_2
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
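
For what it's worth, a quick diagnostic sketch: importing flash_attn directly in the failing venv usually surfaces the underlying failure that transformers summarizes as "seems to be not installed" (the exact check inside transformers may differ).

```python
# Import flash_attn directly to see the real error behind the
# "seems to be not installed" message, e.g. a missing or incompatible
# CUDA DLL on Windows.
try:
    import flash_attn
    print("flash_attn", flash_attn.__version__, "imported OK")
except ImportError as exc:
    print("flash_attn failed to import:", exc)
```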