Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Flash attention is broken for CUDA 12.x #1004


Bhagyashreet20 commented 5 months ago

Despite using the NVIDIA containers with CUDA 12.4 and compiling from source, I still run into the error below:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/scratch.btaleka_gpu_1/code/flash-attention/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File "/home/scratch.btaleka_gpu_1/code/flash-attention/flash_attn/flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: libcudart.so.11.0: cannot open shared object file: No such file or directory

Flash-attention should consider moving to the latest container stack without an explicit dependency on a particular CUDA runtime version. Such dependencies are fragile and often break the pipeline once someone tries to upgrade.

Fixes such as those reported in https://github.com/Dao-AILab/flash-attention/issues/208 or https://github.com/Dao-AILab/flash-attention/issues/728 are not correct solutions, especially when compilation from source fails.
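
A quick way to see where the libcudart.so.11.0 requirement actually comes from is to inspect the compiled extension directly. The sketch below is illustrative (not part of flash-attn) and assumes a Linux system with ldd on the PATH; if ldd shows libcudart.so.11.0 while torch reports CUDA 12.x, the extension came from a stale wheel or stale build artifacts rather than the current toolkit.

```python
# Illustrative diagnostic: locate the compiled flash_attn_2_cuda extension
# without importing it, and list the shared libraries it was linked against.
import importlib.util
import subprocess

import torch

print("torch.version.cuda:", torch.version.cuda)

spec = importlib.util.find_spec("flash_attn_2_cuda")
if spec is None or spec.origin is None:
    print("flash_attn_2_cuda extension not found")
else:
    print("extension:", spec.origin)
    # Show which CUDA runtime the extension requires (Linux only).
    print(subprocess.run(["ldd", spec.origin], capture_output=True, text=True).stdout)
```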

wplf commented 4 months ago

I think your CUDA library path is not included. Try export LD_LIBRARY_PATH=/usr/lib/cuda/lib:$LD_LIBRARY_PATH or something similar.
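
For example, a rough check along these lines (assuming a CUDA 12.x toolkit that ships libcudart.so.12) shows whether the dynamic loader can actually resolve the runtime after adjusting LD_LIBRARY_PATH:

```python
# Rough check (illustrative): verify that the CUDA 12 runtime is resolvable by
# the dynamic loader. LD_LIBRARY_PATH must be set before the Python process
# starts for the loader to pick it up.
import ctypes
import os

print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", "<unset>"))
try:
    ctypes.CDLL("libcudart.so.12")  # soname shipped by CUDA 12.x toolkits
    print("libcudart.so.12 resolved")
except OSError as err:
    print("libcudart.so.12 not resolved:", err)
```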

DrChiZhang commented 3 months ago

> Despite using the NVIDIA containers with CUDA 12.4 and compiling from source, I still run into the error below [...]
>
> Fixes such as those reported in #208 or #728 are not correct solutions, especially when compilation from source fails.

Hi, did you find a solution?

tomcat123a commented 1 month ago

Check https://github.com/Dao-AILab/flash-attention/releases for a wheel built against CUDA 12.3.
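
As far as I can tell, the prebuilt wheel filenames there encode the Python tag, torch version, CUDA version, and C++11 ABI flag, so it helps to print those for the running environment before picking one. A small sketch, assuming a recent PyTorch:

```python
# Sketch: print the environment details the prebuilt flash-attn wheels appear
# to be keyed on, so a matching wheel can be chosen from the releases page.
import sys

import torch

print("python tag:", f"cp{sys.version_info.major}{sys.version_info.minor}")
print("torch:", torch.__version__)
print("torch cuda:", torch.version.cuda)
print("cxx11 abi:", torch._C._GLIBCXX_USE_CXX11_ABI)
```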