genmoai / models

The best OSS video generation models
Apache License 2.0

Is flash attention really used? #10

Open · ptits opened this issue 4 days ago

ptits commented 4 days ago

flash-attn causes an error when installed from requirements.txt.

ved-genmo commented 4 days ago

Can you try `uv pip install -e . --no-build-isolation`?
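
A note on why that flag matters: flash-attn's setup.py imports torch at build time, so `--no-build-isolation` lets the build reuse the torch already installed in the environment rather than an isolated build env without it. A minimal sketch of the install order this implies (the bare `torch` spec is an assumption; a CUDA build of torch is required):

```bash
# flash-attn's setup.py imports torch during the metadata/build steps,
# so torch must already be installed before flash-attn compiles.
uv pip install torch                      # CUDA-enabled wheel assumed
uv pip install -e . --no-build-isolation  # build flash-attn against it
```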

cnielsen79 commented 3 days ago

Mine gets stuck at the same point using `uv pip install -e . --no-build-isolation`:

[Screenshot 2024-10-22 223635]
ved-genmo commented 3 days ago

@cnielsen79 It probably takes a while to build flash attention, possibly a half-hour or more. Luckily, it should be a one-time cost.
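
A related knob, from the flash-attn README rather than this thread: capping ninja's parallel compile jobs keeps the build from exhausting RAM (a common cause of apparent hangs), at the cost of a longer build. The job count below is illustrative:

```bash
# MAX_JOBS limits parallel compile jobs while flash-attn builds;
# the flash-attn README suggests this on machines with limited RAM.
MAX_JOBS=4 uv pip install -e . --no-build-isolation
```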

ptits commented 3 days ago

> Can you try `uv pip install -e . --no-build-isolation`?

Collecting flash-attn>=2.6.3
  Using cached flash_attn-2.6.3.tar.gz (2.6 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [12 lines of output]
      fatal: not a git repository (or any of the parent directories): .git
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-d8hsy1z4/flash-attn_7c0a9633585e4accace338e6cf252983/setup.py", line 160, in <module>
          raise RuntimeError(
      RuntimeError: FlashAttention is only supported on CUDA 11.6 and above. Note: make sure nvcc has a supported version by running nvcc -V.

      torch.__version__  = 2.4.1+cu121

      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

I have NVIDIA-SMI 535.161.07, Driver Version: 535.161.07, CUDA Version: 12.2

nvcc -V reports:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Thu_Nov_18_09:45:30_PST_2021
    Cuda compilation tools, release 11.5, V11.5.119
    Build cuda_11.5.r11.5/compiler.30672275_0
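
This pinpoints the failure: nvidia-smi's "CUDA Version: 12.2" is only the newest runtime the driver supports, while flash-attn is compiled by the toolkit's nvcc, which here is release 11.5 and below the 11.6 minimum the error names. A hedged sketch of pointing the build at a newer toolkit, if one is installed (the cuda-12.1 path is hypothetical):

```bash
nvcc -V                        # the compiler the flash-attn build actually uses
ls /usr/local | grep -i cuda   # look for other installed toolkits
# If a >= 11.6 toolkit exists, put it first (path below is hypothetical):
export CUDA_HOME=/usr/local/cuda-12.1
export PATH="$CUDA_HOME/bin:$PATH"
nvcc -V                        # should now report release >= 11.6
```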

dongyun-kim-arch commented 3 days ago

I have the same error with the CUDA version... it would be great if a Docker environment were included in the repo...
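
Until one exists, a rough sketch of the same idea with a stock NVIDIA image (the tag, wheel index, and package steps are assumptions, not anything the maintainers have published):

```bash
# nvidia/cuda devel images ship a matching nvcc; 12.1 clears the
# >= 11.6 floor flash-attn enforces. All tags and paths here are
# illustrative only.
docker run --gpus all -it --rm -v "$PWD":/workspace -w /workspace \
    nvidia/cuda:12.1.1-devel-ubuntu22.04 bash
# inside the container:
apt-get update && apt-get install -y git python3-pip
pip3 install torch --index-url https://download.pytorch.org/whl/cu121
pip3 install -e . --no-build-isolation
```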

cnielsen79 commented 1 day ago

> @cnielsen79 It probably takes a while to build flash attention, possibly a half-hour or more. Luckily, it should be a one-time cost.

It never worked and I gave up on it. It seems like quite a few people have this issue.