NousResearch / Hermes-Function-Calling


Failed building wheel for flash-attn #7

Closed · girmay closed this issue 1 week ago

girmay commented 4 months ago

Is there a way around this? I would hate to downgrade CUDA, as I have other stuff relying on v12.3.

raise RuntimeError(CUDA_MISMATCH_MESSAGE.format(cuda_str_version, torch.version.cuda))
RuntimeError: The detected CUDA version (12.3) mismatches the version that was used to compile PyTorch (11.7). Please make sure to use the same CUDA versions.

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects

interstellarninja commented 3 months ago

Hi, could you try installing flash-attn with the following command:

MAX_JOBS=4 pip install flash-attn --no-build-isolation

OB-SPrince commented 3 months ago

It seems this relates to the flash-attn==2.5.5 pin in requirements.txt.

Can you see if changing this to flash-attn>=2.5.5 and then running pip install --upgrade -r requirements.txt helps?
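
Something like this would do it (the sed edit is just a sketch; editing requirements.txt by hand works the same):

# relax the pin (GNU sed; on macOS use: sed -i '' 's/.../.../')
sed -i 's/flash-attn==2.5.5/flash-attn>=2.5.5/' requirements.txt
pip install --upgrade -r requirements.txt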

HRuii1 commented 3 months ago

Hi, when I tried the above methods I got this:

(venv) Hermes-Function-Calling % MAX_JOBS=4 pip install flash-attn --no-build-isolation
Collecting flash-attn
  Using cached flash_attn-2.5.6.tar.gz (2.5 MB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [20 lines of output]
      fatal: not a git repository (or any of the parent directories): .git
      /private/var/folders/0f/fc7gq2sd0kn_8j_v2qppmxt40000gn/T/pip-install-ycud5ccl/flash-attn_49ae59720caf4f6f9c41bfdf10cc706a/setup.py:78: UserWarning: flash_attn was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.
        warnings.warn(
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/0f/fc7gq2sd0kn_8j_v2qppmxt40000gn/T/pip-install-ycud5ccl/flash-attn_49ae59720caf4f6f9c41bfdf10cc706a/setup.py", line 133, in <module>
          CUDAExtension(
        File "/Users/lhr/Desktop/Project/Hermes-Function-Calling/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1074, in CUDAExtension
          library_dirs += library_paths(cuda=True)
        File "/Users/lhr/Desktop/Project/Hermes-Function-Calling/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1201, in library_paths
          if (not os.path.exists(_join_cuda_home(lib_dir)) and
        File "/Users/lhr/Desktop/Project/Hermes-Function-Calling/venv/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 2407, in _join_cuda_home
          raise OSError('CUDA_HOME environment variable is not set. '
      OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

      torch.__version__  = 2.2.1

      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I'm using macOS, but I think CUDA is not available on Mac. Is there any way to solve this? Thank you in advance!

suparious commented 3 months ago

I'm using macOS, but I think CUDA is not available on Mac.

Yeah, this app requires CUDA to run, and it needs the CUDA_HOME environment variable to point at where you have CUDA installed.

I think for this to work on a MacBook, the app would need to support TensorFlow instead of CUDA directly.

balavenkatesh-ai commented 3 months ago

@suparious I'm facing the same issue in Google Colab. Could you please suggest a fix?

suparious commented 3 months ago

If your CUDA_HOME environment variable is not set, then set it yourself with export CUDA_HOME=<path_to_where_cuda_installed>.

https://stackoverflow.com - cuda-home-path-for-tensorflow
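
For example, on a typical Linux install it might look like this (the cuda-12.3 path is an assumption; point CUDA_HOME at wherever nvcc actually lives on your system):

which nvcc                          # e.g. /usr/local/cuda-12.3/bin/nvcc
export CUDA_HOME=/usr/local/cuda-12.3
export PATH="$CUDA_HOME/bin:$PATH"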

If you are using Debian/Ubuntu or Arch Linux, you can install packages like sudo apt install python3-wheel (Debian/Ubuntu) or sudo pacman -Syu python-wheel (Arch Linux) so your system has any missing dependencies for wheel. Then deactivate, delete, and re-create your Python virtual environment, and install wheel and setuptools inside it with pip install setuptools wheel.
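
That sequence looks roughly like this (a sketch, assuming the venv lives in ./venv; adjust the path to yours):

deactivate
rm -rf venv
python3 -m venv venv
source venv/bin/activate
pip install setuptools wheel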

It seems that this project specifically wants PyTorch 2.1.2, so you can manually satisfy the dependency by using a precompiled wheel from here: https://download.pytorch.org/whl/torch/

For example, I would choose this one for Python 3.11 on 64-bit ARM Linux: torch-2.1.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl, or this one for a newer MacBook with Apple silicon: torch-2.1.2-cp311-none-macosx_11_0_arm64.whl.
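
On an x86_64 Linux machine with CUDA, an alternative to downloading a wheel by hand is to point pip at the PyTorch wheel index (the cu121 index below is an assumption; pick the one matching your installed CUDA version):

# cu121 assumes CUDA 12.x; use cu118 for CUDA 11.8
pip install torch==2.1.2 --index-url https://download.pytorch.org/whl/cu121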

The flash-attn wheels seem to just need an installed version of CUDA that is > 11.6 (the system requirement is a CUDA-capable GPU).

The README here: https://github.com/Dao-AILab/flash-attention suggests that you can compile and install flash-attn without build isolation.

MAX_JOBS=4 pip install flash-attn --no-build-isolation

suparious commented 3 months ago

On a fresh install of Debian 12 (probably the same as Ubuntu 22.04), after getting the NVIDIA GPU drivers and CUDA installed (you can check with the nvidia-smi command), I installed the following packages:

sudo apt install python3-full python3-venv python3-pip python3-wheel python-is-python3

Then I created a fresh virtual environment, activated it and installed the requirements:

python -m venv ~/venv-hermes-fc
source ~/venv-hermes-fc/bin/activate
git clone git@github.com:NousResearch/Hermes-Function-Calling.git
cd Hermes-Function-Calling/
pip install setuptools wheel packaging torch==2.1.2
pip install --upgrade -r requirements.txt

Then I was able to run python jsonmode.py --query "Please return a json object to represent Goku from the anime Dragon Ball Z?" just like the README shows:

{
  "name": "Goku",
  "species": "Saiyan",
  "role": "Main Protagonist",
  "personality_traits": [
    "Determined",
    "Kind-hearted",
    "Fierce"
  ],
  "special_attacks": [
    "Kamehameha",
    "Instant Transmission",
    "Super Saiyan"
  ]
}

balavenkatesh-ai commented 3 months ago

@suparious Thanks for your instant reply.

I followed the steps you provided in Google Colab, but unfortunately I encountered the same issue again. Below is the error message I received. I have also attached the Colab notebook for your reference (LLM.zip). Could you please review it and suggest a fix so that it runs successfully in Google Colab?

return FlashAttnFunc.apply(
  File "/usr/local/lib/python3.10/dist-packages/torch/autograd/function.py", line 539, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 507, in forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = _flash_attn_forward(
  File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 51, in _flash_attn_forward
    out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state = flash_attn_cuda.fwd(
RuntimeError: FlashAttention only supports Ampere GPUs or newer.

suparious commented 3 months ago

The error above states it clearly: FlashAttention only supports Ampere GPUs or newer. This is most likely caused by using a T4 GPU on Google Colab. The T4 is built on the Turing architecture (announced September 2018), which predates Ampere, so it cannot support FlashAttention.
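
If you want to confirm what GPU Colab gave you, recent drivers let you query the compute capability directly (FlashAttention needs compute capability 8.0 or higher; the compute_cap query field is only available on newer driver versions):

nvidia-smi --query-gpu=name,compute_cap --format=csv
# e.g. "Tesla T4, 7.5" -> too old; Ampere (8.0+) or newer is required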

RhizoNymph commented 2 months ago

This was fixed for me by requiring flash_attn 2.5.8.
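
For anyone hitting the same thing, that would look something like this (a sketch; the --no-build-isolation flag matches the earlier suggestions in this thread):

pip install "flash-attn>=2.5.8" --no-build-isolation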