Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

flash-attn NVIDIA CUDA `nvcc` error on HuggingFace spaces #1093

Open kevalshah90 opened 1 month ago

kevalshah90 commented 1 month ago

I am on Hugging Face Spaces and attempting to use vLLM to run benchmarks. [1]

I installed vLLM, and when I attempt to run the MixEval benchmarks from a local SFT model, it prompts me to install flash-attn. When I run `pip install flash-attn --no-build-isolation`, it throws the following error:

Collecting flash-attn
  Downloading flash_attn-2.6.2.tar.gz (2.6 MB)
     |████████████████████████████████| 2.6 MB 4.6 MB/s eta 0:00:01
    ERROR: Command errored out with exit status 1:
     command: /home/user/miniconda/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-yfk_smr1/flash-attn_113dd77c04f04b62ae12b599f273ee24/setup.py'"'"'; __file__='"'"'/tmp/pip-install-yfk_smr1/flash-attn_113dd77c04f04b62ae12b599f273ee24/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-zt95x311
         cwd: /tmp/pip-install-yfk_smr1/flash-attn_113dd77c04f04b62ae12b599f273ee24/
    Complete output (21 lines):
    fatal: not a git repository (or any of the parent directories): .git

    torch.__version__  = 2.3.1+cu121

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-yfk_smr1/flash-attn_113dd77c04f04b62ae12b599f273ee24/setup.py", line 158, in <module>
        _, bare_metal_version = get_cuda_bare_metal_version(CUDA_HOME)
      File "/tmp/pip-install-yfk_smr1/flash-attn_113dd77c04f04b62ae12b599f273ee24/setup.py", line 82, in get_cuda_bare_metal_version
        raw_output = subprocess.check_output([cuda_dir + "/bin/nvcc", "-V"], universal_newlines=True)
      File "/home/user/miniconda/lib/python3.9/subprocess.py", line 424, in check_output
        return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
      File "/home/user/miniconda/lib/python3.9/subprocess.py", line 505, in run
        with Popen(*popenargs, **kwargs) as process:
      File "/home/user/miniconda/lib/python3.9/subprocess.py", line 951, in __init__
        self._execute_child(args, executable, preexec_fn, close_fds,
      File "/home/user/miniconda/lib/python3.9/subprocess.py", line 1821, in _execute_child
        raise child_exception_type(errno_num, err_msg, err_filename)
    FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/b0/b2/09c651d7980a68f1dd55d6180d99b0605957911cca0c305f10b3fb72a36b/flash_attn-2.6.2.tar.gz#sha256=3fd311fe7321bb4676a3fab1da72564f7d4714cee5c8ef4b7873a8905a65cb72 (from https://pypi.org/simple/flash-attn/) (requires-python:>=3.8). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

[1] https://github.com/philschmid/MixEval/blob/main/README.md

coldn00dles commented 1 month ago

+1. It seems the aliases for the model names haven't been provided for new versions of CUDA.

kevalshah90 commented 1 month ago

@tridao any suggestions here?

tridao commented 1 month ago

You need nvcc.
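
To expand on that: the traceback shows flash-attn's setup.py shelling out to `$CUDA_HOME/bin/nvcc -V` at build time, so the full CUDA toolkit (not just the CUDA runtime bundled with the PyTorch wheel) has to be present. A quick sanity check, sketched in Python below — the default path `/usr/local/cuda` mirrors the path in the error and is an assumption about your environment:

```python
import os
import shutil
import subprocess

# Default CUDA_HOME to /usr/local/cuda when the env var is unset,
# matching the path that appeared in the FileNotFoundError above.
cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
nvcc = os.path.join(cuda_home, "bin", "nvcc")

if os.path.exists(nvcc):
    # Same invocation the build performs; prints the toolkit version banner.
    print(subprocess.check_output([nvcc, "-V"], universal_newlines=True))
elif shutil.which("nvcc") is not None:
    # nvcc exists but not where the build looks for it.
    print("nvcc is on PATH but not under CUDA_HOME; set CUDA_HOME accordingly.")
else:
    print("nvcc not found: install the CUDA toolkit, or use a prebuilt flash-attn wheel.")
```

If nvcc is missing, the fix is to install the CUDA toolkit in the Space (or pick a Docker base image that ships it); installing PyTorch alone is not enough, since its wheels bundle only the runtime libraries, not the compiler.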