ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

[Issue]: Installing ROCm Flash-Attention on RHEL #69

Open varshaprasad96 opened 4 months ago

varshaprasad96 commented 4 months ago

Problem Description

We are trying to install ROCm flash-attention on RHEL using steps similar to those mentioned in the Dockerfile, but using a RHEL/UBI9 base image (registry.access.redhat.com/ubi9:latest) instead of rocm/pytorch.

As a prerequisite, the Dockerfile installs setuptools, packaging, ninja, and torch from https://download.pytorch.org/whl/rocm6.0, as recommended on the PyTorch website and the README of the repository.
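For reference, here is a minimal sketch of that prerequisite step as we run it in the UBI9 image (assuming pip resolves torch from the rocm6.0 wheel index named in the README; the exact versions we ended up with are listed below):

  python3 -m pip install --upgrade pip setuptools packaging ninja
  # ROCm build of torch, as recommended by the PyTorch website and the repository README
  python3 -m pip install torch --index-url https://download.pytorch.org/whl/rocm6.0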

Here are the versions: Python 3.11, Setuptools 71.1.0, Torch 2.1.1.

The intention is to install flash-attention successfully for ROCm version 6.1.2.

However, we run into the following issues:

  1. Patching Hipify:

The hipify.py script has been modified in recent versions of Torch, which causes the patch command to fail. The Dockerfile applies the patch here: https://github.com/ROCm/flash-attention/blob/2554f490101742ccdc56620a938f847f61754be6/Dockerfile.rocm#L27

It looks like the version of hipify.py in Torch 2.1.1 does not match the expected version for ROCm 6.1.2. Could you specify which version of Torch should be used with ROCm 6.1.2 to avoid this issue?
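As a diagnostic on our side (a sketch, not a step from the Dockerfile), we locate the hipify sources that the installed torch wheel actually ships, so they can be compared against what the patch expects before the patch command runs:

  # print the path of the hipify implementation bundled with the installed torch
  python3 -c "import torch.utils.hipify.hipify_python as h; print(h.__file__)"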

  2. Mismatch in Version Requirements and pip install Errors

When running pip install ., after cloning the repository and setting the PYTHON_SITE_PACKAGES path, the following errors appear:

Using pip 24.1.2 from /flash-attention/myenv/lib64/python3.9/site-packages/pip (python 3.9)
Processing /flash-attention
  Running command python setup.py egg_info
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/flash-attention/setup.py", line 293, in <module>
      build_for_cuda()
    File "/flash-attention/setup.py", line 125, in build_for_cuda
      raise_if_cuda_home_none("flash_attn")
    File "/flash-attention/setup.py", line 63, in raise_if_cuda_home_none
      raise RuntimeError(
  RuntimeError: flash_attn was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.

Warning: Torch did not find available GPUs on this system.
If your intention is to cross-compile, this is not an error.
By default, Apex will cross-compile for Pascal (compute capabilities 6.0, 6.1, 6.2),
Volta (compute capability 7.0), Turing (compute capability 7.5),
and, if the CUDA version is >= 11.0, Ampere (compute capability 8.0).
If you wish to cross-compile for a single specific architecture,
export TORCH_CUDA_ARCH_LIST="compute capability" before running setup.py.

torch.__version__  = 2.3.1+cu121

error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /flash-attention/myenv/bin/python -c '
exec(compile('"'"''"'"''"'"'
# This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
#
# - It imports setuptools before invoking setup.py, to enable projects that directly
#   import from `distutils.core` to work with newer packaging standards.
# - It provides a clear error message when setuptools is not installed.
# - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
#   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
#     manifest_maker: standard file '"'"'-c'"'"' not found".
# - It generates a shim setup.py, for handling setup.cfg-only projects.
import os, sys, tokenize

try:
    import setuptools
except ImportError as error:
    print(
        "ERROR: Can not execute `setup.py` since setuptools is not available in "
        "the build environment.",
        file=sys.stderr,
    )
    sys.exit(1)

__file__ = %r
sys.argv[0] = __file__

if os.path.exists(__file__):
    filename = __file__
    with tokenize.open(__file__) as f:
        setup_py_code = f.read()
else:
    filename = "<auto-generated setuptools caller>"
    setup_py_code = "from setuptools import setup; setup()"

exec(compile(setup_py_code, filename, "exec"))
'"'"''"'"''"'"' % ('"'"'/flash-attention/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-4o0z0_8v
cwd: /flash-attention/
Preparing metadata (setup.py) ... error
error: metadata-generation-failed

There are two issues here:
2.1 setuptools is reported as not available, even though it is present in PYTHON_SITE_PACKAGES.
2.2 nvcc is not found.

For (2.1): Even after verifying that setuptools is available in the expected location, setting the environment variable, and also setting PYTHONPATH, the error persists. Is there a way to identify which environment the setup.py shim that pip runs is actually looking at during installation? Also, are there any specific version requirements being violated?
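One diagnostic we have not ruled out (a sketch using standard pip flags, not something from this repository's instructions) is to run the build verbosely and without pip's isolated build environment, so setup.py executes against the environment where setuptools and torch are actually installed:

  # verbose output, and reuse the current environment instead of an isolated build env
  pip install . -v --no-build-isolation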

For (2.2): We tried setting FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE as suggested in this issue, assuming nvcc would then not be required, but the problem persists.
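Worth noting: the log above reports torch.__version__ = 2.3.1+cu121, i.e. a CUDA wheel, which may explain why setup.py takes the build_for_cuda() path and looks for nvcc (assuming setup.py chooses between the CUDA and ROCm paths based on the installed torch build). A quick check of which torch build the environment actually resolves:

  # torch.version.hip is None on CUDA wheels and set on ROCm wheels
  python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.version.hip)"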

TL;DR: these are the major issues we need help with:

  1. Which version of Torch should be used with ROCm 6.1.2 so that the hipify patch applies cleanly and hipification of certain files is avoided?
  2. Is there a specific version of dependencies that need to be used to avoid the setup.py errors?
  3. How do we resolve the nvcc not found issue, especially when intending to use ROCm instead of CUDA?

It would be helpful if anyone could provide guidance or help in resolving these issues. Thank you!

Operating System

RHEL/UBI9

CPU

NA

GPU

AMD Instinct MI300X, AMD Instinct MI300A

ROCm Version

ROCm 6.1.0

sancspro commented 3 months ago

Were you able to find a solution or workaround for this issue? I am facing the same error with torch.__version__ = 2.4.0+rocm6.1.

I used to install and use flash-attn a while back on Navi32.

jiridanek commented 3 months ago

Install ROCm devel packages first (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/install-overview.html, dnf install rocm in our case) and only install flash-attention after that.
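Roughly (a sketch; the exact repository setup depends on the RHEL release, see the install guide linked above):

  # install the ROCm stack, including the development packages, before building flash-attention
  sudo dnf install rocm
  # then build flash-attention against it
  pip install .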

sancspro commented 3 months ago

Thanks for responding. I completely uninstalled the current ROCm package, rebooted, and reinstalled it, going back from 6.2.0 to 6.1.0. Now FA installs and works fine.

ppanchad-amd commented 2 weeks ago

Hi @varshaprasad96. Were you able to resolve your issue? If so, please close the ticket. Thanks!