ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

[Issue]: Installation failed through Dockerfile #42

Open amdrenwuli opened 7 months ago

amdrenwuli commented 7 months ago

Problem Description

  1. The build fails when flash-attention is built from a Dockerfile. The Dockerfile is as follows:

```dockerfile
ARG BASE_DOCKER=rocm/pytorch-nightly
FROM $BASE_DOCKER

WORKDIR /workspace
USER root

RUN pip install ninja
RUN git clone -b flash_attention_for_rocm --recurse-submodules https://github.com/ROCmSoftwarePlatform/flash-attention.git
RUN cd /workspace/flash-attention \
    && python setup.py install
RUN pip3 list
```

The logs are as below:
[docker_log.txt](https://github.com/ROCm/flash-attention/files/14162404/docker_log.txt)

2. The build succeeds when flash-attention is built inside a running Docker container:
```bash
$ docker pull rocm/pytorch-nightly
$ docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 8G rocm/pytorch-nightly:latest
$ apt show rocm-libs -a
Package: rocm-libs
Version: 5.7.0.50700-63~20.04
Priority: optional
Section: devel
Maintainer: ROCm Libs Support <rocm-libs.support@amd.com>
Installed-Size: 13.3 kB
Depends: hipblas (= 1.1.0.50700-63~20.04), hipblaslt (= 0.3.0.50700-63~20.04), hipfft (= 1.0.12.50700-63~20.04), hipsolver (= 1.8.1.50700-63~20.04), hipsparse (= 2.3.8.50700-63~20.04), miopen-hip (= 2.20.0.50700-63~20.04), rccl (= 2.17.1.50700-63~20.04), rocalution (= 2.1.11.50700-63~20.04), rocblas (= 3.1.0.50700-63~20.04), rocfft (= 1.0.23.50700-63~20.04), rocrand (= 2.10.17.50700-63~20.04), rocsolver (= 3.23.0.50700-63~20.04), rocsparse (= 2.5.4.50700-63~20.04), rocm-core (= 5.7.0.50700-63~20.04), hipblas-dev (= 1.1.0.50700-63~20.04), hipblaslt-dev (= 0.3.0.50700-63~20.04), hipcub-dev (= 2.13.1.50700-63~20.04), hipfft-dev (= 1.0.12.50700-63~20.04), hipsolver-dev (= 1.8.1.50700-63~20.04), hipsparse-dev (= 2.3.8.50700-63~20.04), miopen-hip-dev (= 2.20.0.50700-63~20.04), rccl-dev (= 2.17.1.50700-63~20.04), rocalution-dev (= 2.1.11.50700-63~20.04), rocblas-dev (= 3.1.0.50700-63~20.04), rocfft-dev (= 1.0.23.50700-63~20.04), rocprim-dev (= 2.13.1.50700-63~20.04), rocrand-dev (= 2.10.17.50700-63~20.04), rocsolver-dev (= 3.23.0.50700-63~20.04), rocsparse-dev (= 2.5.4.50700-63~20.04), rocthrust-dev (= 2.18.0.50700-63~20.04), rocwmma-dev (= 1.2.0.50700-63~20.04)
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 1014 B
APT-Manual-Installed: yes
APT-Sources: http://repo.radeon.com/rocm/apt/5.7 focal/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
$ which python
/opt/conda/envs/py_3.8/bin/python
$ mkdir -p /workspace && cd /workspace
$ git clone -b flash_attention_for_rocm --recurse-submodules https://github.com/ROCmSoftwarePlatform/flash-attention.git
$ cd /workspace/flash-attention && python setup.py install

......

Searching for packaging==23.2
Best match: packaging 23.2
Adding packaging 23.2 to easy-install.pth file
detected new path './einops-0.7.0-py3.8.egg'

Using /opt/conda/envs/py_3.8/lib/python3.8/site-packages
Searching for torch==2.3.0a0+gitac0bed0
Best match: torch 2.3.0a0+gitac0bed0
Adding torch 2.3.0a0+gitac0bed0 to easy-install.pth file
Installing convert-caffe2-to-onnx script to /opt/conda/envs/py_3.8/bin
Installing convert-onnx-to-caffe2 script to /opt/conda/envs/py_3.8/bin
Installing torchrun script to /opt/conda/envs/py_3.8/bin

Using /opt/conda/envs/py_3.8/lib/python3.8/site-packages
Searching for fsspec==2023.4.0
Best match: fsspec 2023.4.0
Adding fsspec 2023.4.0 to easy-install.pth file

Using /opt/conda/envs/py_3.8/lib/python3.8/site-packages
Searching for Jinja2==3.1.2
Best match: Jinja2 3.1.2
Adding Jinja2 3.1.2 to easy-install.pth file

Using /opt/conda/envs/py_3.8/lib/python3.8/site-packages
Searching for networkx==2.8.8
Best match: networkx 2.8.8
Adding networkx 2.8.8 to easy-install.pth file

Using /opt/conda/envs/py_3.8/lib/python3.8/site-packages
Searching for sympy==1.12
Best match: sympy 1.12
Adding sympy 1.12 to easy-install.pth file
Installing isympy script to /opt/conda/envs/py_3.8/bin

Using /opt/conda/envs/py_3.8/lib/python3.8/site-packages
Searching for typing-extensions==4.9.0
Best match: typing-extensions 4.9.0
Adding typing-extensions 4.9.0 to easy-install.pth file

Using /opt/conda/envs/py_3.8/lib/python3.8/site-packages
Searching for filelock==3.9.0
Best match: filelock 3.9.0
Adding filelock 3.9.0 to easy-install.pth file

Using /opt/conda/envs/py_3.8/lib/python3.8/site-packages
Searching for MarkupSafe==2.1.3
Best match: MarkupSafe 2.1.3
Adding MarkupSafe 2.1.3 to easy-install.pth file

Using /opt/conda/envs/py_3.8/lib/python3.8/site-packages
Searching for mpmath==1.3.0
Best match: mpmath 1.3.0
Adding mpmath 1.3.0 to easy-install.pth file

Using /opt/conda/envs/py_3.8/lib/python3.8/site-packages
Finished processing dependencies for flash-attn==2.0.4
```

Operating System

20.04.5 LTS (Focal Fossa)

CPU

AMD EPYC 73F3 16-Core Processor

GPU

AMD Instinct MI250X

ROCm Version

ROCm 5.7.0

ROCm Component

No response

Steps to Reproduce

Shown in Problem Description

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

rocminfo.txt

Additional Information

No response

jayz0123 commented 7 months ago

@amdrenwuli Please see this pr. To build FA in a Dockerfile, you need to set --offload-arch manually to match the GPUs you have, because the environment during `docker build` has no access to your GPUs.
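A sketch of what that could look like in the Dockerfile from the issue. This is an assumption about the mechanism, not the verbatim fix from the PR: the variable name (`GPU_ARCHS`) and how setup.py consumes it should be checked against the PR; `gfx90a` is the architecture string for MI250X.

```dockerfile
ARG BASE_DOCKER=rocm/pytorch-nightly
FROM $BASE_DOCKER

WORKDIR /workspace
USER root

RUN pip install ninja
RUN git clone -b flash_attention_for_rocm --recurse-submodules \
        https://github.com/ROCmSoftwarePlatform/flash-attention.git
# `docker build` has no /dev/kfd or /dev/dri, so the build cannot query
# the GPU; pin the offload target explicitly (gfx90a = MI250X).
# NOTE: GPU_ARCHS is an assumed variable name; confirm against the PR.
RUN cd /workspace/flash-attention \
    && GPU_ARCHS=gfx90a python setup.py install
RUN pip3 list
```

On the host, the architecture string can be confirmed with `rocminfo | grep gfx`.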

amdrenwuli commented 7 months ago

> @amdrenwuli Please see this pr. In order to build FA in the dockerfile, you need to manually set --offload-arch to the type of GPUs you have. This is because the environment in the dockerfile has no access to your GPUs.

Thanks for your reply. One more question: is installing csrc/rotary and csrc/layer_norm currently unsupported?

```
root@xcdlossgpu08:/workspace/flash-attention# pip install csrc/layer_norm/
Processing ./csrc/layer_norm
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [13 lines of output]
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/workspace/flash-attention/csrc/layer_norm/setup.py", line 99, in <module>
          raise_if_cuda_home_none("--fast_layer_norm")
        File "/workspace/flash-attention/csrc/layer_norm/setup.py", line 46, in raise_if_cuda_home_none
          raise RuntimeError(
      RuntimeError: --fast_layer_norm was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.

      torch.__version__  = 2.3.0a0+gitac0bed0

      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
```
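The failure above is a hard dependency check: the extension's setup.py looks for an nvcc-based CUDA toolchain before it will even generate metadata, so it aborts on a ROCm-only system regardless of how healthy the GPU stack is. A minimal sketch of that kind of guard (the helper names here are illustrative; only `raise_if_cuda_home_none` appears in the real csrc/layer_norm/setup.py, and its exact implementation may differ):

```python
import os
import shutil


def find_cuda_home():
    """Return CUDA_HOME if set, else derive it from nvcc on PATH, else None."""
    home = os.environ.get("CUDA_HOME")
    if home:
        return home
    nvcc = shutil.which("nvcc")
    # nvcc lives in <CUDA_HOME>/bin/nvcc, so strip two path components.
    return os.path.dirname(os.path.dirname(nvcc)) if nvcc else None


def raise_if_cuda_home_none(flag, cuda_home):
    """Abort with the same style of message seen in the log above."""
    if cuda_home is None:
        raise RuntimeError(
            f"{flag} was requested, but nvcc was not found. "
            "Are you sure your environment has nvcc available?"
        )
```

On a ROCm-only machine `find_cuda_home()` returns None, so the guard raises and pip reports `metadata-generation-failed` before any compilation starts.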
jayz0123 commented 7 months ago

No. I think the CK team might have some plans. But for now I can confirm we do not have those features.

sabreshao commented 7 months ago

@amdrenwuli Support for the other modules (rotary, xentropy, layer_norm) will be tracked in that issue.