ROCm / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Remove offload-arch=native in the build #18

Closed fxmarty closed 10 months ago

fxmarty commented 11 months ago

Hi,

Hard-coding --offload-arch=native makes the build of ROCm flash-attention fail in a docker build (I guess because GPUs are not accessible during the build).

Moreover, this prevents setup.py from obeying the PYTORCH_ROCM_ARCH environment variable, which is quite a useful feature.
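A minimal sketch of the kind of change being requested (the function and variable names here are illustrative; the repo's actual setup.py may be structured differently):

```python
import os

def offload_arch_flags():
    """Build the hipcc offload-arch flags, preferring an explicit arch list.

    PYTORCH_ROCM_ARCH is the variable PyTorch's own build honors, e.g.
    "gfx90a;gfx942". Falling back to --offload-arch=native only when no
    list is given keeps local builds convenient while letting docker
    builds (where no GPU is visible) pin the target architectures.
    """
    archs = os.environ.get("PYTORCH_ROCM_ARCH", "").strip()
    if archs:
        return [f"--offload-arch={a}" for a in archs.split(";") if a]
    # Hypothetical fallback: auto-detect only when nothing was requested.
    return ["--offload-arch=native"]
```

The flags returned here would then be appended to the extension's compile arguments instead of the hard-coded value.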

fxmarty commented 11 months ago

cc @sabreshao @howiejayz @fsx950223 what do you think?

Unrelated: I was wondering if you would be open to enabling issues in this repo? I encountered a few that I think would be nice to report (at least for other users of this repo).

sabreshao commented 10 months ago

@fxmarty we plan to add an option to resolve docker build. @howiejayz will do that.

jayz0123 commented 10 months ago

Hi @fxmarty, can you close this PR and move any of your requests to issues? I will go through them, including this one.

fxmarty commented 10 months ago

Hi @howiejayz, happy to do so; however, I can't see an Issues tab in the repo:

[screenshot: repository header with no Issues tab]

jayz0123 commented 10 months ago

Hi @fxmarty, could you try the latest build_and_run.sh script for building flash-attention in the Dockerfile? Also, the Issues section is finally open.

fxmarty commented 10 months ago

Hi @howiejayz, thank you! I indeed noticed the available GPU_ARCHS variable; it is working fine now.
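For anyone hitting the same docker-build failure: the fix is to pin the target architectures explicitly via GPU_ARCHS instead of relying on auto-detection. A hedged sketch of how a build script might consume such a variable (names are illustrative; check the actual build_and_run.sh for the real interface):

```python
import os

def hipcc_command(source="kernel.hip", out="kernel.o"):
    """Assemble a hipcc compile command with explicit offload targets.

    GPU_ARCHS is assumed to be a semicolon-separated list such as
    "gfx90a;gfx942". When unset we fall back to "native", which only
    works on a machine with a visible AMD GPU -- hence the failure
    inside `docker build`, where no GPU is attached.
    """
    archs = os.environ.get("GPU_ARCHS", "native").split(";")
    flags = [f"--offload-arch={a}" for a in archs if a]
    return ["hipcc", "-c", source, "-o", out, *flags]
```

Setting GPU_ARCHS in the Dockerfile (e.g. via ENV or a build ARG) then makes the compile step deterministic regardless of the build host.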

Thank you for opening the issue section!