Closed: Kingmeng-Stack closed this issue 2 weeks ago.
The main branch of this repo only works on CDNA 2/3 GPUs.
If you are looking for a memory-efficient attention implementation, you can already use SDPA in the latest PyTorch (experimental), which is powered by https://github.com/ROCm/AOTriton.
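For example, here is a minimal sketch of exercising that SDPA path (the `sdpa_kernel` context manager is from recent PyTorch, 2.3+; backend availability depends on your GPU and ROCm build, so treat this as illustrative rather than guaranteed):

```python
# Minimal sketch: memory-efficient attention via PyTorch SDPA on ROCm,
# where the backend is provided by AOTriton. Shapes/dtypes are illustrative.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

q, k, v = (torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Restrict dispatch to the memory-efficient backend to confirm it is usable
# on this device; an error here means the backend is not available.
with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 128, 64])
```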
If you want an older Flash Attention implementation that works on RDNA 3 GPUs, try the howiejay/navi_support branch; you can find discussions in some issues:
Hi @Kingmeng-Stack, support for FlashAttention was initially added for MI200 and MI300 (i.e. CDNA 2/3 accelerators), but some of the underlying composable kernel backend includes CDNA-specific assembly, resulting in those `error: invalid operand for instruction [...]` errors.
There's an effort to ship an RDNA-compatible Triton kernel backend as an alternative to CK so that we can support FA on your device, but it's a work in progress and is currently being upstreamed to FlashAttention.
Thanks to @evshiron for chiming in. I did not have any luck using that navi_support branch, but I believe the AOTriton implementation for FA does work.
Hi @jamesxu2, I am trying to build FA2 for the RDNA 3 GPU architecture. Can you point out which files in composable kernel have CDNA-specific instructions, so that I can try to build FA2?
@gowthamtupili, you can see from the error in the original issue that the file /[...]/composable_kernel/include/ck_tile/core/numeric/bfloat16.hpp is named as having `error: invalid operand for instruction`. I am not sure how you might find all files that include inline assembly containing code that doesn't comply with the RDNA ISA. Our current implementation of FA relies on ck_tile, a subcomponent of composable kernel, which is simply neither designed for nor tested with RDNA; making it work for RDNA would be a significant undertaking, if not a rewrite.
Further to that, I'm not sure how you plan to build FA without that inline assembly, unless you're able to translate it yourself into some RDNA-supporting equivalent.
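If it helps, here is a rough way to locate candidates yourself. This is purely a sketch, and the search patterns are assumptions about what CDNA-specific inline assembly might look like, not an authoritative list:

```python
# Rough, illustrative scan of a composable_kernel checkout for inline
# assembly. The patterns below are assumptions about what to look for,
# not an exhaustive list of CDNA-specific constructs.
import pathlib
import re

CK_ROOT = pathlib.Path("composable_kernel/include")  # point at your checkout
PATTERN = re.compile(r"asm\s+volatile|__builtin_amdgcn")

for path in sorted(CK_ROOT.rglob("*.hpp")):
    lines = path.read_text(errors="ignore").splitlines()
    hits = [i + 1 for i, line in enumerate(lines) if PATTERN.search(line)]
    if hits:
        print(f"{path}: lines {hits}")
```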
@gowthamtupili
You can find relevant fused kernels here: the examples with `wmma` are existing implementations supporting Navi 3x GPUs in ROCm/flash-attention@howiejay/navi_support, while the `xdl` variants are written for CDNA and are far more complete. Please note that CK is a template library and it can be a pain for unseasoned developers.
There is another Flash Attention implementation, written in rocWMMA, which works on Navi 3x GPUs too:
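For reference, here is a quick way to check which ISA family your device reports, to tell which of the above applies. This is a hedged sketch: `gcnArchName` is exposed by ROCm builds of PyTorch, and the `gfx` prefixes below are assumptions about common RDNA 3 / CDNA arch strings.

```python
# Check which ISA family the GPU reports; on non-ROCm builds of PyTorch
# the gcnArchName attribute may be absent.
import torch

arch = torch.cuda.get_device_properties(0).gcnArchName
if arch.startswith("gfx11"):   # e.g. gfx1100 on Navi 3x (RDNA 3)
    print(f"{arch}: wmma-based kernels (navi_support / rocWMMA) apply")
else:                          # e.g. gfx90a / gfx942 on CDNA
    print(f"{arch}: the xdl variants on main apply")
```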
Hi @jamesxu2, @evshiron, thank you for the inputs. I now have a much clearer path forward for building FA2 on RDNA 3. Thanks again for the support.
@evshiron Hi, can I install both of these (the navi_support branch and the rocWMMA implementation) together?
Problem Description
When trying to build Flash Attention from source on the ROCm platform, the compilation fails with an invalid assembly instruction error. The specific error occurs in the bfloat16 implementation.
Environment
Error Details
The compilation fails with the following error in composable_kernel/include/ck_tile/core/numeric/bfloat16.hpp: `error: invalid operand for instruction`.
I first tried to use the precompiled wheel, but it wasn't available:
The build from source then failed with the assembly instruction error mentioned above.
Download file error: the file does not exist. This is the URL of the file.