ROCm / composable_kernel

Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
https://rocm.docs.amd.com/projects/composable_kernel/en/latest/

Fused Attention Kernel with gfx1030? #886

Open onesnep opened 11 months ago

onesnep commented 11 months ago

I was glad to see Flash Attention ported to ROCm; however, compatibility is currently limited to gfx90a. Many others and I would love to see this on other architectures.

When building Composable Kernel against a gfx1030 target, I noticed that the fused attention examples were removed from the test cases. The docs briefly mention partial compatibility for gfx1030, but I couldn't find concrete details about the differences in operator support between architectures.
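For reference, this is roughly the configure step I mean by "building against a gfx1030 target" (a sketch only: the GPU_TARGETS option and paths follow the CK README, and the exact values may differ on your setup):

```bash
# Sketch: configure Composable Kernel for a gfx1030 (RDNA2) target only.
# Paths assume a default /opt/rocm installation; adjust for your system.
mkdir -p build && cd build
cmake \
  -D CMAKE_PREFIX_PATH=/opt/rocm \
  -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  -D CMAKE_BUILD_TYPE=Release \
  -D GPU_TARGETS="gfx1030" \
  ..
# With this target, the fused-attention examples/tests do not show up among
# the generated build targets, unlike a build with GPU_TARGETS="gfx90a".
make -j"$(nproc)"
```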

I would appreciate clarification on whether a fused kernel suitable for Flash Attention would be possible on other architectures such as gfx1030 or even gfx1100, and if so, whether this is in the pipeline or left to the community to implement.

Many thanks

ThePerfectComputer commented 6 months ago

I'm also curious about this - specifically about support for the gfx906 architecture.

linchen111 commented 5 days ago

I'm also curious about this - specifically about support for the gfx906 architecture.

Hello, I'm curious about gfx906 too. Did you ever get it working?