Boom-Hacker opened 1 year ago
Yes, that branch is very old. I made ad hoc fixes while debugging and only managed to bring it to a point where it reaches about 25 it/s. According to reports, builds using this commit of ROCm LLVM can reach 30 it/s.
The submodule in this branch is linked to the specified branch of Composable Kernel, which has a Fused Attention implementation for Navi 3x.
I spent a lot of time trying to integrate this Fused Attention into PyTorch; you can find my efforts here:
If you're interested, you can check out the repos in the org for further research.
I only ran AITemplate on navi3_rel_ver_1.0; it is very old.