evshiron / rocm_lab

DEPRECATED!
https://are-we-gfx1100-yet.github.io

what about ait #15

Open Boom-Hacker opened 1 year ago

Boom-Hacker commented 1 year ago

I have only run AITemplate on the navi3_rel_ver_1.0 branch, and it is very old.

evshiron commented 1 year ago

Yes, that branch is very old. I made ad hoc fixes while debugging and only managed to get it to 25 it/s. According to reports, using this commit of ROCm LLVM can reach 30 it/s.

The submodule in this branch is linked to the specified branch of Composable Kernel, which has a Fused Attention implementation for Navi 3x.

I previously spent a lot of time trying to integrate this Fused Attention into PyTorch. You can find my efforts here:

If you're interested, you can check out the repos in the org to conduct further research.