Open jason-huang03 opened 1 week ago

Hi, thanks for your great work. I am the author of SageAttention. I wonder whether it would be suitable to include SageAttention in the repo. SageAttention is a quantized attention kernel currently optimized for the Ada architecture. It supports INT8 QK and FP8 PV, and it would be easy to support FP8 QK as well.

Also, I have a well-optimized W8A8 GEMM kernel that reaches over 500 TFLOPS on the 4090. I wonder whether it would be suitable to add that to the repo too.

@jason-huang03
🎉 Hi jason-huang03~ thank you very much for your attention to CUDA-Learn-Notes. SageAttention is an excellent work and also a great learning resource. It would be fantastic if it could be integrated into CUDA-Learn-Notes. Please feel free to submit a PR. You can add it to the kernels/sage-attention directory, referring to hgemm as an example. If you can integrate sage-attention into a library, like the toy-hgemm library, that would be even better. After the PR is merged, I will pin your work on the README homepage. Thank you very much~

Sounds great! I will try to figure out how to do that.
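For readers unfamiliar with the "INT8 QK" idea discussed in this thread, here is a rough NumPy sketch of how quantized attention scores can be computed: quantize Q and K to INT8 with per-tensor scales, do the matmul in integers, then dequantize by the product of the scales. This is only an illustration of the general technique, not SageAttention's actual kernel (which runs on the GPU and uses more refined quantization); the function names and the simple per-tensor scaling here are my own.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: returns int8 values and a scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
Q = rng.standard_normal((64, 128)).astype(np.float32)  # (seq_len, head_dim)
K = rng.standard_normal((64, 128)).astype(np.float32)

# Full-precision attention scores (pre-softmax) as the reference.
S_ref = Q @ K.T

# INT8 path: integer matmul (accumulate in int32), then dequantize
# with the product of the two scales.
Qq, sq = quantize_int8(Q)
Kq, sk = quantize_int8(K)
S_int8 = (Qq.astype(np.int32) @ Kq.astype(np.int32).T) * (sq * sk)

rel_err = np.abs(S_int8 - S_ref).max() / np.abs(S_ref).max()
print(f"max relative error: {rel_err:.4f}")
```

On a real GPU the int32-accumulated INT8 matmul maps to hardware instructions (e.g. INT8 Tensor Core MMA), which is where the speedup over FP16 attention comes from; the dequantization by `sq * sk` can be folded into the softmax stage.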