Open jason-huang03 opened 1 week ago

Hi, thanks for your great work. I am the author of SageAttention. I wonder whether it would be suitable to include SageAttention in the repo. SageAttention is a quantized attention kernel currently optimized for the Ada architecture. It supports INT8 QK and FP8 PV, and it would be easy to support FP8 QK as well.

Also, I have a well-optimized W8A8 GEMM kernel that reaches over 500 TFLOPS on the 4090. I wonder whether it would be suitable to add that to the repo too.

@jason-huang03
🎉 Hi jason-huang03~ thank you very much for your attention to CUDA-Learn-Notes. SageAttention is an excellent work and also a great learning resource. It would be fantastic if it could be integrated into CUDA-Learn-Notes. Please feel free to submit a PR. You can add it to the kernels/sage-attention directory, referring to hgemm as an example. If you can integrate sage-attention into a library, like the toy-hgemm library, that would be even better. After the PR is merged, I will pin your work on the README homepage. Thank you very much~

Sounds great! I will try to figure out how to do that.
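For readers unfamiliar with the "INT8 QK" idea discussed in this thread, here is a rough NumPy sketch of how quantized attention scores can be computed: quantize Q and K to INT8 with per-tensor scales, do the matmul in integers, then dequantize by the product of the scales. This is only an illustration of the general technique, not SageAttention's actual kernel (which runs on the GPU and uses more refined quantization); the function names and the simple per-tensor scaling here are my own.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: returns int8 values and a scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
Q = rng.standard_normal((64, 128)).astype(np.float32)  # (seq_len, head_dim)
K = rng.standard_normal((64, 128)).astype(np.float32)

# Full-precision attention scores (pre-softmax) as the reference.
S_ref = Q @ K.T

# INT8 path: integer matmul (accumulate in int32), then dequantize
# with the product of the two scales.
Qq, sq = quantize_int8(Q)
Kq, sk = quantize_int8(K)
S_int8 = (Qq.astype(np.int32) @ Kq.astype(np.int32).T) * (sq * sk)

rel_err = np.abs(S_int8 - S_ref).max() / np.abs(S_ref).max()
print(f"max relative error: {rel_err:.4f}")
```

On a real GPU the int32-accumulated INT8 matmul maps to hardware instructions (e.g. INT8 Tensor Core MMA), which is where the speedup over FP16 attention comes from; the dequantization by `sq * sk` can be folded into the softmax stage.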