yoon5862 opened 1 month ago
Also, the CUTLASS examples say the fused multi-head attention example is the same as FlashAttention-2.
I believe those are not the same thing. Where did you see that?
FlashAttention-2 is built using the CUTLASS library, but the implementation we call "cutlass" in xFormers, which is what is in cutlass/examples, is something else.
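For context, both the xFormers "cutlass" kernel and FlashAttention-2 compute exact attention with the same high-level idea: process keys/values in tiles with an online softmax so the full score matrix is never materialized; they differ in CUDA-level implementation and scheduling, not in the math. A minimal NumPy sketch of that shared tiled algorithm (a conceptual illustration, not either library's actual code):

```python
import numpy as np

def naive_attention(q, k, v):
    # Reference: softmax(Q K^T / sqrt(d)) V, materializing the full score matrix.
    d = q.shape[-1]
    s = q @ k.T / np.sqrt(d)
    p = np.exp(s - s.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    return p @ v

def tiled_attention(q, k, v, block=16):
    # Tiled / online-softmax attention: walk over K,V in blocks, keeping a
    # running row-max (m), running softmax denominator (l), and running
    # output accumulator (o). Memory use is O(block), not O(seq_k).
    d = q.shape[-1]
    m = np.full(q.shape[0], -np.inf)
    l = np.zeros(q.shape[0])
    o = np.zeros_like(q, dtype=np.float64)
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T / np.sqrt(d)              # scores for this tile only
        m_new = np.maximum(m, s.max(axis=-1))
        scale = np.exp(m - m_new)              # rescale stats from earlier tiles
        p = np.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        o = o * scale[:, None] + p @ vb
        m = m_new
    return o / l[:, None]

rng = np.random.default_rng(0)
q = rng.standard_normal((32, 8))
k = rng.standard_normal((48, 8))
v = rng.standard_normal((48, 8))
# Both paths give the same exact-attention result.
assert np.allclose(naive_attention(q, k, v), tiled_attention(q, k, v))
```

Since both kernels implement this same exact computation, their outputs match; the "which kernel" question is about performance characteristics, not results.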
Thank you for the reply. The CUTLASS examples say the code was upstreamed from xFormers:
Acknowledgement: Fixed-sequence-length FMHA code was upstreamed by Meta xFormers (https://github.com/facebookresearch/xformers).
Therefore I think xFormers uses a custom CUTLASS kernel and tunes its kernels for the optimal setting.
❓ Questions and Help
Hello, I am looking at fused multi-head attention in 3rdparty/cutlass. In cutlass/examples, fused multi-head attention was upstreamed from xFormers. And the CUTLASS examples say the fused multi-head attention example is the same as FlashAttention-2. Is it true that the CUTLASS fused multi-head attention kernel and the FlashAttention-2 kernel are the same thing? Thank you.