mistycube · closed 1 year ago
Is it possible for the FlashAttention interface to handle gated single-head attention? Maybe the speedup could be even higher.
Paper: https://arxiv.org/pdf/2202.10447.pdf
We don't have that out of the box. Feel free to play with the Triton implementation (it's a self-contained Python file).
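For anyone experimenting along these lines, here is a minimal NumPy sketch of one way to gate single-head attention, in the spirit of the gating in the linked paper: a learned sigmoid gate modulating the attention output elementwise. All names (`wq`, `wk`, `wv`, `wg`) are illustrative assumptions, not part of the flash-attn API.

```python
import numpy as np

def gated_single_head_attention(x, wq, wk, wv, wg):
    """Sketch of gated single-head attention (illustrative, not flash-attn's API).

    x:  (seq_len, d_model) input
    wq, wk, wv: (d_model, d_head) projection weights
    wg: (d_model, d_head) gate projection weights
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scale = 1.0 / np.sqrt(q.shape[-1])
    scores = (q @ k.T) * scale
    # Numerically stable softmax over keys
    scores -= scores.max(axis=-1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=-1, keepdims=True)
    out = probs @ v                       # (seq_len, d_head)
    # Sigmoid gate computed from the input, applied elementwise
    gate = 1.0 / (1.0 + np.exp(-(x @ wg)))
    return gate * out

# Usage sketch with random weights
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
wq, wk, wv, wg = (rng.standard_normal((16, 4)) for _ in range(4))
y = gated_single_head_attention(x, wq, wk, wv, wg)
print(y.shape)  # (8, 4)
```

Folding the gate into the fused Triton kernel (rather than applying it afterward, as above) is where any extra speedup would come from.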
Sure. Thanks for your prompt reply.