Closed: creatorrr closed this 1 month ago
Flash Attention 3 is still in pre-release: there is no actual release tagged and no wheels built for it. It currently only supports fp16 for the H100 optimizations, and almost nobody uses fp16. The flash-attention builds take several hours, so it's probably not something we'll be tackling until the wheels are officially supported upstream.
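If we do pick this up later, a minimal sketch of what the guarded import could look like, assuming FA3 keeps its current beta convention of installing a separate `flash_attn_interface` module (built from the repo's `hopper/` directory) rather than updating the `flash_attn` package; the module and function names here are assumptions until an official release is tagged:

```python
# Sketch only: probe for an FA3 beta install and fall back to FA2.
# flash_attn_interface is the module name used by FA3's beta builds
# (an assumption here, not a tagged, stable API).
try:
    from flash_attn_interface import flash_attn_func  # FA3 (Hopper, fp16 only)
    HAS_FLASH_ATTN_3 = True
except ImportError:
    from flash_attn import flash_attn_func  # fall back to FA2
    HAS_FLASH_ATTN_3 = False
```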
Makes sense. Closing this for now so that other people can find it if they're looking for this.
⚠️ Please check that this feature request hasn't been suggested before.
🔖 Feature description
Add support for Flash Attention 3
✔️ Solution
Would it be enough to just bump the dependency, or does any axolotl code depend on lower-level flash-attn APIs? (A hedged sketch of a runtime gate is below.)
https://github.com/Dao-AILab/flash-attention
Cc @winglian
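If a dependency bump alone isn't enough, one option is a runtime gate so FA3 paths only activate where they would actually help: Hopper (SM 9.x) GPUs running fp16, with everything else staying on FA2. `can_use_fa3` is a hypothetical helper for illustration, not existing axolotl code:

```python
import torch

def can_use_fa3(dtype: torch.dtype) -> bool:
    """Hypothetical gate: FA3's pre-release kernels target H100 (SM 9.x)
    and currently only support fp16, per the discussion above."""
    if not torch.cuda.is_available():
        return False
    major, _ = torch.cuda.get_device_capability()
    return major >= 9 and dtype == torch.float16

# Example: only switch attention implementations when the hardware
# and dtype actually qualify.
use_fa3 = can_use_fa3(torch.float16)
```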
❓ Alternatives
No response
📝 Additional Context
Acknowledgements