axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0

Support for Flash Attention 3 #1751

Closed creatorrr closed 1 month ago

creatorrr commented 1 month ago

⚠️ Please check that this feature request hasn't been suggested before.

🔖 Feature description

Add support for Flash Attention 3

✔️ Solution

Will bumping the dependency be enough, or does any axolotl code depend on lower-level flash-attn APIs?

https://github.com/Dao-AILab/flash-attention

Cc @winglian
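One way to answer "is bumping the dependency enough" empirically is runtime feature detection. A minimal sketch, assuming the FA3 pre-release installs under the `flash_attn_interface` module name (as the Hopper build in the flash-attention repo does) while FA2 keeps its usual `flash_attn` package; the `FA_VERSION` name is purely illustrative, not an axolotl API:

```python
# Hedged sketch: detect which FlashAttention generation is importable and
# fall back gracefully. Module names are assumptions based on the upstream
# repo layout; FA_VERSION is a hypothetical helper, not part of axolotl.
try:
    # FlashAttention-3 pre-release (Hopper) is assumed to ship this module.
    from flash_attn_interface import flash_attn_func
    FA_VERSION = 3
except ImportError:
    try:
        # FlashAttention-2 stable package.
        from flash_attn import flash_attn_func
        FA_VERSION = 2
    except ImportError:
        # Neither installed: caller should fall back to eager/SDPA attention.
        flash_attn_func = None
        FA_VERSION = 0
```

Note that detection alone may not suffice: the two generations are not guaranteed to have identical call signatures or return values, so any axolotl code that calls `flash_attn_func` (or lower-level kernels) directly would still need to be audited per version.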

❓ Alternatives

No response

📝 Additional Context

The improvements from FlashAttention-3 will result in:

More efficient GPU utilization: the new technique uses up to 75% of an H100 GPU's maximum capability, up from roughly 35% before. This makes training and inference of large language models (LLMs) significantly (1.5-2x) faster than previous versions.

Better performance with lower precision: FlashAttention-3 can work with lower precision numbers (FP8) while maintaining accuracy. This allows for even faster processing and potentially lower memory usage, which could lead to cost savings and improved efficiency for customers running large-scale AI operations.

Ability to use longer context in LLMs: By speeding up the attention mechanism, FlashAttention-3 enables AI models to work with much longer pieces of text more efficiently. This could allow for applications that can understand and generate longer, more complex content without slowing down.

Acknowledgements

winglian commented 1 month ago

Flash Attention 3 is still in pre-release. There is no actual release tagged, and there are no wheels built for it. It currently only supports fp16 for the H100 optimizations, and almost nobody uses fp16. The flash-attention builds take several hours, so it's probably not something we'll be tackling until the wheels are officially supported upstream.

creatorrr commented 1 month ago

Makes sense. Closing this for now so that other people can find it if they're looking for this.