bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Add FlashAttention #357

Open NouamaneTazi opened 1 year ago

NouamaneTazi commented 1 year ago

This PR aims to add an option to use FlashAttention. It is inspired by https://github.com/NVIDIA/Megatron-LM/pull/267.

cc @thomasw21
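
For reference, a minimal sketch of how FlashAttention can be wrapped as a drop-in attention core. This assumes the `flash_attn_func` API from flash-attn 2.x; the actual PR may target the older unpadded interface, and the module/flag names here are illustrative only:

```python
# Hedged sketch: a self-attention core backed by the fused FlashAttention kernel.
# Assumes flash-attn 2.x (flash_attn_func); the PR may use flash_attn_unpadded_func
# and different module names. Not the PR's actual implementation.
import torch
from flash_attn import flash_attn_func


class FlashSelfAttention(torch.nn.Module):
    """Computes softmax(Q K^T / sqrt(d)) V with the fused FlashAttention kernel."""

    def __init__(self, causal: bool = True, attention_dropout: float = 0.0):
        super().__init__()
        self.causal = causal
        self.dropout_p = attention_dropout

    def forward(self, q, k, v):
        # q, k, v: (batch, seqlen, num_heads, head_dim), fp16/bf16 tensors on GPU.
        return flash_attn_func(
            q, k, v,
            dropout_p=self.dropout_p if self.training else 0.0,
            causal=self.causal,
        )
```

In practice this would sit behind a command-line option (e.g. a hypothetical `--use-flash-attn` flag) so the default attention path is left unchanged.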

mayank31398 commented 1 year ago

Aah, here we go. Is FlashAttention merged into the original repo yet? I saw Tri Dao had opened a PR.

NouamaneTazi commented 1 year ago

It's not merged into Megatron-LM yet, but I hope it will be soon :)

wywy136 commented 10 months ago

@NouamaneTazi Hi. Have you tested the correctness of this implementation of Flash Attention?
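
One quick way to sanity-check correctness is to compare the fused kernel against a plain PyTorch attention reference on random inputs. A minimal sketch, assuming flash-attn 2.x and a CUDA device; shapes and the comparison are illustrative, not the PR's test:

```python
# Hedged sketch of a parity check: fused FlashAttention vs. a fp32 PyTorch reference.
import torch
from flash_attn import flash_attn_func

batch, seqlen, heads, dim = 2, 128, 8, 64
q, k, v = (torch.randn(batch, seqlen, heads, dim, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Reference: standard causal attention computed in fp32.
qf, kf, vf = (t.transpose(1, 2).float() for t in (q, k, v))  # (batch, heads, seqlen, dim)
scores = qf @ kf.transpose(-2, -1) / dim ** 0.5
mask = torch.ones(seqlen, seqlen, device="cuda", dtype=torch.bool).triu(1)
scores = scores.masked_fill(mask, float("-inf"))
ref = (scores.softmax(dim=-1) @ vf).transpose(1, 2)  # back to (batch, seqlen, heads, dim)

out = flash_attn_func(q, k, v, causal=True)
print("max abs diff:", (out.float() - ref).abs().max().item())  # expect ~1e-3 in fp16
```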