bigscience-workshop / Megatron-DeepSpeed

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Add FlashAttention #357

Open NouamaneTazi opened 1 year ago

NouamaneTazi commented 1 year ago

This PR aims to add an option to use FlashAttention. It is inspired by https://github.com/NVIDIA/Megatron-LM/pull/267.

cc @thomasw21
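
For reference, a minimal sketch of how FlashAttention can be wrapped as a drop-in attention core. This assumes the `flash_attn_func` API from flash-attn 2.x; the actual PR may target the older unpadded interface, and the module/flag names here are illustrative only:

```python
# Hedged sketch: a self-attention core backed by the fused FlashAttention kernel.
# Assumes flash-attn 2.x (flash_attn_func); the PR may use flash_attn_unpadded_func
# and different module names. Not the PR's actual implementation.
import torch
from flash_attn import flash_attn_func


class FlashSelfAttention(torch.nn.Module):
    """Computes softmax(Q K^T / sqrt(d)) V with the fused FlashAttention kernel."""

    def __init__(self, causal: bool = True, attention_dropout: float = 0.0):
        super().__init__()
        self.causal = causal
        self.dropout_p = attention_dropout

    def forward(self, q, k, v):
        # q, k, v: (batch, seqlen, num_heads, head_dim), fp16/bf16 tensors on GPU.
        return flash_attn_func(
            q, k, v,
            dropout_p=self.dropout_p if self.training else 0.0,
            causal=self.causal,
        )
```

In practice this would sit behind a command-line option (e.g. a hypothetical `--use-flash-attn` flag) so the default attention path is left unchanged.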

mayank31398 commented 1 year ago

Aah, here we go. Is FlashAttention merged into the original repo yet? I saw Tri Dao had opened a PR.

NouamaneTazi commented 1 year ago

It's not merged into Megatron-LM yet, but I hope it will be soon :)

wywy136 commented 10 months ago

@NouamaneTazi Hi. Have you tested the correctness of this implementation of Flash Attention?
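
One quick way to sanity-check correctness is to compare the fused kernel against a plain PyTorch attention reference on random inputs. A minimal sketch, assuming flash-attn 2.x and a CUDA device; shapes and the comparison are illustrative, not the PR's test:

```python
# Hedged sketch of a parity check: fused FlashAttention vs. a fp32 PyTorch reference.
import torch
from flash_attn import flash_attn_func

batch, seqlen, heads, dim = 2, 128, 8, 64
q, k, v = (torch.randn(batch, seqlen, heads, dim, device="cuda", dtype=torch.float16)
           for _ in range(3))

# Reference: standard causal attention computed in fp32.
qf, kf, vf = (t.transpose(1, 2).float() for t in (q, k, v))  # (batch, heads, seqlen, dim)
scores = qf @ kf.transpose(-2, -1) / dim ** 0.5
mask = torch.ones(seqlen, seqlen, device="cuda", dtype=torch.bool).triu(1)
scores = scores.masked_fill(mask, float("-inf"))
ref = (scores.softmax(dim=-1) @ vf).transpose(1, 2)  # back to (batch, seqlen, heads, dim)

out = flash_attn_func(q, k, v, causal=True)
print("max abs diff:", (out.float() - ref).abs().max().item())  # expect ~1e-3 in fp16
```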