miraodasilva opened this issue 3 weeks ago
Thanks for the interest. We have a PR for our large scale work that includes flash attention: https://github.com/espnet/espnet/pull/5537
You can check the changes in espnet/nets/pytorch_backend/transformer/attention.py. If you're lazy, you can probably just copy-paste that entire file on its own and set `use_flash_attention` to default to `True`.
You will need to install flash attention from https://github.com/Dao-AILab/flash-attention to use it, so it carries the same constraints (half precision, specific GPU architectures, etc.).
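For context, flash attention is a faster, memory-efficient kernel that computes exactly the same result as standard scaled dot-product attention, `softmax(QK^T / sqrt(d)) V`. Below is a minimal pure-Python sketch of that reference computation with a `use_flash_attention` flag that falls back when the flash-attn package is unavailable. This is a hypothetical helper for illustration, not ESPnet's actual code; the real PR wires the flag into `MultiHeadedAttention` in attention.py.

```python
import math

def _softmax(row):
    # numerically stable softmax over one row of scores
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v, use_flash_attention=True):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.

    q, k, v: lists of vectors (seq_len x d). The flash path below is only a
    placeholder; flash-attn itself requires half-precision CUDA tensors.
    """
    if use_flash_attention:
        try:
            from flash_attn import flash_attn_func  # noqa: F401 (GPU-only package)
            # Real code would call flash_attn_func on fp16 CUDA tensors here.
        except ImportError:
            pass  # flash-attn not installed: fall back to the reference path
    d = len(q[0])
    out = []
    for qi in q:
        # attention scores of this query against every key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = _softmax(scores)
        # weighted sum of value vectors
        out.append([sum(wi * vj[t] for wi, vj in zip(w, v)) for t in range(len(v[0]))])
    return out
```

Either path returns the same values; what flash attention changes is how the softmax and matrix multiplies are tiled so the full seq_len x seq_len score matrix is never materialized, which is why it is a drop-in toggle rather than a model change.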
Thanks for the request. I have used flash attention when training our open whisper-style speech models (OWSM), whose config file has those parameters. I will add flash attention in the main branch soon.
My branch for OWSM training can be found here: https://github.com/pyf98/espnet/tree/owsm-train
"We do not have that many requests, actually. We have also had some internal discussions, but there are a lot of alternatives for a faster (lightweight) encoder, and Squeezeformer is not a high priority. For example, we'll soon add flash attention. FNet is also ready."
Originally posted by @sw005320 in https://github.com/espnet/espnet/issues/4956#issuecomment-1746790382
Hi, I stumbled across this issue and was wondering: has flash attention been added to ESPnet? I looked around the code and could only find a reference to it in one .yaml config, but not in the code itself. Thanks :)