miraodasilva opened this issue 3 weeks ago
Thanks for the interest. We have a PR for our large scale work that includes flash attention: https://github.com/espnet/espnet/pull/5537
You can check the changes in espnet/nets/pytorch_backend/transformer/attention.py. If you're lazy, you can probably just copy-paste that entire file on its own and set `use_flash_attention` to default to `True`.
You will need to install flash attention from https://github.com/Dao-AILab/flash-attention to use it, so it carries the same constraints (half precision, specific GPU architectures, etc.).
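For context, flash attention is a faster, memory-efficient kernel that computes exactly the same result as standard scaled dot-product attention, `softmax(QK^T / sqrt(d)) V`. Below is a minimal pure-Python sketch of that reference computation with a `use_flash_attention` flag that falls back when the flash-attn package is unavailable. This is a hypothetical helper for illustration, not ESPnet's actual code; the real PR wires the flag into `MultiHeadedAttention` in attention.py.

```python
import math

def _softmax(row):
    # numerically stable softmax over one row of scores
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(q, k, v, use_flash_attention=True):
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.

    q, k, v: lists of vectors (seq_len x d). The flash path below is only a
    placeholder; flash-attn itself requires half-precision CUDA tensors.
    """
    if use_flash_attention:
        try:
            from flash_attn import flash_attn_func  # noqa: F401 (GPU-only package)
            # Real code would call flash_attn_func on fp16 CUDA tensors here.
        except ImportError:
            pass  # flash-attn not installed: fall back to the reference path
    d = len(q[0])
    out = []
    for qi in q:
        # attention scores of this query against every key, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = _softmax(scores)
        # weighted sum of value vectors
        out.append([sum(wi * vj[t] for wi, vj in zip(w, v)) for t in range(len(v[0]))])
    return out
```

Either path returns the same values; what flash attention changes is how the softmax and matrix multiplies are tiled so the full seq_len x seq_len score matrix is never materialized, which is why it is a drop-in toggle rather than a model change.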
Thanks for the request. I have used flash attention when training our open whisper-style speech models (OWSM), whose config file has those parameters. I will add flash attention in the main branch soon.
My branch for OWSM training can be found here: https://github.com/pyf98/espnet/tree/owsm-train
"We do not have that many requests, actually. We have also had some internal discussions, but there are a lot of alternatives for a faster (lightweight) encoder, and Squeezeformer is not a high priority. For example, we'll soon add flash attention. FNet is also ready."
Originally posted by @sw005320 in https://github.com/espnet/espnet/issues/4956#issuecomment-1746790382
Hi, I stumbled across this issue and was wondering: has flash attention been added to ESPnet? I looked around the code and could only find a reference to it in one .yaml config, but not in the code itself. Thanks :)