catie-aq / flashT5

A fast implementation of T5/UL2 in PyTorch using Flash Attention
Apache License 2.0

is Flash Attention a requirement? #4

Open SoshyHayami opened 3 months ago

SoshyHayami commented 3 months ago

Hi. First of all, thank you so much for this awesome work. I really want to try it out, but the problem is that I use a few V100 cards and unfortunately they don't support Flash Attention 2, so I was wondering whether I should nonetheless try using this repo?

Tbh I don't really care that much about Flash Attention; I just needed a good starting point to train the large or XL variant of the model in PyTorch from scratch. Most scripts I see use JAX or TPUs, or are very hard-coded and thus difficult to work with.

So, I want to know whether FA2 is a requirement. Thanks

b-albar commented 2 months ago

Hi, no, you can use the reference implementation of attention. This is selected in the config file by setting attention_type: "ref". Possible values are "ref", "triton", and "fa2"; the last two use a Flash Attention implementation written in Triton and a patched version of the original Flash Attention kernel, respectively.
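
For what it's worth, here is a rough sketch of how one might pick the value at runtime. The compute-capability check below is the generic Flash Attention 2 requirement (Ampere or newer, sm80+), which a V100 (sm70) does not meet; it is not something specific to this repo, and only the attention_type key and its allowed values come from the comment above.

```python
# Rough sketch: choose an attention_type value based on the available GPU.
# "ref", "triton" and "fa2" are the values accepted by the config file;
# the capability check is the general Flash Attention 2 requirement
# (Ampere or newer), not logic taken from this repo.
import torch

if torch.cuda.is_available() and torch.cuda.get_device_capability()[0] >= 8:
    attention_type = "fa2"   # Ampere/Hopper-class GPU: Flash Attention 2 kernels are usable
else:
    attention_type = "ref"   # e.g. V100 (sm70): fall back to the reference attention

print(f'attention_type: "{attention_type}"')
```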