Closed Taytay closed 9 months ago
@Taytay Thanks for the nice comments, I'm glad you like the repo! Please accept my apologies for the late reply. I've been very busy lately with the ICML submission.
Yes, exactly. FA didn't support backpropagation through the extra additive bias (applied after the dot products, before the softmax). I've just noticed this PR, and it looks great. I'm sure that backprop through this bias would help beyond the T5 case! Can't wait to have it merged into FA. I'll definitely test it soon after it lands : ).
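For context, here is a minimal NumPy sketch (not the repo's code, names and shapes are illustrative) of where that additive bias sits in T5-style attention. The bias is added to the raw scores before the softmax, which is exactly the term the stock FlashAttention kernel could not backprop through:

```python
import numpy as np

def attention_with_bias(q, k, v, bias):
    """T5-style single-head attention with an additive position bias.

    q, k, v: arrays of shape (seq_len, head_dim)
    bias:    array of shape (seq_len, seq_len), e.g. T5's relative-position bias
    """
    # Raw dot-product scores, scaled as usual.
    scores = (q @ k.T) / np.sqrt(q.shape[-1])
    # The extra additive bias: after the dot products, BEFORE the softmax.
    scores = scores + bias
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

Because the bias enters inside the softmax, a fused attention kernel must compute gradients with respect to it; a kernel that only differentiates through q, k, and v cannot train T5's learned relative-position bias.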
Closing for now
Someone has started a repo based on this one with FA2 support: @catie-aq
Firstly, thank you so much for this repo! I'm a huge fan of T5, and these results are extremely impressive.
I saw that you experimented with different positional embeddings like ALiBi in order to facilitate FA down the line. Was that because FA doesn't support an additive attention bias? If so, there is a PR in progress to add it:
https://github.com/Dao-AILab/flash-attention/pull/617
It would be fun to see this repo get even faster.