Open buptxyb666 opened 2 months ago
Thanks for your great work!
Since the text length is usually less than 77 tokens, why not mask the padding tokens in word_emb when performing cross attention?
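For concreteness, here is a minimal sketch (not the repo's actual code) of the kind of masking I mean, using `torch.nn.MultiheadAttention`'s `key_padding_mask`. All shapes, names, and the `lengths` tensor below are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Hypothetical setup: word_emb from a CLIP-style text encoder, padded to 77
# tokens; only lengths[i] tokens per sample are real text, the rest is padding.
batch, max_len, dim = 2, 77, 512
word_emb = torch.randn(batch, max_len, dim)
query = torch.randn(batch, 16, dim)      # e.g. visual queries attending to text
lengths = torch.tensor([5, 9])           # real token counts per sample (assumed)

# key_padding_mask: True marks padding positions, which attention will ignore.
positions = torch.arange(max_len).unsqueeze(0)        # (1, 77)
key_padding_mask = positions >= lengths.unsqueeze(1)  # (batch, 77), bool

attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
out, weights = attn(query, word_emb, word_emb,
                    key_padding_mask=key_padding_mask)

# After masking, attention weights on padded positions are zero.
print(weights[0, :, lengths[0]:].abs().max())
```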