Dao-AILab / flash-attention

Fast and memory-efficient exact attention

using custom attention mask #584

Open whatisslove11 opened 11 months ago

whatisslove11 commented 11 months ago

Hello! I am doing a translation task and would like to try using flash attention in my model. In addition to the usual triangular (causal) mask, I also need to mask padding tokens so that the model does not attend to them; sequences already arrive in the model padded to the same length. As I understand it, there is no way to pass a custom mask yet. Could you tell me how I can use my own mask, or make flash attention mask out the padding tokens itself?

whatisslove11 commented 11 months ago

@tridao

tridao commented 11 months ago

You can look at how we do it in BERT: Remove all padding tokens before the first layer.

Idk if that works for translation.
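For reference, here is a minimal sketch (not from the thread) of the "remove padding before attention" approach, assuming flash-attn 2.x and its flash_attn_varlen_func interface. The helper name and shapes are illustrative, not part of the library; it packs the real tokens into one flat tensor and describes sequence boundaries with cu_seqlens instead of a padding mask.

```python
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func  # flash-attn 2.x, CUDA + fp16/bf16 only

def attention_without_padding(q, k, v, padding_mask, causal=False):
    # q, k, v: (batch, seqlen, nheads, headdim), fp16/bf16 on CUDA
    # padding_mask: (batch, seqlen), 1 for real tokens, 0 for padding
    batch, seqlen, nheads, headdim = q.shape
    seqlens = padding_mask.sum(dim=1, dtype=torch.int32)                  # real tokens per sequence
    cu_seqlens = F.pad(torch.cumsum(seqlens, dim=0, dtype=torch.int32), (1, 0))  # (batch + 1,)
    keep = padding_mask.bool().reshape(-1)                                # positions of real tokens

    # Pack the real tokens into (total_tokens, nheads, headdim)
    q_p = q.reshape(-1, nheads, headdim)[keep]
    k_p = k.reshape(-1, nheads, headdim)[keep]
    v_p = v.reshape(-1, nheads, headdim)[keep]

    out_p = flash_attn_varlen_func(
        q_p, k_p, v_p,
        cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
        max_seqlen_q=seqlen, max_seqlen_k=seqlen,
        causal=causal,
    )

    # Scatter the outputs back to the padded (batch, seqlen, ...) layout
    out = torch.zeros_like(q.reshape(-1, nheads, headdim))
    out[keep] = out_p
    return out.reshape(batch, seqlen, nheads, headdim)
```

Since padding tokens never enter the kernel, no explicit padding mask is needed; this is the same idea as the unpadding used in the repo's BERT example, although whether it fits an encoder-decoder translation setup is a separate question.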

flower-with-safe commented 11 months ago

What if I want to use a reset-attention-mask when pretraining a Llama model? For example, my attention mask could be:

    tensor([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
            [1., 1., 1., 0., 0., 0., 0., 0., 0., 0.],
            [1., 1., 1., 1., 0., 0., 0., 0., 0., 0.],
            [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
            [0., 0., 0., 0., 1., 1., 0., 0., 0., 0.],
            [0., 0., 0., 0., 1., 1., 1., 0., 0., 0.],
            [0., 0., 0., 0., 1., 1., 1., 1., 0., 0.],
            [0., 0., 0., 0., 1., 1., 1., 1., 1., 0.],
            [0., 0., 0., 0., 1., 1., 1., 1., 1., 1.]])

In such a case, how could I use flash attention? @tridao
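A possible workaround, not confirmed in the thread: a block-causal "reset" mask like the one above is equivalent to running causal attention over each packed document independently, which the varlen interface can express through cu_seqlens. A sketch under that assumption, with illustrative document lengths matching the 4 + 6 split in the mask:

```python
import torch
import torch.nn.functional as F
from flash_attn import flash_attn_varlen_func  # flash-attn 2.x, CUDA + fp16/bf16 only

# Two documents of length 4 and 6 packed back-to-back into 10 tokens
doc_lens = torch.tensor([4, 6], dtype=torch.int32, device="cuda")
cu_seqlens = F.pad(torch.cumsum(doc_lens, dim=0, dtype=torch.int32), (1, 0))  # [0, 4, 10]

nheads, headdim = 8, 64
total_tokens = int(doc_lens.sum())
q = torch.randn(total_tokens, nheads, headdim, dtype=torch.float16, device="cuda")
k, v = torch.randn_like(q), torch.randn_like(q)

# causal=True applies within each document; cu_seqlens prevents attention
# across document boundaries, reproducing the block-causal mask above.
out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=int(doc_lens.max()), max_seqlen_k=int(doc_lens.max()),
    causal=True,
)
```

This only covers the reset/packing pattern; arbitrary custom masks (e.g. PrefixLM) are a different matter, as discussed below.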

sentialx commented 11 months ago

@tridao Is it possible to use a PrefixLM attention mask?

tridao commented 11 months ago

No that's not supported.

sentialx commented 10 months ago

It looks like a custom attention mask is the most requested feature. In fact, it seems to already be addressed by this PR: https://github.com/Dao-AILab/flash-attention/pull/57

iiLaurens commented 10 months ago

I'm also very interested in custom masks. I think the value of the PrefixLM mask is underappreciated. I would like to experiment with continued pretraining using PrefixLM à la UL2R.

@tridao, are you considering support for custom attention masks? Or do you have specific objections to it?