mohamedelbahnasawi opened this issue 1 year ago
Hi @mohamedelbahnasawi,
have you made any progress on this task? I am planning to implement a reversible dilated Encoder model. Let me know if you wanna collaborate!
Kind regards!
Hi @Coluding,
Unfortunately I have stopped until I find a solution; I tried several times, but there were always problems with the masking. It would definitely be nice if we could collaborate on this implementation and try to solve the problem.
I also hope @fkodom can give us some tips on how to solve the problem using scaled dot-product attention instead of flash attention.
Best regards, Mohamed
@mohamedelbahnasawi Are you trying to build an encoder-only model (e.g. BERT)? I don't have an immediate solution for the padding masks in that case, but I can look into it further. For causal decoder-only models (e.g. GPT), it likely doesn't matter -- you only need to apply masking to the loss function.
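To illustrate the decoder-only case, here is a minimal sketch (not from this repo -- the padding id, names, and shapes are just placeholders) of masking only the loss, so no attention-level padding mask is needed during training:

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: logits from a causal decoder, shape (batch, seq_len, vocab_size),
# and target token ids of shape (batch, seq_len), where PAD_ID marks padded positions.
PAD_ID = 0

def masked_lm_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # ignore_index drops padded positions from the loss, so with right-padding the
    # causal mask alone keeps real tokens from ever attending to padding.
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=PAD_ID,
    )
```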
Hi @fkodom,
Thank you for replying. I am actually trying to build an encoder-decoder model, just like the vanilla transformer architecture, but with dilated attention.
@mohamedelbahnasawi Got it -- I'll take a look. I believe the fix should be here. `xops` also allows you to pass a Tensor mask in place of the `LowerTriangularMask` I was lazily using, so we'll need to add an `attn_mask: Optional[Tensor] = None` argument to `forward()` for each module.

Peeking at the `xformers` docstring:

So it may be slower, but more helpful for your use case.
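Roughly what I have in mind -- a sketch only, not the actual fix (the module and the mask-to-bias conversion here are simplified, and the tensor layout is the `(batch, seq, heads, head_dim)` one that `xformers` expects):

```python
from typing import Optional

import torch
from torch import Tensor, nn
import xformers.ops as xops


class AttentionSketch(nn.Module):
    """Illustrative only -- not the real dilated attention module from this repo."""

    def forward(
        self,
        query: Tensor,  # (batch, seq_len, heads, head_dim)
        key: Tensor,
        value: Tensor,
        attn_mask: Optional[Tensor] = None,  # bool, True where attention is allowed
        is_causal: bool = False,
    ) -> Tensor:
        if attn_mask is not None:
            # Convert a boolean "keep" mask into an additive float bias:
            # 0 where attention is allowed, -inf where it is masked out.
            bias = torch.zeros_like(attn_mask, dtype=query.dtype)
            bias.masked_fill_(~attn_mask, float("-inf"))
            attn_bias = bias
        elif is_causal:
            attn_bias = xops.LowerTriangularMask()
        else:
            attn_bias = None

        return xops.memory_efficient_attention(query, key, value, attn_bias=attn_bias)
```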
Hi @fkodom,
I really like your implementation, and I wanted to use dilated attention in a vanilla transformer model to see how things work.
Right now I am facing a problem in the attention calculation, where you use flash attention, because it does not provide a way to pass a padding mask. For scaled dot-product attention, I am not sure whether the masks should also be segmented and sparsified. Do you have an idea how to calculate the attention with scaled dot-product attention while taking into account a padding mask for the encoder, and a combined padding-and-causal mask for the decoder?
Thanks for the help!
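As a starting point, here is a minimal sketch of combining a padding mask with a causal mask for plain (non-dilated) scaled dot-product attention, assuming PyTorch's `F.scaled_dot_product_attention`; how to segment and sparsify such masks for dilated attention is exactly the open question in this thread:

```python
import torch
import torch.nn.functional as F

def combined_decoder_mask(key_padding_mask: torch.Tensor, seq_len: int) -> torch.Tensor:
    """key_padding_mask: (batch, seq_len) bool, True where the token is real (not padding).

    Returns a (batch, 1, seq_len, seq_len) bool mask that is True where attention
    is allowed, combining the causal and padding constraints.
    """
    causal = torch.tril(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=key_padding_mask.device)
    )
    # Broadcast: (1, 1, L, L) & (B, 1, 1, L) -> (B, 1, L, L)
    return causal[None, None, :, :] & key_padding_mask[:, None, None, :]

# Usage with q, k, v of shape (batch, heads, seq_len, head_dim):
# mask = combined_decoder_mask(key_padding_mask, seq_len)
# out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```

For the encoder, the same idea applies without the causal part: broadcast the key padding mask to `(batch, 1, 1, seq_len)` and pass it as `attn_mask`.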