facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

AttentionBias and relative positional encoding #925

Open YoelShoshan opened 11 months ago

YoelShoshan commented 11 months ago

I'm converting the HF transformers T5 model to use memory_efficient_attention().

I've reached a point where I get identical results between the original implementation and the memory_efficient_attention() version. However, I pass attn_bias as a plain Tensor, which I'm aware isn't optimized and prevents the Flash implementation from being selected.

So I started trying to convert the code to use AttentionBias instead.

In the original T5 code, the default positional encoding is relative, and it is injected into the attention mechanism at the same place attn_bias is added (onto the Q@K output, before the softmax). However, it's not clear whether this can be supported with a class inheriting from AttentionBias (or by creating a new one). Any help on this would be highly appreciated :)

Note - I understand that there are alternative positional encoding methods, but I need the code to support existing model weights as well, which rely on this specific positional encoding method.
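
For concreteness, here is a minimal sketch of the dense-tensor path described above. The shapes and the random stand-in bias are placeholders, not the actual T5 code (in HF T5 the bias would come from the model's relative position bias computation):

```python
import torch
import xformers.ops as xops

B, M, H, K = 2, 128, 8, 64  # batch, sequence length, heads, head dim
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

# Stand-in for the T5 relative position bias, shaped (B, H, M, M).
rel_pos_bias = torch.randn(B, H, M, M, device="cuda", dtype=torch.float16)

# Passing the bias as a plain Tensor works, but it materializes the full bias
# in memory and rules out the Flash kernel, as noted above.
out = xops.memory_efficient_attention(q, k, v, attn_bias=rel_pos_bias)
```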

danthe3rd commented 11 months ago

Hi, this is not something xFormers supports efficiently at the moment. You would need to create a custom kernel to do that. It could be similar to what was done for SAM: https://pytorch.org/blog/accelerating-generative-ai/ (see the section "Triton: Custom SDPA for fused relative positional encoding").
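
For reference, the math such a custom kernel would need to fuse is just the bias-before-softmax pattern discussed above. A plain-PyTorch sketch of that reference computation (not a fused kernel, and not the SAM Triton code):

```python
import torch

def sdpa_with_rel_bias(q, k, v, rel_pos_bias):
    # q, k, v: (B, H, M, K); rel_pos_bias broadcastable to (B, H, M, M).
    # Note: HF T5 omits the 1/sqrt(K) scaling, so drop `scale` to match it exactly.
    scale = q.shape[-1] ** -0.5
    scores = (q @ k.transpose(-2, -1)) * scale + rel_pos_bias  # bias added before softmax
    return scores.softmax(dim=-1) @ v
```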

IceClear commented 8 months ago

@YoelShoshan Hi, have you solved this? I have a similar issue. BTW, I'm wondering whether it would be very difficult to support a custom attn_bias? @danthe3rd Thanks in advance.

danthe3rd commented 8 months ago

We have no plans to support it for now.