Open YoelShoshan opened 1 year ago
Hi, this is not something xFormers supports efficiently at the moment. You will need to create a custom kernel to do that. It might be similar to what was done for SAM: https://pytorch.org/blog/accelerating-generative-ai/ (see the section "Triton: Custom SDPA for fused relative positional encoding").
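For context, here is a minimal unfused PyTorch reference (not taken from the blog post; shapes and names are illustrative) of what such a kernel computes. A fused Triton kernel performs the same math but adds the relative-position bias to the `Q @ K^T` scores inside the attention kernel, instead of materializing the full score/bias matrices as done below.

```python
import torch
import torch.nn.functional as F

def sdpa_with_rel_bias_reference(q, k, v, rel_bias):
    """Unfused reference: softmax(Q @ K^T * scale + rel_bias) @ V.

    q, k, v:  (batch, heads, seq_len, head_dim)
    rel_bias: (batch or 1, heads, seq_len, seq_len), additive bias derived
              from relative positions (illustrative shape).
    """
    scale = q.shape[-1] ** -0.5
    scores = torch.matmul(q, k.transpose(-2, -1)) * scale + rel_bias
    return torch.matmul(F.softmax(scores, dim=-1), v)
```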
@YoelShoshan Hi, have you solved this? I have a similar issue. BTW, I was wondering: would it be very difficult to support a custom attn_bias? @danthe3rd Thanks in advance.
We have no plans to support this for now.
I'm converting the Hugging Face Transformers T5 implementation to use `memory_efficient_attention()`. I've reached the point where I get identical results between the original implementation and the version using `memory_efficient_attention()`. However, I pass `attn_bias` as a Tensor, which I'm aware isn't optimized and results in the FLASH implementation not being selected. So I started trying to convert the code to use `AttentionBias`.

In the original T5 code, the default positional embedding is relative, and it is injected into the attention mechanism by adding it in the same place that `attn_bias` is added (to the Q@K output, before the softmax). However, it's not clear whether this can be supported with a class inheriting from `AttentionBias` (or by creating a new one). Any help on this will be highly appreciated :)

Note: I understand that there are alternative positional encoding methods, but I need the code to support existing model weights as well, which rely on this specific positional encoding method.
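For reference, a minimal sketch of the Tensor-bias path described above, assuming a precomputed T5-style relative position bias (the bias name and shapes here are illustrative, not from the original code):

```python
import torch
import xformers.ops as xops

B, M, H, K = 2, 128, 8, 64  # batch, sequence length, heads, head dim
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

# T5-style relative position bias, e.g. produced by the model's
# relative_attention_bias embedding; shape (B, H, M, M) so it can be
# added to the Q @ K^T scores before the softmax.
rel_bias = torch.randn(B, H, M, M, device="cuda", dtype=torch.float16)

# Passing a plain Tensor as attn_bias gives results matching the reference
# softmax(Q @ K^T * scale + bias) @ V, but, as noted above, it is not the
# optimized path and the FLASH implementation will not be selected.
out = xops.memory_efficient_attention(q, k, v, attn_bias=rel_bias)
```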