KumoLiu opened this issue 2 months ago
I would think that https://github.com/Dao-AILab/flash-attention/pull/617 needs to be completed before FAv2 can support an arbitrary attention bias. Then, depending on the relative encoding formula actually needed, maybe https://github.com/Dao-AILab/flash-attention/pull/956 could be pushed forward as well.
Another way forward is to try PyTorch's flex_attention, which can fuse modifications of the attention score matrix into the attention kernel.
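A minimal sketch of that idea, assuming PyTorch >= 2.5 and an illustrative per-head relative bias table (`rel_bias`, `rel_pos_score_mod`, and the shapes below are hypothetical, not MONAI code):

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, L, D = 2, 4, 128, 64
q = torch.randn(B, H, L, D, device="cuda", dtype=torch.float16)
k = torch.randn(B, H, L, D, device="cuda", dtype=torch.float16)
v = torch.randn(B, H, L, D, device="cuda", dtype=torch.float16)

# Hypothetical learnable per-head bias table indexed by relative offset (q_idx - kv_idx).
rel_bias = torch.randn(H, 2 * L - 1, device="cuda", dtype=torch.float16)

def rel_pos_score_mod(score, b, h, q_idx, kv_idx):
    # Add the relative positional bias for this (query, key) offset to the raw score.
    return score + rel_bias[h, q_idx - kv_idx + L - 1]

# torch.compile is what lets flex_attention generate a fused kernel for score_mod.
compiled_flex_attention = torch.compile(flex_attention)
out = compiled_flex_attention(q, k, v, score_mod=rel_pos_score_mod)
```

The appeal here is that the bias is applied inside the fused kernel, so we avoid both materializing an (L, L) bias tensor and falling off the fast path.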
From reading this thread: https://github.com/pytorch/pytorch/issues/96099#issuecomment-1480430583, it seems to me that the relative positional embedding can be passed through scaled_dot_product_attention's attn_mask argument. However, it can be slow because supplying attn_mask prevents the flash-attention "fast path" from being used. Do you think we can keep this option open for users who want to use flash_attention together with rel_pos_embedding?
_Originally posted by @mingxin-zheng in https://github.com/Project-MONAI/MONAI/pull/7977#discussion_r1701825032_