Closed pietruh closed 2 years ago
@madian9
Any update on this?
I would also be happy to know more about this!
This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!
🐛 Bug
Hi guys! In Linformer's example source code, I found that the operation order may not match the mathematics in the official paper.
Here, in the code, the linear attention is done in the following sequence of the two operations:

1. compression of the `n` token's representations to `k`
2. linear projection from `d_m` to `d_k`

(as here, #208 and #213 respectively).

On the contrary, this image from the Linformer paper states that it should be performed in the order of:

1. linear projection from `d_m` to `d_k`
2. compression of the `n` token's representations to `k`

As seen in the picture below:

[figure from the Linformer paper]

Am I missing something important here? If anything gets confirmed, I am up for fixing it.
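To make the comparison concrete, here is a minimal PyTorch sketch of the two orderings. This is not the actual fairseq code; the dimensions, tensor shapes, and module names below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Assumed toy dimensions: sequence length n, model dim d_m, head dim d_k,
# and Linformer's projected length k.
n, d_m, d_k, k = 512, 768, 64, 128

x = torch.randn(n, d_m)                    # token representations for one head / batch element
proj = nn.Linear(d_m, d_k, bias=False)     # linear projection d_m -> d_k
compress = nn.Linear(n, k, bias=False)     # Linformer compression n -> k (acts along the sequence axis)

# Ordering found in the example code: first compress n -> k, then project d_m -> d_k.
keys_code = proj(compress(x.transpose(0, 1)).transpose(0, 1))    # shape: (k, d_k)

# Ordering described in the paper: first project d_m -> d_k, then compress n -> k.
keys_paper = compress(proj(x).transpose(0, 1)).transpose(0, 1)   # shape: (k, d_k)

print(keys_code.shape, keys_paper.shape)   # both torch.Size([128, 64])
```

Both orderings end with a `(k, d_k)` tensor, and with the same weight matrices they compute the same linear map (matrix multiplication is associative), though the intermediate shapes, and hence the compute cost, differ between the two.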
Environment
Current fairseq version.