Closed pietruh closed 2 years ago
@madian9
Any update on this?
I would also be happy to know more about this!
This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!
🐛 Bug
Hi guys! In Linformer's example source code, I found that the operation order may not match the mathematics in the official paper.
Here, in the code, the linear attention is done in the following sequence of the two operations:

1. compression of the `n` token's representations to `k`
2. linear projection from `d_m` to `d_k`

(as here, #208 and #213 respectively).

On the contrary, this image from the Linformer paper states that it should be performed in the order of:

1. linear projection from `d_m` to `d_k`
2. compression of the `n` token's representations to `k`

As seen in the picture below:

[figure from the Linformer paper]

Am I missing something important here? If anything gets confirmed, I am up for fixing it.
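To make the comparison concrete, here is a minimal PyTorch sketch of the two orderings. This is not the actual fairseq code; the dimensions, tensor shapes, and module names below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Assumed toy dimensions: sequence length n, model dim d_m, head dim d_k,
# and Linformer's projected length k.
n, d_m, d_k, k = 512, 768, 64, 128

x = torch.randn(n, d_m)                    # token representations for one head / batch element
proj = nn.Linear(d_m, d_k, bias=False)     # linear projection d_m -> d_k
compress = nn.Linear(n, k, bias=False)     # Linformer compression n -> k (acts along the sequence axis)

# Ordering found in the example code: first compress n -> k, then project d_m -> d_k.
keys_code = proj(compress(x.transpose(0, 1)).transpose(0, 1))    # shape: (k, d_k)

# Ordering described in the paper: first project d_m -> d_k, then compress n -> k.
keys_paper = compress(proj(x).transpose(0, 1)).transpose(0, 1)   # shape: (k, d_k)

print(keys_code.shape, keys_paper.shape)   # both torch.Size([128, 64])
```

Both orderings end with a `(k, d_k)` tensor, and with the same weight matrices they compute the same linear map (matrix multiplication is associative), though the intermediate shapes, and hence the compute cost, differ between the two.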
Environment
Current fairseq version.