cmsflash / efficient-attention

An implementation of the efficient attention module.
https://arxiv.org/abs/1812.01243
MIT License

Shape of attention map #12

Closed aarontyliu closed 1 year ago

aarontyliu commented 1 year ago

Hi, I found that the attention map computed by the script is C x C. Shouldn't it be (H x W) x (H x W) if we want spatial attention?

Thank you for any information that you can provide.

cmsflash commented 1 year ago

The difference is exactly why efficient attention (EA) is more efficient than conventional dot-product attention. For why it is still exactly or approximately equivalent (in different settings) to dot-product attention, please refer to Section 3.3 of the paper.
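As a rough illustration of the shape difference (a minimal single-head sketch with softmax normalization, not the repo's actual `EfficientAttention` module; the sizes `n` and `c` are made up for the example):

```python
import torch
import torch.nn.functional as F

n, c = 64 * 64, 32            # n = H*W spatial positions, c = channel (key/value) dim
q = torch.randn(n, c)
k = torch.randn(n, c)
v = torch.randn(n, c)

# Conventional dot-product attention: the attention map is n x n, i.e. (H*W) x (H*W).
dot_map = F.softmax(q @ k.t() / c ** 0.5, dim=-1)   # shape (n, n)
dot_out = dot_map @ v                                # shape (n, c)

# Efficient attention: normalize q and k separately, then compute k^T v first.
# The intermediate "global context" map is only c x c, never n x n.
ea_context = F.softmax(k, dim=0).t() @ v             # shape (c, c)
ea_out = F.softmax(q, dim=-1) @ ea_context           # shape (n, c)

print(dot_map.shape, ea_context.shape)  # torch.Size([4096, 4096]) torch.Size([32, 32])
```

So the C x C map you observed is expected: efficient attention reorders the multiplication so the memory and compute scale with the channel dimension rather than with the number of spatial positions.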