The difference is exactly why efficient attention (EA) is more efficient than conventional dot-product attention. For why it is still exactly or approximately equivalent (in different settings) to dot-product attention, please refer to Section 3.3 of the paper.
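To make the difference concrete, here is a minimal sketch (not the authors' code, and omitting the normalization discussed in Section 3.3): with n = H x W spatial positions and d channels, dot-product attention materializes an (n x n) map via (QK^T)V, while efficient attention reassociates the product as Q(K^T V), so the intermediate is only (d x d). Without normalization the two orders are exactly equal by associativity.

```python
import numpy as np

n, d = 64, 8  # n = H * W spatial positions, d = channels
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# Conventional order: the intermediate Q @ K.T is (n x n) -- quadratic in H*W.
out_dot = (Q @ K.T) @ V

# Efficient order: the intermediate K.T @ V is only (d x d) -- linear in H*W.
out_eff = Q @ (K.T @ V)

# Pure matrix products are associative, so the outputs match exactly.
assert np.allclose(out_dot, out_eff)
```

With the softmax normalizations applied (to the rows of Q and columns of K in EA, versus to the rows of QK^T in dot-product attention), the equivalence is exact or approximate depending on the setting, which is what Section 3.3 covers.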
Hi, I found that the attention map computed by the script is C x C. Shouldn't it be (H x W) x (H x W) if we want spatial attention?
Thank you for any information that you can provide.