Closed: mmaaz60 closed this issue 2 years ago
Hi @mmaaz60, thanks for noting it. We have fixed it.

Note that the FLOPs with the fixed equation are fewer (though not significantly) than the reported number, because `in_channels` is greater than `seq_len` for MobileViT.
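To illustrate why the fixed count comes out smaller, here is a small sketch. The shapes below are hypothetical, picked only to reflect MobileViT's typical regime where the channel dimension exceeds the unfolded sequence length; the actual values vary per block.

```python
# Hypothetical MobileViT-like shapes (illustration only):
seq_len = 4        # L: pixels per patch in the unfolded tensor
in_channels = 96   # C: embedding dimension, larger than L in MobileViT

# Original (incorrect) count for the Q @ K^T product: L * C * C
madds_old = seq_len * in_channels * in_channels

# Fixed count: the L x L Gram matrix needs L * L * C multiply-adds
madds_fixed = seq_len * seq_len * in_channels

print(madds_old, madds_fixed)  # fixed count is smaller whenever C > L
```

Whenever `C > L`, the fixed term `L*L*C` is smaller than the old `L*C*C`, matching the observation above.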
Hi @sacmehta, Thanks for the great work
https://github.com/apple/ml-cvnets/blob/d38a116fe134a8cd5db18670764fdaafd39a5d4f/cvnets/layers/multi_head_attention.py#L125
As per the code above, the MAdds for `QK^T` are calculated as `L*C*C`, where `L` and `C` are the sequence length and the number of channels, respectively. But as we know, the `QK^T` product actually involves computing an `N x N` Gram matrix, i.e. we need to compute `N x N` elements, and each element is a dot product requiring `C` multiply-adds. So shouldn't the MAdds be `N*N*C` instead?

Thanks, please correct me if I am wrong.
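A quick NumPy sketch of the counting argument above (shapes are arbitrary, chosen just for illustration):

```python
import numpy as np

L, C = 8, 32  # arbitrary sequence length and channel dimension

Q = np.random.rand(L, C)
K = np.random.rand(L, C)

attn = Q @ K.T         # Q K^T: an L x L Gram matrix
assert attn.shape == (L, L)

# Each of the L*L output elements is a dot product over C channels,
# so the multiply-add count for this product is L * L * C, not L * C * C.
madds = L * L * C
print(madds)
```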