apple / ml-cvnets

CVNets: A library for training computer vision networks
https://apple.github.io/ml-cvnets

# Operations for Self-Attention Layer #17

Closed mmaaz60 closed 2 years ago

mmaaz60 commented 2 years ago

Hi @sacmehta, thanks for the great work!

https://github.com/apple/ml-cvnets/blob/d38a116fe134a8cd5db18670764fdaafd39a5d4f/cvnets/layers/multi_head_attention.py#L125

```python
# number of operations in QK^T
m_qk = (seq_len * in_channels * in_channels)
```

As per the code above, the MAdds for QK^T are calculated as L×C×C, where L and C are the sequence length and the number of channels respectively. But the QK^T product actually produces an L×L Gram matrix: we need to compute L×L elements, and each element is a dot product over C channels, costing C operations. So shouldn't the MAdds be L×L×C instead?

Thanks, please correct me if I am wrong.
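The counting argument above can be sketched as a quick comparison. This is a minimal illustration, not code from the library; the function names and the example sizes are my own:

```python
def madds_qkt_reported(seq_len: int, in_channels: int) -> int:
    # formula in the snippet above: L * C * C
    return seq_len * in_channels * in_channels

def madds_qkt_corrected(seq_len: int, in_channels: int) -> int:
    # Q @ K^T yields an L x L matrix; each entry is a length-C dot product,
    # so the count is L * L * C
    return seq_len * seq_len * in_channels

# hypothetical sizes for illustration only
print(madds_qkt_reported(256, 64))   # 1048576
print(madds_qkt_corrected(256, 64))  # 4194304
```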

sacmehta commented 2 years ago

Hi @mmaaz60 , Thanks for noting it. We have fixed it.

Note that the FLOPs with the fixed equation are slightly fewer than the reported number, because in_channels is greater than seq_len for MobileViT.
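To see why the corrected count comes out lower when in_channels exceeds seq_len, here is a quick check with hypothetical sizes (the numbers are illustrative, not MobileViT's actual configuration):

```python
seq_len, in_channels = 64, 96  # assumed sizes with C > L, as in MobileViT

original = seq_len * in_channels * in_channels   # L * C * C (old formula)
corrected = seq_len * seq_len * in_channels      # L * L * C (fixed formula)

# With C > L, the L * L * C count is smaller than L * C * C
print(original, corrected, corrected < original)  # 589824 393216 True
```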