Open night3759 opened 1 year ago
Thank you for your great work. When reading the paper, I was confused about the "Attentional feature transformation" part. I don't understand why attention over the channel dimension can stand for the feature covariance, and I could not find the code corresponding to Equation (6). I look forward to your reply. Thank you.

Hi, thanks for your interest in the work, and sorry for the late reply. Eqn (6) is not used as-is; it was replaced by Eqn (7) in the final implementation. However, it is easy to reproduce simply by modifying the original SA (self-attention) module: swap the dimensions in the attention calculation from (N x C) x (C x N) to (C x N) x (N x C), which yields a C x C attention map, and then matmul it with the (C x N)-shaped v tensor.

I hope this clarifies your questions. If you have any other questions, do drop me a message.
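In case it helps other readers, here is a minimal PyTorch sketch of the dimension swap described above. The module name `ChannelSelfAttention`, the 1x1 convolutions for q/k/v, and the learnable `gamma` residual scale are illustrative assumptions, not the repository's actual implementation of Eqn (6)/(7).

```python
# Hypothetical sketch (not this repo's code): a standard spatial SA block computes
# attention as (N x C) @ (C x N) -> N x N over positions; the variant described
# above instead computes (C x N) @ (N x C) -> C x C over channels, then applies
# the C x C map to a (C x N)-shaped v.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ChannelSelfAttention(nn.Module):
    """Channel-wise self-attention sketch: the attention map is C x C instead of N x N."""

    def __init__(self, in_channels: int):
        super().__init__()
        self.query = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.key = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.value = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual scale (assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w
        q = self.query(x).view(b, c, n)            # B x C x N
        k = self.key(x).view(b, c, n)              # B x C x N
        v = self.value(x).view(b, c, n)            # B x C x N

        # Swapped order: (C x N) @ (N x C) -> C x C channel attention
        # (a spatial SA block would instead do (N x C) @ (C x N) -> N x N).
        attn = torch.bmm(q, k.transpose(1, 2))     # B x C x C
        attn = F.softmax(attn, dim=-1)

        # Apply the C x C attention map to the C x N value tensor.
        out = torch.bmm(attn, v).view(b, c, h, w)  # B x C x H x W
        return self.gamma * out + x


if __name__ == "__main__":
    sa = ChannelSelfAttention(64)
    feat = torch.randn(2, 64, 32, 32)
    print(sa(feat).shape)  # torch.Size([2, 64, 32, 32])
```

Since the C x C map is built from inner products between channel responses, it behaves like a (normalized) channel covariance, which is presumably why the paper relates the channel attention to feature covariance.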