I appreciate your interest in our work. We include an attention module to make the model focus more on class-relevant information. To obtain the attention feature (class-relevant only), we first compute Va: the masked support feature passed through a Conv layer, which gives a representation at the same resolution that contains only class-relevant information. We then apply Va to F^23 to get the final attention feature. In the prototype vector, by contrast, the masked support feature is squeezed to one dimension using the MAP operation. I hope that answers your question. You can replace Va with the support prototype vector (after first bringing it to the same resolution) to compute the attention feature and check the performance difference.
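To make the comparison concrete, here is a minimal sketch of the two pooled vectors being discussed. It is not the repository's code: the tensor names, the shapes, and the helper-free masking (`Fs * Ys`) are assumptions for illustration, the Conv layer is left out, and MAP is assumed to be the usual masked average pooling normalised by the mask area.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C, H, W = 2, 8, 5, 5                      # hypothetical sizes

Fs = torch.randn(B, C, H, W)                 # stand-in support feature
Ys = torch.zeros(B, 1, H, W)                 # stand-in binary support mask
Ys[:, :, :2, :2] = 1.0                       # a small, non-empty foreground region

masked = Fs * Ys                             # assumption: masking is an elementwise product

# Prototype via masked average pooling (MAP): average over foreground pixels only.
area = Ys.sum(dim=(2, 3)).clamp(min=1e-5)    # [B, 1], foreground pixel count
proto = masked.sum(dim=(2, 3)) / area        # [B, C]

# Plain global average pooling of the masked feature: average over all H*W pixels.
pooled = F.adaptive_avg_pool2d(masked, output_size=(1, 1))  # [B, C, 1, 1]

# Under these assumptions the two differ only by a per-sample scale factor,
# the foreground fraction area / (H * W).
scale = area / (H * W)                       # [B, 1]
```

In this simplified setting `pooled` equals `proto * scale`, so any difference in practice would come from the Conv layer and from how the vector is used afterwards, which is what the suggested experiment would measure.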
Sorry, I didn't notice your reply before.
You said that Va is the masked support feature following a Conv layer, brought to the same resolution and representing only class-relevant information. But the code att = F.adaptive_avg_pool2d(self.mask(Fs, Ys), output_size=(1, 1))
gives a tensor with shape [B, C, 1, 1], right? In that case, how do I get a same-resolution representation containing only class-relevant information?
Thank you for the clarification. Yes, you are right: the average pooling output has spatial size (1, 1). Multiplying it with F^23 broadcasts it over the spatial dimensions, so the resulting attention feature has the same resolution as F^23. See the forward function: https://github.com/AIVResearch/MSANet/blob/c19e01cbde6ef6ae30dfc1f0bb1dcd3407ddbfff/model/MSANet.py#L70
You can try using the prototype vector instead and check the performance difference. Thank you.
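The shape question above can be checked with a small sketch. The tensors here are random stand-ins with hypothetical sizes, and `Fs * Ys` is an assumed substitute for `self.mask(Fs, Ys)`; only `F.adaptive_avg_pool2d` is taken from the quoted line.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C, H, W = 2, 8, 5, 5                            # hypothetical sizes

Fs = torch.randn(B, C, H, W)                       # stand-in support feature
Ys = (torch.rand(B, 1, H, W) > 0.5).float()        # stand-in binary mask
F23 = torch.randn(B, C, H, W)                      # stand-in for the F^23 feature map

# Assumed equivalent of self.mask(Fs, Ys): elementwise masking.
att = F.adaptive_avg_pool2d(Fs * Ys, output_size=(1, 1))   # [B, C, 1, 1]

# Elementwise (Hadamard) product: the 1x1 spatial dims of att are
# broadcast across H and W, so the result matches F23's resolution.
attention_feature = att * F23                      # [B, C, H, W]

print(att.shape, attention_feature.shape)
```

Each channel of F^23 is simply rescaled by one scalar per sample, so the attention vector acts as a channel-wise weighting and not as a spatial map.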
OK, thanks for your reply.
Thanks for your great work. In the attention module you designed, you obtain the attention vector Va and then simply take a Hadamard product with the feature map to get an attention map Ma, right? I don't understand the difference between the Va obtained this way and the support prototype vector. In other words, can I simply use the prototype vector instead of your attention vector Va to get the attention map by a Hadamard product with the feature map F^23? Could you explain this to me?
Best wishes.