AIVResearch / MSANet

Official PyTorch implementation of Multi-Similarity and Attention Guidance for Boosting Few-Shot Segmentation.
https://arxiv.org/abs/2206.09667v1

About attention module #6

Closed 1173206772 closed 1 year ago

1173206772 commented 1 year ago

Thanks for your great work. In your attention module, you obtain the attention vector Va and then simply take a Hadamard product with the feature map to get the attention map Ma, right? I don't understand the difference between Va obtained this way and the support prototype vector. In other words, could I simply use the prototype vector instead of your attention vector Va, and get the attention map by taking its Hadamard product with the feature map F^23? Can you explain this to me?

Best wishes.

Ehteshamciitwah commented 1 year ago

I appreciate your interest in our work. We include an attention module to instruct the model to focus more on class-relevant information. To obtain the attention feature (class-relevant only), we first compute Va: the masked support feature passed through a Conv layer, so that it represents only class-relevant information at the same resolution. We then map the resulting Va onto F^23 to get the final attention feature. For the prototype vector, by contrast, the masked support feature is squeezed to a single vector using the MAP (masked average pooling) operation. I hope that answers your question. You can replace Va with the support prototype vector (after bringing it to the same resolution) to compute the attention feature and check the performance difference.
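For anyone reading along, here is a minimal PyTorch sketch of the two variants being discussed. The tensor names and shapes are illustrative assumptions, not the repo's actual code: `Fs` stands for a support feature, `Ys` for its binary mask resized to feature resolution, and `F23` for the query feature F^23.

```python
import torch
import torch.nn.functional as F

B, C, H, W = 2, 256, 60, 60
Fs = torch.randn(B, C, H, W)                 # support feature (illustrative)
F23 = torch.randn(B, C, H, W)                # query feature F^23 (illustrative)
Ys = (torch.rand(B, 1, H, W) > 0.5).float()  # binary support mask at feature size

# Keep only class-relevant activations of the support feature.
masked = Fs * Ys

# Va as discussed below: squeeze the masked feature to [B, C, 1, 1],
# mirroring `F.adaptive_avg_pool2d(self.mask(Fs, Ys), output_size=(1, 1))`.
Va = F.adaptive_avg_pool2d(masked, output_size=(1, 1))

# Hadamard product with broadcasting: [B, C, 1, 1] * [B, C, H, W] -> [B, C, H, W].
Ma = Va * F23

# The suggested experiment: a masked-average-pooled prototype instead of Va.
# MAP normalizes by the foreground area rather than the whole spatial extent.
proto = (Fs * Ys).sum(dim=(2, 3)) / (Ys.sum(dim=(2, 3)) + 1e-6)  # [B, C]
Ma_proto = proto.view(B, C, 1, 1) * F23                          # same shape as Ma
```

Under these assumptions, the only structural difference between the two variants is how the [B, C, 1, 1] vector is produced: global average pooling over the masked feature versus masked average pooling normalized by the foreground area.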

1173206772 commented 1 year ago

Sorry, I didn't notice your reply earlier. You said Va is the masked support feature followed by a Conv layer, giving the same resolution and representing only class-relevant information. But from the code `att = F.adaptive_avg_pool2d(self.mask(Fs, Ys), output_size=(1, 1))`, we get a tensor of shape [B, C, 1, 1], right? In that case, how can Va have the same resolution while representing only class-relevant information?

Ehteshamciitwah commented 1 year ago

Thank you for the clarification. Yes, you are right: the average pooling output is (1, 1). The attention feature recovers the same resolution after multiplying with F^23, following the forward function: https://github.com/AIVResearch/MSANet/blob/c19e01cbde6ef6ae30dfc1f0bb1dcd3407ddbfff/model/MSANet.py#L70
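To spell out the broadcasting this refers to, here is a standalone sketch with made-up shapes, not the repo's code:

```python
import torch
import torch.nn.functional as F

masked_support = torch.randn(4, 256, 60, 60)  # stand-in for self.mask(Fs, Ys)
F23 = torch.randn(4, 256, 60, 60)             # stand-in for the query feature F^23

att = F.adaptive_avg_pool2d(masked_support, output_size=(1, 1))  # [4, 256, 1, 1]

# PyTorch broadcasting expands the singleton H and W dimensions, so the
# product already has the full spatial resolution of F^23.
out = att * F23
print(out.shape)  # torch.Size([4, 256, 60, 60])
```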

You can try using the prototype vector and check the performance difference. Thank you.

1173206772 commented 1 year ago

OK, thanks for your reply.