WenbinLee / DN4

PyTorch code for "Revisiting Local Descriptor based Image-to-Class Measure for Few-shot Learning", CVPR 2019.

a question about ablation study #10

Closed icoz69 closed 5 years ago

icoz69 commented 5 years ago

Hello, sorry for asking here; I could not find an email address in your paper. I have a question about an important experiment. Did you compare your network with a baseline that computes cosine distance after global pooling (as is done in Prototypical Networks)? That baseline compares one vector against one vector, whereas yours compares many vectors against many vectors. I think the greatest novelty in your work is the use of local features instead of a global feature obtained by pooling. However, a baseline that keeps every other part the same and only adds global pooling is missing. Do you have results for such a comparison?

WenbinLee commented 5 years ago

Thanks for your suggestion. Yes, it is an important comparison experiment. We did not include it in the original paper because we expected the local-based Prototypical Net (feature dimensionality 64) to perform worse than the global-based Prototypical Net (feature dimensionality 1600).

We ran this experiment recently. On a 5-way 5-shot task, using mean local features (i.e., Prototypical Net) gives an accuracy of 67.15±0.65%. Hope this helps. You can also use our code to implement this experiment easily.

BTW, one novelty is indeed the use of local features without pooling. The other, more important novelty is the image-to-class measure, which can take full advantage of the local features because of their exchangeability.
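
For readers following the thread, here is a minimal sketch of what an image-to-class measure over local descriptors can look like. It is not the repository's implementation: it is plain Python on toy 2-D descriptors, and the function names and the neighbour count `k` are assumptions made for illustration. Each query descriptor is matched against the descriptors pooled from all support images of a class, so descriptors from different images of the same class are interchangeable.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def image_to_class_score(query_descriptors, class_descriptors, k=1):
    """Sum, over the query's local descriptors, of the top-k cosine
    similarities to the descriptors pooled from one class (k is an assumption)."""
    score = 0.0
    for q in query_descriptors:
        sims = sorted((cosine(q, d) for d in class_descriptors), reverse=True)
        score += sum(sims[:k])
    return score

# Toy data: two local descriptors for the query image, three per class.
query = [[1.0, 0.0], [0.9, 0.1]]
class_a = [[1.0, 0.1], [0.8, 0.0], [0.0, 1.0]]   # pooled from all class-a support images
class_b = [[0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]]  # pooled from all class-b support images

scores = {
    "a": image_to_class_score(query, class_a),
    "b": image_to_class_score(query, class_b),
}
predicted = max(scores, key=scores.get)
print(predicted, scores)
```

Because no averaging happens before matching, one well-matched local descriptor contributes fully to the class score rather than being diluted in a mean.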

WenbinLee commented 5 years ago

This was done on the miniImageNet dataset. Thank you.

icoz69 commented 5 years ago

Thank you for your instant reply. This is an inspiring idea. However, I did not quite understand your reply. Why does the Prototypical Net have such a high dimensionality? Shouldn't it have the same dimensionality as the local-based one, since it simply averages all local features? What I actually want to ask about is a baseline model that averages all local features and still uses cosine distance, so that the only difference from yours is whether the features are averaged. If your method is compared with Prototypical Net directly, both the distance function and the local/global choice change, and we cannot tell how much the locality contributes. Thank you.

WenbinLee commented 5 years ago

In Section 3.2 of the original Prototypical Net paper, you can find that the feature dimensionality is 1600 because they use global features: 64x5x5=1600. Never mind; I only mentioned this to explain why we did not run this experiment for our original CVPR paper. Please ignore that part.
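
The dimensionalities mentioned here can be checked with a small sketch. It uses a toy 64x5x5 feature map built from nested lists; the values and variable names are made up for illustration and nothing here comes from either code base.

```python
# Toy feature map with the shape discussed: 64 channels over a 5x5 grid.
C, H, W = 64, 5, 5
feature_map = [[[float(h + w) for w in range(W)] for h in range(H)] for _ in range(C)]

# Flattening everything gives one global vector of 64*5*5 = 1600 values.
flattened = [x for channel in feature_map for row in channel for x in row]

# Global average pooling gives one vector with 64 values (one per channel).
pooled = [sum(x for row in channel for x in row) / (H * W) for channel in feature_map]

# Keeping the spatial positions gives 5*5 = 25 local descriptors of 64 values each.
local_descriptors = [
    [feature_map[c][h][w] for c in range(C)]
    for h in range(H)
    for w in range(W)
]

print(len(flattened), len(pooled), len(local_descriptors), len(local_descriptors[0]))
```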

Yes, I mean I have done what you suggested. We use global average pooling to get a 64-dimensional feature vector for each class and still use cosine distance, which gives an accuracy of 67.15±0.65% for the 5-way 5-shot setting on the miniImageNet dataset. With our method, you get 71.02±0.64%. Is that clearer?
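
A rough sketch of the pooled baseline described in this comment, in plain Python on toy 2-D descriptors. The function names are hypothetical. Averaging all of a class's local descriptors in one step is also an assumption, since the comment does not say whether pooling was done per image first.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def pooled_baseline_predict(query_descriptors, support_by_class):
    """Average the local descriptors on each side into one vector,
    then pick the class whose mean vector is most cosine-similar."""
    q = mean_vector(query_descriptors)
    scores = {
        label: cosine(q, mean_vector(descriptors))
        for label, descriptors in support_by_class.items()
    }
    return max(scores, key=scores.get), scores

# Toy data: local descriptors of one query image and of two classes.
query = [[1.0, 0.0], [0.8, 0.2]]
support = {
    "a": [[1.0, 0.1], [0.9, 0.0]],
    "b": [[0.0, 1.0], [0.2, 0.8]],
}
label, scores = pooled_baseline_predict(query, support)
print(label, scores)
```

The only difference from a local-descriptor measure is the averaging step before the cosine comparison, which is the variable the question asks to isolate.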

Thank you.

icoz69 commented 5 years ago

Thank you for your reply. That sounds good. Does it also have an advantage in the 5-way 1-shot task?

WenbinLee commented 5 years ago

You are welcome. Yes, it does.