CVMI-Lab / SlotCon

(NeurIPS 2022) Self-Supervised Visual Representation Learning with Semantic Grouping
https://wen-xin.info/slotcon/
Apache License 2.0

About the prototypes #1

Closed: DevinCheung closed this issue 11 months ago

DevinCheung commented 2 years ago

Hi Xin Wen,

Thanks for your great work! Regarding SlotCon, I have two questions:
(1) I notice the prototypes are initialized with nn.Embedding. How do you ensure that the trainable prototypes are optimized into meaningful semantic groups via backpropagation? The loss functions do not explicitly enforce this, so I am a bit confused about how the prototypes are optimized.
(2) Have you tested how much the resize operation in the data augmentation matters? That is, if you only crop (along with the other augmentations) and skip the resize operation, will the performance drop heavily?
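To make question (1) concrete, here is a minimal sketch of how gradients can reach prototypes stored in nn.Embedding. It is not the repository's code: the sizes, the temperature `tau`, the random features, and the random target are all assumptions chosen only so that a loss exists. It shows that the prototypes receive gradients through a softmax assignment; it does not by itself show that they become semantically meaningful.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_prototypes, dim, tau = 4, 8, 0.1  # hypothetical sizes and temperature

prototypes = nn.Embedding(num_prototypes, dim)  # learnable prototype vectors
feats = torch.randn(16, dim)                    # stand-in for per-pixel features

# Soft assignment logits: cosine similarity between pixels and prototypes.
logits = F.normalize(feats, dim=1) @ F.normalize(prototypes.weight, dim=1).t() / tau

# Stand-in target distribution (assumption): random here, only to produce a loss.
target = torch.softmax(torch.randn(16, num_prototypes), dim=1)

# Cross-entropy between the target and the predicted assignment.
loss = -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
loss.backward()

grad = prototypes.weight.grad  # populated because the loss depends on the prototypes
```

Because the assignment logits are a differentiable function of `prototypes.weight`, any loss defined on the assignments updates the prototypes, even if no term names them explicitly.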

Thanks for your reply!

xwen99 commented 2 years ago

Hi @DevinCheung, thanks for your attention to our work.

DevinCheung commented 2 years ago

Hi @xwen99, thanks for your quick reply! I may not have put my second question clearly.

I mean, for example, I take two crops of one image. Both crops keep the same scale as the raw image (i.e., no resize operation). The two crops also overlap, which is essential for calculating L_{Group}. I then apply Gaussian blur, color jitter, etc. to the two crops to produce the two views that are input to the network. In this way, no RoI_Align operation is needed.

In short, the resize augmentation is removed and the others are kept. Will this cause a significant performance drop?
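As a rough sketch of this setting, the snippet below finds the shared region of two same-scale crops and returns index slices into each crop. The `Crop` tuple and `overlap_slices` helper are hypothetical names for illustration, not part of SlotCon. It assumes integer crop coordinates in raw-image pixels and no flipping.

```python
from typing import NamedTuple, Optional, Tuple


class Crop(NamedTuple):
    # Top-left corner and size, in raw-image pixel coordinates (assumed layout).
    x: int
    y: int
    w: int
    h: int


Slices = Tuple[slice, slice]  # (rows, cols) within one crop


def overlap_slices(a: Crop, b: Crop) -> Optional[Tuple[Slices, Slices]]:
    """Return slices selecting the shared region in each crop, or None if disjoint."""
    left, top = max(a.x, b.x), max(a.y, b.y)
    right = min(a.x + a.w, b.x + b.w)
    bottom = min(a.y + a.h, b.y + b.h)
    if right <= left or bottom <= top:
        return None

    def local(c: Crop) -> Slices:
        # Shift the shared box from image coordinates into the crop's own frame.
        return (slice(top - c.y, bottom - c.y), slice(left - c.x, right - c.x))

    return local(a), local(b)


a = Crop(x=10, y=20, w=100, h=100)
b = Crop(x=60, y=50, w=100, h=100)
sl_a, sl_b = overlap_slices(a, b)
```

Since both crops share the raw image's pixel grid, the two slices cover the same pixels one-to-one. On a downsampled feature map this exact alignment would presumably hold only when the crop offsets are multiples of the network stride.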

Thanks!

xwen99 commented 2 years ago

Hi @DevinCheung, regarding your current question, the last part of Sec. C in the appendix covers precisely this setting; please have a look. In short, yes, the performance will drop significantly.