Closed DevinCheung closed 11 months ago
Hi @DevinCheung, thanks for your attention to our work.
Hi @xwen99, thanks for your quick reply! I may not have put the second question clearly.
To give an example: I have two crops of one image. Both crops stay at the same scale as the raw image (i.e., no resize operation), and they overlap, which is essential for calculating L_{Group}. I then apply Gaussian blur, color jitter, etc. to the two crops and feed them into the network as the two views. In this way, no RoI_Align operation is needed.
In short, the resize augmentation is removed and the others are kept. Will this cause a significant performance drop?
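To make the setting concrete, here is a minimal sketch of the cropping step only. It is not taken from the SlotCon code; the function names `sample_overlapping_crops` and `overlap_in_each_view` and the box format `(x0, y0, x1, y1)` are my own assumptions for illustration.

```python
import random


def sample_overlapping_crops(img_w, img_h, crop_w, crop_h, rng=None):
    """Sample two same-size crop boxes (x0, y0, x1, y1) that always overlap.

    Hypothetical helper: neither crop is resized, so both keep the raw scale.
    """
    rng = rng or random.Random()
    if crop_w > img_w or crop_h > img_h:
        raise ValueError("crop must fit inside the image")
    # First crop: anywhere inside the image.
    x_a = rng.randint(0, img_w - crop_w)
    y_a = rng.randint(0, img_h - crop_h)
    # Second crop: shifted by less than one crop size along each axis,
    # which guarantees at least one shared pixel row and column.
    x_b = rng.randint(max(0, x_a - crop_w + 1),
                      min(img_w - crop_w, x_a + crop_w - 1))
    y_b = rng.randint(max(0, y_a - crop_h + 1),
                      min(img_h - crop_h, y_a + crop_h - 1))
    box_a = (x_a, y_a, x_a + crop_w, y_a + crop_h)
    box_b = (x_b, y_b, x_b + crop_w, y_b + crop_h)
    return box_a, box_b


def overlap_in_each_view(box_a, box_b):
    """Return the shared region in the local pixel coordinates of each crop."""
    x0, y0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x1, y1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if x0 >= x1 or y0 >= y1:
        return None  # the two boxes do not intersect
    local_a = (x0 - box_a[0], y0 - box_a[1], x1 - box_a[0], y1 - box_a[1])
    local_b = (x0 - box_b[0], y0 - box_b[1], x1 - box_b[0], y1 - box_b[1])
    return local_a, local_b


if __name__ == "__main__":
    a, b = sample_overlapping_crops(640, 480, 224, 224, random.Random(0))
    print(a, b, overlap_in_each_view(a, b))
```

Because both crops keep the raw scale, the two local regions always have the same width and height, so at the pixel level the overlap can be taken by plain slicing in each view.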
Thanks!
Hi @DevinCheung, regarding your current question, the last part of Sec. C in the appendix covers precisely this setting; please have a look. In short, yes, the performance will drop significantly.
Hi Xin Wen,
Thanks for your great work! Regarding SlotCon, I have two questions: (1) I notice that the prototypes are initialized with `nn.Embedding`. I am wondering how the trainable prototypes are guaranteed to be optimized into meaningful semantic groups via backpropagation. Since the loss functions do not explicitly enforce this, I am a little confused about how the prototypes are optimized. (2) Have you tested how much the resize operation in the data augmentation matters? That is, if you only crop, along with the other augmentations but without the resize operation, will the performance drop heavily? Thanks for your reply!