Closed dyh127 closed 11 months ago
Hi @dyh127,
Your concern is that, for one slot, the feature may also come from pixels that correspond to other slots. Note that although we use soft masks for pooling the pixels into slots, the masks are actually quite sharp (close to one-hot), so the feature of one slot is quite consistent (if it were not, the similarity/attention value for that pixel would be small). If you find this is an issue, you can try a lower (sharper) temperature for the soft masks (currently 0.07), or you can try using hard masks directly (Gumbel-Softmax would be needed in that case to keep the assignment differentiable).
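To illustrate the temperature's effect on mask sharpness, here is a minimal NumPy sketch (variable names are illustrative, not the repo's): the per-pixel softmax over slots becomes nearly one-hot as the temperature drops toward 0.07.

```python
import numpy as np

def soft_masks(sim, tau):
    """Softmax over slots for each pixel; lower tau -> sharper (closer to one-hot) masks."""
    logits = sim / tau
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)

# toy pixel-to-slot similarities: 2 pixels x 3 slots
sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.1]])

sharp = soft_masks(sim, tau=0.07)  # near one-hot: winning slot gets ~all the weight
smooth = soft_masks(sim, tau=1.0)  # much softer: weight spread across slots
```

At tau=0.07 the first pixel assigns essentially all its weight to slot 0, while at tau=1.0 the same similarities give a diffuse assignment; this is why the pooled slot features stay consistent despite the masks being nominally soft.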
Hi Xin,
Thanks for the great and insightful work.
When reading the code, I am confused by the label generation for the contrastive learning of slots. As shown in https://github.com/CVMI-Lab/SlotCon/blob/main/models/slotcon.py#L186, slots with the same index are treated as positive pairs, but these slots are generated by masked pooling from features, and their indexes may not correspond to semantic classes. Maybe I have missed something.
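To restate the mechanism I am asking about, here is a minimal NumPy sketch (names are illustrative, not the repo's): slots are obtained by mask-weighted pooling of pixel features, and the contrastive labels simply pair slot i from one view with slot i from the other, on the assumption that both views' masks are produced against the same shared set of prototypes, so index i refers to the same prototype in both views.

```python
import numpy as np

def pool_slots(feats, masks):
    """Mask-weighted pooling: feats (N, D), masks (N, K) -> slots (K, D)."""
    w = masks / (masks.sum(axis=0, keepdims=True) + 1e-8)  # normalize per slot
    return w.T @ feats

rng = np.random.default_rng(0)
feats_v1 = rng.normal(size=(16, 8))  # pixel features, view 1
feats_v2 = rng.normal(size=(16, 8))  # pixel features, view 2
masks = rng.random(size=(16, 4))     # shared slot assignments (simplification)

slots_v1 = pool_slots(feats_v1, masks)
slots_v2 = pool_slots(feats_v2, masks)

# positives are slots with the same index across the two views
labels = np.arange(slots_v1.shape[0])
```

Under this reading, the index is tied to a prototype rather than to a dataset-level semantic class, which is the point of my question.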
Looking forward to your reply!