some problems with the code

lucienne999 commented 2 years ago

Hi, this work is quite interesting:). But I have some problems with the code:

1) According to the paper, STUD uses the RoIs of the key frame (t0) to generate positive samples and takes the RoIs of the other frames (t1-tT) as negative samples (filtered via the energy function). But the labels confuse me [1]: why should they be half negative and half positive? And what if the key frame image has only one object?

2) Why use the RoIs of proposals? Can I use the GT bbox RoIs instead?

3) Is there a missing negative sign at [2]?

[1] https://github.com/deeplearning-wisc/stud/blob/main/src/modeling/meta_arch/rcnn_ss_add.py#L228

[2] https://github.com/deeplearning-wisc/stud/blob/b667a369e368181ef6e913c32f26e574bead9b56/src/modeling/self_supervised/cycle_energy_direct_add_all.py#L139

d12306 commented 2 years ago

Hi @LicharYuan, thanks for your interest in our paper.

1) It is because the number of unknowns is the same as the number of positive in-distribution samples. See https://github.com/deeplearning-wisc/stud/blob/b667a369e368181ef6e913c32f26e574bead9b56/src/modeling/self_supervised/cycle_energy_direct_add_all.py#L171

2) You can try that quickly, but the diversity of the proposals will be lower, because you would explore a smaller number of proposals/objects from which to distill the unknowns.

3) Actually, here we use the negative energy score, which is larger for in-distribution (ID) objects. Since we set the label for ID objects to 1 (bigger than the 0 used for OOD objects), I think it will be easier for the model to optimize. You can definitely try the negative sign here; it does not change the goal of binary separation between ID and OOD objects (the label for ID objects can also be 0 if you'd like).
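To make points 1) and 3) of this reply concrete, here is a minimal NumPy sketch. It is not the code from the STUD repository: the function names `negative_energy`, `build_binary_labels`, and `bce_with_logits` are hypothetical, and it assumes the standard free-energy definition (energy = -logsumexp of the class logits, temperature 1) with the score fed directly into a binary cross-entropy loss. It shows why the labels come out half ones and half zeros when there is one unknown per ID object, and why flipping the sign of the score together with the labels gives the same loss.

```python
import numpy as np

def negative_energy(logits):
    """Negative free energy, i.e. logsumexp over class logits (temperature 1).

    Assumed definition: energy = -logsumexp(logits), so the negative
    energy is larger for confident in-distribution (ID) predictions.
    """
    m = logits.max(axis=1, keepdims=True)  # subtract the max for stability
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).squeeze(1)

def build_binary_labels(num_id):
    """One unknown per ID object, so labels are num_id ones then num_id zeros."""
    return np.concatenate([np.ones(num_id), np.zeros(num_id)])

def bce_with_logits(scores, labels):
    """Numerically stable binary cross-entropy on raw scores."""
    return np.mean(
        np.maximum(scores, 0) - scores * labels + np.log1p(np.exp(-np.abs(scores)))
    )

# Toy example: 3 ID objects and their 3 distilled unknowns, 4 classes each.
rng = np.random.default_rng(0)
id_logits = rng.normal(size=(3, 4)) + 3.0   # hypothetical ID logits
ood_logits = rng.normal(size=(3, 4))        # hypothetical unknown logits

scores = negative_energy(np.concatenate([id_logits, ood_logits]))
labels = build_binary_labels(num_id=3)

loss = bce_with_logits(scores, labels)
# Flipping the score sign AND swapping the labels leaves the loss unchanged.
loss_flipped = bce_with_logits(-scores, 1.0 - labels)
assert np.isclose(loss, loss_flipped)
```

In this sketch the sign convention only decides which class is called "1"; the separation objective is the same either way, as long as the labels are swapped together with the sign.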

lucienne999 commented 2 years ago

Thanks for your reply! But about 1), how do you make sure the number of negative samples equals the number of positive samples, since the negative samples are chosen from multi-frame RoIs?

d12306 commented 2 years ago

Hi @LicharYuan, sorry for the late reply! I just saw the comments. Actually, for each ID object, you aggregate the negative samples from the multi-frame RoIs using a dissimilarity score, so you essentially compress the dimension of the negative samples to 1, which equals the dimension of the ID object (one negative per ID object).
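The aggregation described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the thread does not say which dissimilarity score or weighting STUD uses, so the sketch assumes cosine dissimilarity and a softmax weighting, and the names `aggregate_negatives` and `temperature` are hypothetical. The point is only the shapes: M candidate RoI features from other frames collapse into exactly one negative per key-frame object.

```python
import numpy as np

def aggregate_negatives(key_feats, other_feats, temperature=1.0):
    """Collapse M other-frame RoI features into one negative per key-frame object.

    key_feats:   (N, D) features of the N ID objects in the key frame.
    other_feats: (M, D) features of RoIs pooled from the other frames.
    Returns an (N, D) array: one aggregated negative per ID object.
    """
    # Assumption: cosine dissimilarity between L2-normalised features.
    k = key_feats / np.linalg.norm(key_feats, axis=1, keepdims=True)
    o = other_feats / np.linalg.norm(other_feats, axis=1, keepdims=True)
    dissim = 1.0 - k @ o.T                        # (N, M)

    # Assumption: softmax over the M candidates, so more dissimilar RoIs
    # contribute more to the aggregated negative.
    w = dissim / temperature
    w = np.exp(w - w.max(axis=1, keepdims=True))  # subtract the max for stability
    w /= w.sum(axis=1, keepdims=True)             # each row sums to 1

    return w @ other_feats                        # (N, D)

rng = np.random.default_rng(0)
key_feats = rng.normal(size=(3, 8))     # 3 ID objects in the key frame
other_feats = rng.normal(size=(10, 8))  # 10 RoIs from the other frames

negatives = aggregate_negatives(key_feats, other_feats)
assert negatives.shape == key_feats.shape  # one negative per ID object
```

Because the M axis is summed out, the number of negatives always matches the number of positives regardless of how many RoIs the other frames contribute.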

feel free to post more questions here
