Disguiser15 / RefTeacher

RefTeacher is a strong baseline method for Semi-Supervised Referring Expression Comprehension.
Apache License 2.0

In regard to the semi-supervised attention issue #2

Open · WeitaiKang opened this issue 1 year ago

WeitaiKang commented 1 year ago

Thank you for your work; it's very interesting, but I have a question.

Is it okay to apply the semi-supervised attention constraint when the teacher's and the student's data augmentations are not the same, and the attention map has not undergone any geometric transformation?

Disguiser15 commented 1 year ago

Thank you for your interest and the good question! The teacher network is weakly augmented to provide stable pseudo labels, while the student network is strongly augmented to learn additional valuable information and avoid overfitting. If both employed the same data augmentation, the attention-constraint loss would be close to zero and the student network's gradients would barely update; see the ablation study in Table 4. Even without a geometric transformation, the attention map can still provide valuable pseudo information.
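For concreteness, here is a minimal PyTorch-style sketch of this weak/strong teacher-student consistency. All names (`teacher`, `student`, `attention_consistency_loss`) are illustrative assumptions, not the code in this repository:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_attention(teacher, weak_images):
    # The teacher sees weakly augmented images and produces stable pseudo
    # attention maps; no gradients flow through it.
    return teacher(weak_images)  # e.g. (B, N) attention over N visual tokens

def attention_consistency_loss(student, teacher, weak_images, strong_images):
    # The student sees strongly augmented views of the same images.
    student_attn = student(strong_images)                    # (B, N)
    teacher_attn = teacher_attention(teacher, weak_images)   # (B, N)
    # L2 constraint between the two attention maps. Because the two views
    # are augmented differently, this loss is non-trivial; with identical
    # augmentations it would shrink toward zero and give the student almost
    # no gradient signal, which matches the point in the reply above.
    return F.mse_loss(student_attn, teacher_attn)
```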

WeitaiKang commented 1 year ago

It's great to see such a prompt response from you!

However, it does not seem quite reasonable to use an L2 loss to constrain attention when the teacher and student employ different augmentations. Strong augmentations such as RandomResize and RandomSizeCrop change the object's position, so there is no longer a one-to-one correspondence between the teacher's and the student's attention tokens. Without an additional geometric transformation, why would an L2 loss between these two maps be meaningful? (See the sketch below for the kind of alignment I have in mind.)
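To make the concern concrete, here is a hypothetical sketch of such an alignment: warping the teacher's attention map with the same crop-and-resize parameters applied to the student's input before computing the L2 loss. None of these names come from RefTeacher; they only illustrate the idea:

```python
import torch
import torch.nn.functional as F

def align_teacher_attention(teacher_attn, crop_box, out_hw):
    """Warp a teacher attention map into the student's cropped/resized frame.

    teacher_attn: (B, 1, H, W) attention over the weakly augmented image.
    crop_box:     (top, left, height, width) of the crop applied to the
                  student's input, in the teacher image's pixel grid.
    out_hw:       (H_s, W_s) spatial size of the student's attention map.
    """
    top, left, h, w = crop_box
    # Apply the same crop the student's view received (RandomSizeCrop analogue).
    cropped = teacher_attn[:, :, top:top + h, left:left + w]
    # Resize the cropped map to the student's token grid (RandomResize analogue).
    aligned = F.interpolate(cropped, size=out_hw, mode="bilinear",
                            align_corners=False)
    # Renormalize so the warped map is still a valid attention distribution.
    total = aligned.flatten(1).sum(dim=1).view(-1, 1, 1, 1).clamp_min(1e-8)
    return aligned / total
```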

Is there any additional processing applied to the attention map produced from the teacher's weakly augmented data in this context?