facebookresearch / unbiased-teacher

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection
https://arxiv.org/abs/2102.09480
MIT License

Details about the training #49

Open MendelXu opened 3 years ago

MendelXu commented 3 years ago

Nice job. I am trying to reproduce your work with mmdetection; before doing so, could you help me confirm some details?

1. The input scale of the training images. Does the line at https://github.com/facebookresearch/unbiased-teacher/blob/6977c6f77c812fae4064dc1b3865658c2ed247b1/configs/Base-RCNN-FPN.yaml#L41 mean that the size of each input image is selected randomly from those scales?
2. The total batch size for training. The batch size is 32 labeled images + 32 unlabeled images, and for each iteration 96 images (32 strongly augmented labeled images + 32 weakly augmented labeled images + 32 strongly augmented unlabeled images) are used for supervision, right?

ycliu93 commented 3 years ago
  1. We followed the original implementation in Detectron2. Here is their response regarding MIN_SIZE_TRAIN: https://github.com/facebookresearch/detectron2/issues/2216 (see the sketch after this list).

  2. We used 32 labeled images for computing the supervised loss, fed the 32 strongly augmented unlabeled images into the Student, and fed the same 32 weakly augmented unlabeled images into the Teacher. In SoftTeacher, you seem to be using 8 labeled images + 32 strongly augmented unlabeled images + 32 weakly augmented unlabeled images?
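
To make both points concrete, here are two minimal sketches. For the input scale, this assumes Detectron2's default `MIN_SIZE_TRAIN_SAMPLING: "choice"`, under which the shortest edge is picked uniformly at random from the tuple for every image (the scale values below are the ones from Detectron2's base config, so please double-check them against our yaml):

```python
# Sketch of Detectron2-style multi-scale resizing with "choice" sampling:
# one value is drawn uniformly from the tuple per image and the shortest
# edge is resized to it, capped so the longest edge stays <= MAX_SIZE_TRAIN.
from detectron2.data import transforms as T

resize_aug = T.ResizeShortestEdge(
    short_edge_length=(640, 672, 704, 736, 768, 800),  # MIN_SIZE_TRAIN
    max_size=1333,                                      # MAX_SIZE_TRAIN
    sample_style="choice",
)
```

For the batch composition, here is a schematic of how one iteration consumes the three batches; the function and method names below are made up for illustration (this is not our actual trainer code), and the threshold/weight values are only representative:

```python
# Schematic of one training iteration under the batch sizes described above.
import torch

def train_step(student, teacher, labeled_batch, unlabeled_weak, unlabeled_strong,
               pseudo_score_thresh=0.7, unsup_weight=4.0):
    # Supervised loss on the 32 labeled images.
    sup_losses = student(labeled_batch)          # dict of detection losses

    # The Teacher sees the 32 weakly augmented unlabeled images and produces
    # pseudo-labels; no gradients flow through it (it is updated by EMA).
    with torch.no_grad():
        pseudo_labels = teacher.predict_pseudo_labels(
            unlabeled_weak, score_thresh=pseudo_score_thresh)

    # The Student is trained on the 32 strongly augmented views of the same
    # unlabeled images against those pseudo-labels.
    unsup_losses = student(unlabeled_strong, targets=pseudo_labels)

    return sum(sup_losses.values()) + unsup_weight * sum(unsup_losses.values())
```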

I am also trying to compare the implementations of Unbiased Teacher and SoftTeacher to see where the improvement comes from.

I guess for a fair comparison (under the same batch size, the same data augmentation techniques, and the same minor implementation details), you could just change the background confidence loss to Focal loss and remove the unsupervised regression loss in the SoftTeacher codebase.
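
If it helps, here is a minimal sketch of the Focal loss swap I have in mind, written from the standard formulation rather than copied from either codebase (the gamma value is only illustrative):

```python
import torch
import torch.nn.functional as F

def focal_loss(cls_logits, targets, gamma=1.5):
    """Multi-class focal loss over R-CNN classification logits.

    cls_logits: (N, num_classes + 1) raw scores, background as the last class.
    targets:    (N,) class indices taken from the (pseudo-)labels.
    """
    ce = F.cross_entropy(cls_logits, targets, reduction="none")  # -log p_t
    p_t = torch.exp(-ce)                                         # prob. of the true class
    return (((1.0 - p_t) ** gamma) * ce).mean()                  # down-weight easy samples
```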

I didn't see the comparison between Focal loss and background confidence loss in your paper. Do you know how much improvement it contributes?

MendelXu commented 3 years ago

Thanks for your reply. I have tried the plain Focal loss, but the result is quite strange. I will try replacing the RoI head with yours directly.

ycliu93 commented 3 years ago

Hi @MendelXu ,

I'm tracing your SoftTeacher code and trying to understand the background confidence loss. https://github.com/microsoft/SoftTeacher/blob/main/ssod/models/soft_teacher.py#L232-L243

Could I interpret it as applying a Focal loss to the Student's predicted background samples, but with the confidence coming from the Teacher rather than the Student?

MendelXu commented 3 years ago

I think it is just a weighting mechanism that works in the opposite direction from focal loss (it intends to down-weight some hard samples rather than emphasize them). Also, the confidence is evaluated on the weakly augmented samples, which are easier to recognize, so it might be more accurate.
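
Roughly, the idea in pseudocode; this is a simplified sketch rather than the exact implementation in the linked lines, and the helper name is made up:

```python
import torch
import torch.nn.functional as F

def bg_confidence_weighted_cls_loss(cls_logits, targets, teacher_bg_scores, bg_class):
    """Classification loss with background samples reweighted by the Teacher.

    cls_logits:        (N, num_classes + 1) Student logits on the strong view.
    targets:           (N,) pseudo-label class indices (bg_class = background).
    teacher_bg_scores: (N,) Teacher's background probability for the same
                       proposals, evaluated on the weakly augmented view.
    """
    ce = F.cross_entropy(cls_logits, targets, reduction="none")
    weights = torch.ones_like(ce)
    is_bg = targets == bg_class
    # Confidently-background proposals keep a large weight; hard/uncertain
    # background proposals (low Teacher background score) are down-weighted,
    # which is the opposite direction from focal loss.
    weights[is_bg] = teacher_bg_scores[is_bg]
    # Renormalize so the background term keeps roughly its original scale.
    weights[is_bg] = weights[is_bg] * is_bg.sum() / weights[is_bg].sum().clamp(min=1e-6)
    return (ce * weights).mean()
```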

ycliu93 commented 3 years ago

Got it. Did you try applying the Teacher's predicted weight to the foreground samples as well?

MendelXu commented 3 years ago

Yes. We tried applying the weight to all samples, but the improvement was marginal compared to applying it only to the background part.