facebookresearch / unbiased-teacher

PyTorch code for ICLR 2021 paper Unbiased Teacher for Semi-Supervised Object Detection
https://arxiv.org/abs/2102.09480
MIT License

same idea found in cvpr2021 #17

Closed · AndyYuan96 closed this issue 3 years ago

AndyYuan96 commented 3 years ago

3DIoUMatch. The core idea is the same: FixMatch + EMA.

ycliu93 commented 3 years ago

Hi @AndyYuan96,

Our paper shows that simply using FixMatch-style training with EMA does not perform well: the class imbalance inherent in 2D object detection causes a pseudo-labeling bias, i.e., in the late stage of semi-supervised training the model generates pseudo-labels almost exclusively for the majority classes.
More details are in section 3.3 of our paper: https://arxiv.org/abs/2102.09480
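For reference, here is a minimal sketch of what FixMatch-style training with an EMA teacher looks like. The function names, keep rate, and confidence threshold are illustrative, not this repo's actual API:

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, keep_rate=0.9996):
    # Teacher weights are an exponential moving average of student weights,
    # so the teacher evolves slowly and yields stabler pseudo-labels.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.data.mul_(keep_rate).add_(s_p.data, alpha=1.0 - keep_rate)

@torch.no_grad()
def make_pseudo_labels(teacher, weak_images, threshold=0.7):
    # FixMatch-style: predict on weakly augmented images and keep only
    # confident detections as pseudo-labels for the strongly augmented view.
    detections = teacher(weak_images)
    # Assumes detectron2-style Instances with a .scores field.
    return [d[d.scores > threshold] for d in detections]
```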

Using FixMatch + EMA gives the red curve in the figure below. Our method additionally addresses the pseudo-labeling bias and improves on it, as shown by the green curve.

[figure: mAP training curves, FixMatch + EMA (red) vs. our method (green)]

Also, note that we submitted our paper to ICLR 2021 (Sept. 28, 2020), two months before the CVPR 2021 submission deadline (Nov. 16, 2020). The reviews and our responses are on OpenReview: https://openreview.net/forum?id=MJIve1zgR_

AndyYuan96 commented 3 years ago

The input to the ROI head is a set of proposals sampled from all proposals, which already alleviates the imbalance problem. Why does focal loss still give such a large improvement? Does simply changing the classification loss of Faster R-CNN's RCNN stage to focal loss really improve that much?

AndyYuan96 commented 3 years ago

FixMatch-style training with EMA did improve over STAC; see the CVPR 2021 paper Instant-Teaching: An End-to-End Semi-Supervised Object Detection Framework, whose authors give a name to FixMatch-style training with EMA: Instant-Teaching.

ycliu93 commented 3 years ago

Thanks for your questions.

Q: Doesn't the proposal sampling already address the imbalance issue? Does changing the Faster R-CNN classification loss really improve that much?

A: Note that RCNN proposal resampling only balances foreground against background instances during training, whereas the pseudo-labeling bias involves both imbalance among the foreground classes and imbalance between foreground and background. All experiments presented in the paper (including STAC and all baselines) already use proposal sampling, yet the pseudo-labeling bias issue remains.

This is why replacing cross-entropy with a multi-class focal loss improves results substantially, as shown in the figure below. We have released the code, and you can switch the loss back to cross-entropy to see the results: the cross-entropy + EMA model reaches its highest mAP around 50K iterations and then starts to degrade. If you print the class distribution of the pseudo-labels, you will find that the model only generates pseudo-labels for the majority class (e.g., person). That is the pseudo-labeling bias issue.
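For concreteness, a minimal sketch of both steps, assuming a plain softmax classifier head; the function names, the gamma value, and the pseudo-label format are illustrative rather than the released code's exact form:

```python
import torch
import torch.nn.functional as F
from collections import Counter

def multiclass_focal_loss(logits, targets, gamma=1.5):
    # Focal loss on top of softmax cross-entropy: down-weight easy,
    # well-classified examples so majority-class proposals dominate less.
    ce = F.cross_entropy(logits, targets, reduction="none")  # ce = -log(p_t)
    p_t = torch.exp(-ce)                                     # probability of the true class
    return (((1.0 - p_t) ** gamma) * ce).mean()

def pseudo_label_histogram(pseudo_class_ids):
    # Quick diagnostic for pseudo-labeling bias: count how often each
    # class appears among the generated pseudo-labels.
    return Counter(int(c) for c in pseudo_class_ids)
```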

Q: Does FixMatch-style training + EMA perform better than STAC?

A: FixMatch + EMA (the red curve; cross-entropy + EMA) reaches around 16 mAP, which I believe is close to what Instant-Teaching (their model without the co-rectify scheme) reports. It definitely performs better than STAC, but there is still a gap to our final model, which addresses the pseudo-labeling bias issue.

[figure: mAP training curves, cross-entropy + EMA vs. multi-class focal loss]

Also, I would like to emphasize again that we submitted our paper to ICLR 2021 (Sept. 28, 2020), which is two months before the CVPR2021 submission deadline (Nov. 16, 2020).

AndyYuan96 commented 3 years ago

Thank you for the reply; I will try this in my project and compare it with FixMatch + EMA. Also, as I'm new to semi-supervised learning and am applying it to 3D object detection: two papers on 3D detection say that, for a given dataset, they use 100% of the training set as labeled data and, at the same time, the same 100% as unlabeled data. Both claim improvements on the validation set, but I can't reproduce their results with their code. Do you think using the same data as both labeled and unlabeled data is reasonable? As far as I can see, in the image domain people don't use that data configuration.

ycliu93 commented 3 years ago

My personal opinion is that once you have ground-truth labels for the supervised image set (assuming those labels are noise-free), you should not use generated pseudo-labels on that same set as extra supervision. The ground-truth labels are oracle supervision, and noisy pseudo-labels would only mess up the training.

But if you want to further improve a model trained on the 100% supervised dataset using an extra unlabeled dataset (not included in the supervised set), then that is possible.
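A minimal sketch of that disjoint data configuration (the helper name and split fraction are illustrative):

```python
import torch
from torch.utils.data import random_split

def make_ssl_split(dataset, labeled_fraction=0.1, seed=0):
    # Disjoint split: pseudo-labels are generated only for `unlabeled`,
    # never for images whose ground truth is already used in training.
    n_labeled = int(len(dataset) * labeled_fraction)
    gen = torch.Generator().manual_seed(seed)
    labeled, unlabeled = random_split(
        dataset, [n_labeled, len(dataset) - n_labeled], generator=gen
    )
    return labeled, unlabeled
```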

AndyYuan96 commented 3 years ago

Thanks. I think so too, so it's really baffling that they claim this data configuration gives an improvement.