Thank you for your reply!
So when armed with strong augmentation, does the moving-average teacher help? I saw you used the EMA teacher in the overall pipeline; have you compared the results with a simply copied and detached teacher?
Sorry, we did not ablate EMA, since it is common practice in semi-supervised semantic segmentation. However, in semi-supervised image classification, FixMatch utilizes a copy of the student as the teacher instead of EMA.
From my perspective, EMA might not be so important for semi-supervised learning, but it is important for contrastive learning (please refer to MoCo). And since our work utilizes an extra contrastive loss, EMA might be indispensable in our framework.
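To make the distinction concrete, here is a minimal PyTorch sketch of the two teacher variants (the function names are illustrative, not from our code):

```python
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # EMA teacher: teacher weights track a slow-moving average of the
    # student's weights (cf. Mean Teacher / MoCo's momentum encoder).
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)

def copied_teacher(student):
    # FixMatch-style alternative: the teacher is simply a frozen copy of
    # the current student, so its pseudo-labels carry no gradient.
    teacher = copy.deepcopy(student).eval()
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher
```

In this view, the copied teacher is just the EMA teacher with momentum set to 0, so the two designs differ only in how much history the teacher retains.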
Thank you for your thoughts. They are very helpful and I hope to discuss more with you in the future.
Please do not hesitate to contact me if you have further questions~
Hi Haochen, thank you for your previous help!
Under the sup-only setting, I found the result on the 1/8 CPS split of Pascal VOC is 74.56%, which is quite high. I didn't change the sup-only code except for the data path.
Besides, when I remove all the strong augmentation and use mean-teacher for the unsupervised branch, the result only reaches 74.32% in early epochs and soon decreases.
Both results confuse me: the sup-only baseline seems higher than I expected, and MT seems not to work for semi-supervised segmentation tasks. Could you please give me some advice? I really appreciate your reply.
In my opinion, if you want to verify the effectiveness of EMA in semi-supervised semantic segmentation, it may be better to compare CutMix results (w/ and w/o EMA) instead of comparing MT against sup-only, because strong augmentation is quite important in semi-supervised semantic segmentation to prevent collapse.
As you have mentioned, "when I remove all the strong augmentation and use mean teacher for the unsupervised branch, the result only reaches 74.32% in early epochs and soon decreases."
The performance degradation is likely due to fitting incorrect pseudo-labels once strong data augmentation is removed.
Recall the weak-to-strong pipeline: images are first fed into the teacher to obtain pseudo-labels, and then the images and the generated pseudo-labels are jointly transformed to produce strongly augmented data.
Let me explain why strong augmentation is effective against wrong pseudo-labels: the EMA teacher cannot provide satisfactory pseudo-labels for the strongly augmented samples, yet we urge the student to produce high-quality predictions on them (these pseudo-ground-truths were generated under weak augmentation), so the student cannot succeed by simply copying the teacher's mistakes.
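For concreteness, here is a rough PyTorch-style sketch of this weak-to-strong step with a CutMix-like strong augmentation (the helper names and the box-mask construction are illustrative assumptions, not our actual implementation):

```python
import torch
import torch.nn.functional as F

def rand_box_mask(shape, device):
    # Illustrative CutMix mask: a random rectangle of ones on a zero canvas.
    b, _, h, w = shape
    mask = torch.zeros(b, 1, h, w, device=device)
    cut_h, cut_w = h // 2, w // 2
    y = torch.randint(0, h - cut_h + 1, (1,)).item()
    x = torch.randint(0, w - cut_w + 1, (1,)).item()
    mask[:, :, y:y + cut_h, x:x + cut_w] = 1.0
    return mask

def weak_to_strong_step(teacher, student, weak_imgs):
    # 1) The teacher sees weakly augmented images and produces pseudo-labels.
    with torch.no_grad():
        pseudo = teacher(weak_imgs).argmax(dim=1)  # (B, H, W)

    # 2) Strong augmentation (CutMix between the batch and its flipped
    #    version) is applied jointly to images AND pseudo-labels.
    mask = rand_box_mask(weak_imgs.shape, weak_imgs.device)
    mixed_imgs = weak_imgs * (1 - mask) + weak_imgs.flip(0) * mask
    mixed_lbls = torch.where(mask.squeeze(1).bool(), pseudo.flip(0), pseudo)

    # 3) The student must reproduce the weak-view pseudo-labels on the
    #    strongly augmented inputs.
    logits = student(mixed_imgs)
    return F.cross_entropy(logits, mixed_lbls)
```

The key point is step 3: the supervision target comes from the weak view, so the student cannot minimize the loss by merely mimicking the teacher on the strong view.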
By the way, maybe the momentum coefficient can be further tuned.
Hi Haochen, I have several general questions about your ablation.
According to Table 2 and Table 3, it seems that MT (mean teacher) is not helpful except in very data-scarce scenarios (the 1/16 split). Can you give me some hints about these results? As we know, MT is very useful in classification tasks, so I am a little confused.
I compared the sup-only results with U2PL and AEL, and under the 1/16 and 1/8 splits they differ a lot. I think I will use your baselines for comparison. Could you please tell me whether you also use the OHEM loss in the Cityscapes sup-only case? It would be very helpful for me.
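(To be precise, by OHEM I mean the pixel-level online hard example mining commonly used in segmentation codebases, roughly like the sketch below; the threshold and min-kept values are placeholders, not your settings.)

```python
import math
import torch.nn.functional as F

def ohem_cross_entropy(logits, target, ignore_index=255,
                       thresh=0.7, min_kept=100_000):
    # Per-pixel CE with no reduction, so hard pixels can be selected after.
    pixel_loss = F.cross_entropy(logits, target,
                                 ignore_index=ignore_index,
                                 reduction='none').flatten()
    # A pixel is "hard" if the predicted probability of its true class is
    # below `thresh`, i.e. its CE loss exceeds -log(thresh).
    hard = pixel_loss[pixel_loss > -math.log(thresh)]
    # Keep at least `min_kept` highest-loss pixels so the loss does not
    # collapse to an empty set on easy batches.
    if hard.numel() < min_kept:
        hard, _ = pixel_loss.topk(min(min_kept, pixel_loss.numel()))
    return hard.mean()
```

Is this roughly what you use for the Cityscapes baseline?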
Thank you so much!