CuriousAI / mean-teacher

A state-of-the-art semi-supervised method for image recognition
https://arxiv.org/abs/1703.01780

Questions about your code #43

luciaL commented 4 years ago

Hello, why does the model have two fc layers and two outputs? I don't think that's necessary. The consistency loss could also be calculated from class_logit and ema_logit. What's the difference between class_logit and cons_logit?

dbjhbyun commented 4 years ago

I had a similar question. I'm not sure if this was the authors' actual intention, but I think class_logit is for the "correct classification" constraint and cons_logit is for the "consistency" constraint. Making a single fc layer satisfy both constraints could be difficult, so I think the authors use separate fc layers for the separate constraints.
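
For concreteness, here is a minimal PyTorch sketch of that two-head design. The backbone and layer names (`backbone`, `fc1`, `fc2`) are placeholders rather than the repo's exact architecture; the point is only that both heads share the same features and produce `class_logit` and `cons_logit` separately.

```python
import torch
import torch.nn as nn

class TwoHeadNet(nn.Module):
    """Sketch of a network with two fc heads over a shared backbone.

    class_logit feeds the supervised classification loss;
    cons_logit feeds the consistency loss against the EMA teacher.
    The backbone below is a hypothetical stand-in, not the repo's model.
    """
    def __init__(self, num_classes=10, feat_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(               # placeholder feature extractor
            nn.Flatten(),
            nn.Linear(32 * 32 * 3, feat_dim),
            nn.ReLU(),
        )
        self.fc1 = nn.Linear(feat_dim, num_classes)  # -> class_logit
        self.fc2 = nn.Linear(feat_dim, num_classes)  # -> cons_logit

    def forward(self, x):
        feats = self.backbone(x)
        class_logit = self.fc1(feats)  # trained for classification
        cons_logit = self.fc2(feats)   # trained for consistency
        return class_logit, cons_logit
```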

developer0hye commented 3 years ago

@luciaL @dbjhbyun

I found the description related to this issue in the paper:

The consistency to teacher predictions may not necessarily be a good proxy for the classification task, especially early in the training. So far our model has strongly coupled these two tasks by using the same output for both. How would decoupling the tasks change the performance of the algorithm? To investigate, we changed the model to have two top layers and produce two outputs. We then trained one of the outputs for classification and the other for consistency. We also added a mean squared error cost between the output logits, and then varied the weight of this cost, allowing us to control the strength of the coupling. Looking at the results (reported using the EMA version of the classification output), we can see that the strongly coupled version performs well and the too loosely coupled versions do not. On the other hand, a moderate decoupling seems to have the benefit of making the consistency ramp-up redundant.