CuriousAI / mean-teacher

A state-of-the-art semi-supervised method for image recognition
https://arxiv.org/abs/1703.01780

Gradients because of ema being dependent upon student variables #29

Open Nitinsiwach opened 5 years ago

Nitinsiwach commented 5 years ago

Great paper!

The TensorFlow documentation says the EMA variables are created with (trainable=False) and added to the GraphKeys.ALL_VARIABLES collection. Since they are not trainable, they won't have the gradient applied to them; I understand that. But they depend on the current trainable variables of the graph, and hence so do the predictions of the teacher network, so an additional gradient will flow to the trainable variables because the EMA depends on them. Is this a correct understanding of the implementation?

tarvaina commented 5 years ago

No. The teacher updates happen between training steps and gradients don’t flow across training steps.
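
To illustrate the point, here is a minimal sketch in plain Python. It is not the repository's code: it uses a toy scalar model (prediction = weight * x), and every function name and hyperparameter value in it is made up for the example. Within a step the teacher weight is just a number, so the gradient of the consistency loss only follows the student's path; the EMA update runs afterwards, outside the gradient computation.

```python
# Toy scalar illustration; all names and values are hypothetical,
# not taken from the mean-teacher repository.

def consistency_loss(student_w, teacher_w, x):
    """Squared difference between student and teacher predictions."""
    return (student_w * x - teacher_w * x) ** 2


def grad_wrt_student(student_w, teacher_w, x):
    """Analytic gradient of the consistency loss w.r.t. the student weight.

    teacher_w is treated as a constant: within one training step it is a
    stored value, not a function of the current student_w.
    """
    return 2.0 * (student_w - teacher_w) * x * x


def ema_update(teacher_w, student_w, decay):
    """Exponential moving average of the student weight."""
    return decay * teacher_w + (1.0 - decay) * student_w


def train(steps, lr=0.1, decay=0.9, x=1.0):
    """Run a few steps and return the (student_w, teacher_w) history."""
    student_w, teacher_w = 1.0, 0.0
    history = []
    for _ in range(steps):
        # 1. Gradient is computed with the teacher held fixed.
        g = grad_wrt_student(student_w, teacher_w, x)
        # 2. Optimizer step on the student only.
        student_w = student_w - lr * g
        # 3. Teacher update happens between steps, after the gradient
        #    has already been computed and applied.
        teacher_w = ema_update(teacher_w, student_w, decay)
        history.append((student_w, teacher_w))
    return history
```

With these toy values, one call to train(1) moves the student from 1.0 to 0.8 and the teacher from 0.0 to roughly 0.08. The gradient in step 1 contains no term for how the teacher was derived from earlier student weights.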