Gradients due to the EMA depending on student variables

Great paper!

The TensorFlow documentation says the EMA variables are created with (trainable=False) and added to the GraphKeys.ALL_VARIABLES collection. Since they are not trainable, no gradient update is applied to them; I understand that. But they depend on the current trainable variables of the graph, and therefore so do the predictions of the teacher network. Does this mean an additional gradient flows to the trainable (student) variables because the EMA depends on them? Is this a correct understanding of the implementation?
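To make the question concrete, here is a minimal sketch in plain Python (not TensorFlow, and not the repository's code). It assumes a hypothetical scalar model where the prediction is weight * x, and a squared-difference consistency loss; all names and numbers are made up for illustration. It compares the gradient with respect to the student weight in two cases: the teacher weight is a stored value that the loss treats as a constant, or the teacher weight is recomputed from the student weight inside the loss so that the EMA dependency is differentiated too.

```python
# Illustrative only: a hypothetical scalar "network" with prediction = weight * x.
DECAY = 0.99  # assumed EMA decay, chosen for the example


def ema_update(teacher_prev, student, decay=DECAY):
    """Standard EMA rule: new teacher = decay * old teacher + (1 - decay) * student."""
    return decay * teacher_prev + (1.0 - decay) * student


def consistency_loss(student, teacher, x):
    """Squared difference between student and teacher predictions."""
    return (student * x - teacher * x) ** 2


def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference estimate of df/dw."""
    return (f(w + eps) - f(w - eps)) / (2.0 * eps)


x, student, teacher_prev = 2.0, 1.0, 0.5

# Case A: the teacher weight is a stored value, constant with respect to the student.
teacher_const = ema_update(teacher_prev, student)
grad_stored = numeric_grad(
    lambda w: consistency_loss(w, teacher_const, x), student
)

# Case B: the teacher weight is recomputed from the student inside the loss,
# so the derivative also passes through the EMA rule.
grad_through_ema = numeric_grad(
    lambda w: consistency_loss(w, ema_update(teacher_prev, w), x), student
)

print("gradient, teacher as stored constant:", grad_stored)
print("gradient, differentiating through EMA:", grad_through_ema)
```

The two gradients differ by a factor of DECAY in this toy setup. The question above is which of the two cases the implementation corresponds to.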

CuriousAI / mean-teacher, issue #29