Britefury / self-ensemble-visual-domain-adapt

Code repository for the small image experiments in our paper 'Self-ensembling for Visual Domain Adaptation'
MIT License

Hope to get your answer. Thank you! #12

Closed: hydxqing closed this issue 4 years ago

hydxqing commented 4 years ago

Hello, I read the code carefully and tried to run it, but I still have some doubts. I hope to get your help.

  1. Before the first exponential-moving-average update of the weights, are the weights of the student network and the teacher network random?
  2. My understanding is that the labelled source-domain data is used to adjust the student network's weights, so the student network is more reliable than the teacher network. When the exponential moving average is applied, shouldn't most of the teacher network's weight therefore come from the student network? When updating the teacher's weights, why isn't the student's weight multiplied by 0.99 instead?
  3. In the `EMAWeightOptimizer` class, what does this code mean: `for tgt_p, src_p in zip(self.target_params, self.source_params): tgt_p[:] = src_p[:]`?
Britefury commented 4 years ago
  1. Although both networks are initialised randomly, the EMAWeightOptimizer constructor starts by first copying the student weights over those of the teacher (lines 29-30), so the teacher and student will be identical before training starts.
  2. Not really. The teacher network should follow the student slowly, hence `teacher = teacher * 0.99 + student * 0.01`. The trajectory of its weights is therefore smoother, so it can be used to produce more reliable pseudo-labels.
  3. It erases the weights of the teacher by copying the weights of the student over them.
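The two steps described above (the initial copy, then the slow EMA update) can be sketched in plain Python. This is illustrative only, not the repository's actual `EMAWeightOptimizer`; plain floats stand in for network weight tensors, and the class name `EmaTeacher` is made up:

```python
class EmaTeacher:
    """Minimal sketch of a mean-teacher EMA weight tracker."""

    def __init__(self, student_weights, alpha=0.99):
        self.alpha = alpha
        # Copy the student weights over the teacher's, so the two
        # networks are identical before training starts (this mirrors
        # the copy in the EMAWeightOptimizer constructor).
        self.teacher_weights = list(student_weights)

    def step(self, student_weights):
        # teacher = teacher * alpha + student * (1 - alpha):
        # the teacher follows the student slowly, smoothing its trajectory.
        for i, s in enumerate(student_weights):
            self.teacher_weights[i] = (
                self.alpha * self.teacher_weights[i] + (1.0 - self.alpha) * s
            )
```

After the initial copy the teacher equals the student, but as soon as the student's weights move (driven by the supervised and consistency losses), each `step` pulls the teacher a small fraction of the way towards the new student weights.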
hydxqing commented 4 years ago

If the two sets of weights are the same before training, how can the consistency loss be trained? Isn't the goal of the consistency loss to make the weights of the two networks equal? And when you execute `teacher = teacher * 0.99 + student * 0.01`, the teacher's weights cannot change, can they?

Britefury commented 4 years ago

Please read the 'temporal ensembling' (https://arxiv.org/abs/1610.02242) and 'mean teachers are better role models' (https://arxiv.org/abs/1703.01780) papers; they should give you a better idea of how consistency regularization and the mean teacher model work.
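The idea in those papers can be sketched very roughly as follows. This is illustrative, not the repository's code: in practice both networks receive differently augmented views of the same unlabelled sample and output softmax probabilities; here plain probability lists stand in, and the loss penalises disagreement between the student's prediction and the teacher's (which acts as a soft pseudo-label):

```python
def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions.

    The teacher's output serves as a soft target for the student on
    unlabelled data; gradients flow only into the student, while the
    teacher is updated via the EMA of the student's weights.
    """
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n
```

Note that the consistency loss drives the networks' *predictions* towards agreement; the teacher's weights change only through the EMA update, which keeps moving as long as the student keeps moving.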