Closed yyuxin closed 2 years ago
Thank you very much for your work!
I have noticed that before distillation, the teacher network is loaded from a pre-trained model. Is the teacher network kept fixed during distillation? I couldn't find where this happens in the code (e.g. a `detach` or `p.requires_grad = False`).
Hi, please refer to https://github.com/dvlab-research/ReviewKD/blob/master/CIFAR-100/util/misc.py#L84
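For reference, the usual way to fix a teacher network in PyTorch is to disable gradients on its parameters, put it in eval mode, and run it under `torch.no_grad()` (or `detach` its outputs). The snippet below is a minimal illustrative sketch with a toy `nn.Sequential` teacher, not the actual ReviewKD model from the linked file:

```python
import torch
import torch.nn as nn

# Toy teacher network for illustration only; ReviewKD loads a real
# pre-trained CIFAR-100 model here.
teacher = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

# Freeze the teacher: no gradients will be computed for its parameters.
for p in teacher.parameters():
    p.requires_grad = False
teacher.eval()  # fix BatchNorm / Dropout behavior during distillation

x = torch.randn(2, 8)
# Running under no_grad() (or calling .detach() on the outputs) ensures
# the distillation loss does not backpropagate into the teacher.
with torch.no_grad():
    t_out = teacher(x)
```

With this setup the optimizer only ever sees the student's parameters, so the teacher stays fixed even without an explicit `detach` at the loss.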