Hello, thanks for your paper! I have just started learning knowledge distillation, and I want to know how the loss on the global descriptor is computed. Does it work by getting a VLAD descriptor from the student model and another from the teacher model (NetVLAD), and then calculating the MSE between them?

Yes. Please refer to the paper or to the code for more details.
Sorry, I meant: how do you do the distillation? What is the algorithm in the code? Is it just like the algorithm in the paper "Distilling the Knowledge in a Neural Network"?
Yes. This is a vanilla application of distillation, with a fixed teacher and trainable student.
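For anyone landing here later, a minimal PyTorch sketch of that setup (illustrative only, not this repo's actual training code; the tiny `nn.Sequential` models below are placeholders standing in for the pretrained NetVLAD teacher and the student architecture):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the real networks: `teacher` plays the role of a
# pretrained NetVLAD, `student` the trainable network. Both map an image
# batch to a global descriptor (here 16-D for brevity).
teacher = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 16),
)
student = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 16),
)

# Fixed teacher: eval mode, no gradients.
teacher.eval()
for p in teacher.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

images = torch.randn(4, 3, 64, 64)   # dummy image batch

with torch.no_grad():
    target = teacher(images)         # teacher's global descriptor
pred = student(images)               # student's global descriptor

# Distillation loss: plain MSE between the two descriptors.
loss = F.mse_loss(pred, target)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Since the targets here are descriptors rather than class probabilities, the sketch regresses them directly with MSE (as confirmed in the first answer above), so there is no temperature-scaled softmax as in the classification setting of Hinton et al.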