Fei-Long121 / DeepBDC

The PyTorch code of "Joint Distribution Matters: Deep Brownian Distance Covariance for Few-Shot Classification", CVPR 2022 (Oral).

Self-distillation #7

Closed RongKaiWeskerMA closed 2 years ago

RongKaiWeskerMA commented 2 years ago

Hi, thanks for publishing the code. It is really an interesting work!

I have a question about the self-distillation used in the pre-training stage. It looks like the teacher model is fixed at the beginning and the knowledge is distilled into a new student model over the training iterations. So it is not really sequential (i.e., the previous student does not become the teacher in the next round)?

JiamingLv commented 2 years ago

Hello, thank you for your interest in our work. We adopt the scheme of Good-Embed [37] for self-distillation: knowledge is distilled from a trained model (the teacher) into a newly initialized model (the student), and the teacher is kept fixed throughout the process. In addition, the teacher and the student share the same network architecture and training data.
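
For illustration, here is a minimal sketch of such a fixed-teacher distillation step (the general scheme used by Good-Embed); the names `teacher`, `student`, and the values of the temperature `T` and weight `alpha` are illustrative placeholders, not the repository's actual code or hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Fixed-teacher KD loss: cross-entropy on ground-truth labels plus
    temperature-scaled KL divergence to the frozen teacher's predictions."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kl

# teacher: the trained model, kept frozen; student: a fresh copy of the same architecture.
# for x, y in loader:
#     with torch.no_grad():
#         t_logits = teacher(x)          # teacher is never updated
#     s_logits = student(x)
#     loss = distillation_loss(s_logits, t_logits, y)
#     loss.backward(); optimizer.step(); optimizer.zero_grad()
```

The key point, matching the reply above, is that the teacher produces targets under `torch.no_grad()` and receives no gradient updates; only the student is optimized.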