Using MoCo v2 as Teacher, Knowledge Distillation for Student, in VIPriors Challenge.
VIPriors Challenge (Image from 2020 ECCV Workshop VIPriors Challenge).
Distilling Visual Priors from Self-Supervised Learning (MoCo v2 + Distillation), by Tongji University and Megvii Research Nanjing, in 2020 ECCV Workshop VIPriors Challenge.
Proposed Framework.
There are 2 phases: Phase-1 pre-trains the teacher model with self-supervised contrastive learning (MoCo v2), and Phase-2 fine-tunes the student model with knowledge distillation.
In a data-deficient dataset, the maximum size of the MoCo queue is limited, so the authors propose to replace the contrastive (InfoNCE) loss with a margin loss, as sketched below.
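A minimal sketch of how such a margin loss could be computed on MoCo-style query/key features is shown below; the hinge form `max(0, s_neg - s_pos + margin)` and the `margin` value are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def margin_contrastive_loss(q, k_pos, queue, margin=0.5):
    """Hinge-style margin loss on query/key similarities (assumed form).

    q:      (N, C) query features
    k_pos:  (N, C) positive key features from the momentum encoder
    queue:  (C, K) negative keys stored in the MoCo queue
    """
    q = F.normalize(q, dim=1)
    k_pos = F.normalize(k_pos, dim=1)
    queue = F.normalize(queue, dim=0)

    s_pos = (q * k_pos).sum(dim=1, keepdim=True)   # (N, 1) positive similarity
    s_neg = q @ queue                              # (N, K) negative similarities

    # Penalize every negative that comes within `margin` of the positive;
    # averaging over negatives makes the loss less sensitive to the queue size K.
    return F.relu(s_neg - s_pos + margin).mean()
```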
The distillation process can be seen as a regularization that keeps the student close to the teacher's self-supervised representations and helps prevent overfitting on the data-deficient dataset.
Following OFD, the distillation loss is $\mathcal{L}_{KD} = d_p(F_t, F_s)$, where $F_t$ and $F_s$ are the teacher and student feature maps and the distance metric $d_p$ is the L2 distance in this paper.
Along with a cross-entropy loss $\mathcal{L}_{CE}$ for classification, the final loss function for the student model is $\mathcal{L} = \mathcal{L}_{CE} + \lambda\,\mathcal{L}_{KD}$, with $\lambda = 10^{-4}$. 100 epochs are used for fine-tuning.
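As a rough illustration of the Phase-2 objective, the sketch below combines cross-entropy with an OFD-style feature distillation term weighted by $\lambda$; the plain L2 (MSE) distance stands in for OFD's partial L2 distance, and the feature shapes and the `student_loss` helper are hypothetical.

```python
import torch
import torch.nn.functional as F

LAMBDA = 1e-4  # weight of the distillation term, as reported in the paper

def student_loss(logits, labels, feat_student, feat_teacher, lam=LAMBDA):
    """Cross-entropy plus feature distillation: L = L_CE + lam * L_KD.

    logits:        (N, num_classes) student predictions
    labels:        (N,) ground-truth class indices
    feat_student:  (N, C, H, W) student feature map (after a connector, assumed)
    feat_teacher:  (N, C, H, W) frozen teacher feature map from MoCo v2 pre-training
    """
    ce = F.cross_entropy(logits, labels)
    # Plain L2 distance between features; OFD actually uses a partial L2
    # distance, so this is a simplifying assumption.
    kd = F.mse_loss(feat_student, feat_teacher.detach())
    return ce + lam * kd
```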
There are still 1,000 classes, but only 50 images per class in each of the train/val/test splits, i.e. 1,000 × 50 × 3 = 150,000 images in total.
Finally, by combining Phase-1 and Phase-2, the proposed pipeline achieves a 16.7% gain in top-1 accuracy over the supervised baseline.
Linear Classifier.
The proposed margin loss is less sensitive to the number of negatives and can be used in a data-deficient setting.
Bag of Tricks.
Several other tricks and stronger backbone models are used for better performance: larger input resolution, AutoAugment, ResNeXt-101, label smoothing (as in Inception-v3), 10-crop testing, and a 2-model ensemble.
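For the inference-time tricks, a hedged sketch of 10-crop test-time augmentation combined with a two-model ensemble is given below; the crop sizes and the simple probability averaging are assumptions, and input normalization is omitted for brevity.

```python
import torch
from torchvision import transforms

# 10-crop preprocessing: 4 corners + center, plus their horizontal flips.
ten_crop = transforms.Compose([
    transforms.Resize(512),          # larger resolution (size is an assumption)
    transforms.TenCrop(448),
    transforms.Lambda(lambda crops: torch.stack(
        [transforms.ToTensor()(c) for c in crops])),  # (10, 3, 448, 448)
])

@torch.no_grad()
def ensemble_predict(models, image):
    """Average softmax probabilities over 10 crops and over the model ensemble."""
    crops = ten_crop(image)                               # (10, 3, H, W)
    probs = []
    for model in models:
        model.eval()
        logits = model(crops)                             # (10, num_classes)
        probs.append(logits.softmax(dim=1).mean(dim=0))   # average over crops
    return torch.stack(probs).mean(dim=0)                 # average over models
```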
Sik-Ho Tsang. Brief Review — Distilling Visual Priors from Self-Supervised Learning.