Using MoCo v2 as Teacher, Knowledge Distillation for Student, in VIPriors Challenge.
VIPriors Challenge (Image from 2020 ECCV Workshop VIPriors Challenge).
Distilling Visual Priors from Self-Supervised Learning (MoCo v2 + Distillation), by Tongji University and Megvii Research Nanjing, 2020 ECCV Workshop VIPriors Challenge
Proposed Framework.
There are two phases: Phase-1 trains the teacher with self-supervised learning (MoCo v2), and Phase-2 trains the student by knowledge distillation from that teacher.
In a data-deficient dataset, the maximum size of the negative queue is limited, so the authors propose to replace the standard contrastive loss with a margin loss.
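The paper's exact margin formulation is not reproduced in this summary; the following is a minimal sketch of a hinge-style margin loss over MoCo-style query/key similarities, where the names `q`, `k_pos`, `queue` and the margin value are illustrative assumptions rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def margin_contrastive_loss(q, k_pos, queue, margin=0.6):
    """Hinge-style margin loss over query/key similarities (illustrative sketch).

    q:      (N, D) query features from the online encoder
    k_pos:  (N, D) positive key features from the momentum encoder
    queue:  (K, D) negative key features stored in the MoCo queue
    """
    q = F.normalize(q, dim=1)
    k_pos = F.normalize(k_pos, dim=1)
    queue = F.normalize(queue, dim=1)

    pos_sim = (q * k_pos).sum(dim=1, keepdim=True)   # (N, 1) similarity to the positive
    neg_sim = q @ queue.t()                          # (N, K) similarities to negatives

    # Penalize every negative that comes within `margin` of the positive.
    # Unlike InfoNCE, each negative contributes independently, which is one way
    # a margin objective can be less sensitive to the queue size K.
    return F.relu(neg_sim - pos_sim + margin).mean()
```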
The distillation process can be seen as a regularization that keeps the student from overfitting the data-deficient dataset while transferring the visual priors learned by the self-supervised teacher.
Following OFD, the distillation loss is:

$$\mathcal{L}_{distill} = d_{p}\big(F_{t},\, r(F_{s})\big),$$

where $F_{t}$ and $F_{s}$ are the teacher and student feature maps, $r(\cdot)$ is a connector that maps the student features into the teacher's feature space, and the distance metric $d_{p}$ is the $l_{2}$-distance in this paper.
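As a sketch of this term, assuming (as is common in OFD-style distillation) that the connector $r(\cdot)$ is a 1×1 convolution plus batch norm and that $d_{p}$ is the plain $l_{2}$-distance stated above; the class name and layer choices are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillLoss(nn.Module):
    """OFD-style feature distillation: d_p(F_t, r(F_s)) with d_p = l2-distance."""

    def __init__(self, student_channels, teacher_channels):
        super().__init__()
        # Connector r(.): maps student feature maps to the teacher's channel dimension.
        self.connector = nn.Sequential(
            nn.Conv2d(student_channels, teacher_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(teacher_channels),
        )

    def forward(self, f_student, f_teacher):
        # Teacher features come from the frozen, self-supervised (MoCo v2) model.
        f_teacher = f_teacher.detach()
        f_student = self.connector(f_student)
        # Plain l2-distance between the aligned feature maps.
        return F.mse_loss(f_student, f_teacher)
```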
Along with a cross-entropy loss $\mathcal{L}_{cls}$ for classification, the final loss function for the student model is:

$$\mathcal{L} = \mathcal{L}_{cls} + \lambda\,\mathcal{L}_{distill},$$

where $\lambda=10^{-4}$. 100 epochs are used for fine-tuning.
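A hedged sketch of how this Phase-2 objective could be assembled, reusing the `FeatureDistillLoss` module from the previous sketch. The model API (each network returning logits plus an intermediate feature map) is a hypothetical convention, not the paper's code.

```python
import torch
import torch.nn.functional as F

LAMBDA = 1e-4  # weight of the distillation term, as stated in the article

def student_training_step(student, teacher, distill_loss, images, labels):
    """One fine-tuning step: cross-entropy + lambda * feature distillation."""
    # Hypothetical model API: each model returns (logits, intermediate feature map).
    logits, f_student = student(images)
    with torch.no_grad():
        _, f_teacher = teacher(images)   # frozen MoCo v2 teacher

    loss_cls = F.cross_entropy(logits, labels)
    loss_distill = distill_loss(f_student, f_teacher)
    return loss_cls + LAMBDA * loss_distill
```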
There are still 1,000 classes, but only 50 images per class in each of the train/val/test splits, resulting in 150,000 images in total.
Finally, by combining Phase-1 and Phase-2, the proposed pipeline achieves a 16.7% gain in top-1 accuracy over the supervised baseline.
Linear Classifier.
The proposed margin loss is less sensitive to the number of negatives and can be used in a data-deficient setting.
Bag of Tricks.
Several other tricks and stronger backbone models are used for better performance: larger input resolution, AutoAugment, ResNeXt-101, label smoothing (as in Inception-v3), 10-crop testing, and a 2-model ensemble.
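For the test-time part of these tricks, here is a hedged sketch of 10-crop evaluation combined with a 2-model ensemble, using torchvision's `TenCrop`; the resolutions (480/448) and the averaging scheme are placeholder assumptions, not values reported in the paper.

```python
import torch
from torchvision import transforms
from torchvision.transforms import functional as TF

ten_crop = transforms.Compose([
    transforms.Resize(480),            # larger test resolution (placeholder value)
    transforms.TenCrop(448),           # 4 corners + center, plus horizontal flips
    transforms.Lambda(lambda crops: torch.stack([TF.to_tensor(c) for c in crops])),
])

@torch.no_grad()
def ensemble_predict(models, image):
    """Average softmax scores over 10 crops and over the model ensemble."""
    crops = ten_crop(image)                      # (10, C, H, W)
    probs = []
    for model in models:                         # e.g. the 2-model ensemble
        logits = model(crops)                    # (10, num_classes)
        probs.append(logits.softmax(dim=1).mean(dim=0))
    return torch.stack(probs).mean(dim=0)        # (num_classes,)
```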
Sik-Ho Tsang. Brief Review — Distilling Visual Priors from Self-Supervised Learning.