NorbertZheng / read-papers

My paper reading notes.

Sik-Ho Tang | Brief Review -- Distilling Visual Priors from Self-Supervised Learning. #143


NorbertZheng commented 10 months ago

Sik-Ho Tang. Brief Review — Distilling Visual Priors from Self-Supervised Learning.

NorbertZheng commented 10 months ago

Overview

Using MoCo v2 as the teacher and knowledge distillation for the student, in the VIPriors Challenge.

Figure: VIPriors Challenge (image from the 2020 ECCV Workshop VIPriors Challenge).

Distilling Visual Priors from Self-Supervised Learning, MoCo v2 + Distillation, by Tongji University and Megvii Research Nanjing, 2020 ECCV Workshop VIPriors Challenge.

NorbertZheng commented 10 months ago

Proposed Framework

Figure: Proposed Framework.

There are two phases: Phase-1 trains the teacher and Phase-2 trains the student.
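
A minimal structural sketch of the two-phase pipeline in PyTorch, assuming a torchvision ResNet-50 backbone; the MoCo v2 training loop is elided, and initializing the student from the teacher's weights is my reading of the figure rather than a confirmed detail.

```python
import copy
import torch.nn as nn
from torchvision.models import resnet50

def phase1_teacher() -> nn.Module:
    """Phase-1: self-supervised pre-training (MoCo v2 style) of the teacher."""
    teacher = resnet50(weights=None)   # trained from scratch on the small dataset, no labels
    # ... MoCo v2 momentum encoder, queue of negatives, and (margin) contrastive loss go here ...
    return teacher

def phase2_student(teacher: nn.Module, num_classes: int = 1000) -> nn.Module:
    """Phase-2: build the student for supervised fine-tuning with distillation."""
    student = copy.deepcopy(teacher)   # same architecture as the teacher (self-distillation)
    student.fc = nn.Linear(student.fc.in_features, num_classes)  # fresh classification head
    return student
```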

NorbertZheng commented 10 months ago

Phase-1: Teacher

In a data-deficient dataset, the maximum size of the MoCo queue is limited, so the authors propose a margin loss for the contrastive pre-training that is less sensitive to the number of negatives.
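
A hedged sketch of a generic hinge-style margin loss over positive and negative similarities, as a drop-in for the InfoNCE loss; the exact margin formulation and the margin value 0.5 are assumptions, not the paper's reported settings.

```python
import torch
import torch.nn.functional as F

def margin_contrastive_loss(q: torch.Tensor, k_pos: torch.Tensor,
                            queue: torch.Tensor, margin: float = 0.5) -> torch.Tensor:
    """Push each positive similarity above every negative similarity by at least `margin`."""
    q = F.normalize(q, dim=1)                     # (N, D) query features
    k_pos = F.normalize(k_pos, dim=1)             # (N, D) positive keys
    queue = F.normalize(queue, dim=1)             # (K, D) negative keys from the MoCo queue
    pos = (q * k_pos).sum(dim=1, keepdim=True)    # (N, 1) positive cosine similarities
    neg = q @ queue.t()                           # (N, K) negative cosine similarities
    return F.relu(margin - pos + neg).mean()      # hinge on the similarity gap
```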

NorbertZheng commented 10 months ago

Phase-2: Self-Distillation on Labeled Dataset

The distillation process can be seen as a regularization that transfers the visual priors learned by the self-supervised teacher to the student during supervised fine-tuning.

Following OFD, the distillation loss compares teacher and student features under a distance metric $d_{p}$, which is the l2-distance in this paper.
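
A hedged reconstruction of the general OFD-style form of this loss, where $F_{t}$ and $F_{s}$ denote teacher and student feature maps and $r(\cdot)$ is a connector mapping student features into the teacher's feature space (this notation is my own, not necessarily the paper's):

$$
\mathcal{L}_{\mathrm{distill}} = d_{p}\big(F_{t},\, r(F_{s})\big)
$$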

This is combined with a standard cross-entropy loss for classification.

The final loss function for the student model combines the cross-entropy loss with the distillation loss weighted by $\lambda=10^{-4}$. 100 epochs are used for fine-tuning.
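
A minimal sketch of the Phase-2 objective, assuming feature maps from a frozen teacher and a trainable student; the names (`phase2_loss`, `connector`, `LAMBDA_KD`) and the mean-squared-error reduction used for the l2-distance are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LAMBDA_KD = 1e-4   # reported weight of the distillation term

def phase2_loss(logits: torch.Tensor, labels: torch.Tensor,
                feat_student: torch.Tensor, feat_teacher: torch.Tensor,
                connector: nn.Module) -> torch.Tensor:
    """Cross-entropy on the labels plus an l2 feature-distillation penalty."""
    ce = F.cross_entropy(logits, labels)                             # classification loss
    kd = F.mse_loss(connector(feat_student), feat_teacher.detach())  # d_p as l2-distance to frozen teacher
    return ce + LAMBDA_KD * kd
```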

NorbertZheng commented 10 months ago

Results

Dataset

There are still 1,000 classes, but only 50 images per class in each of the train/val/test splits, resulting in 150,000 images in total.

Performance

Figure: Proposed Framework.

Finally, by combining Phase-1 and Phase-2, the proposed pipeline achieves a 16.7% gain in top-1 accuracy over the supervised baseline.

Figure: Linear Classifier.

The proposed margin loss is less sensitive to the number of negatives and can be used in a data-deficient setting.

Figure: Bag of Tricks.

Several other tricks and stronger backbone models are used for better performance: larger resolution, AutoAugment, ResNeXt-101, label smoothing (as in Inception-v3), 10-crop testing, and a 2-model ensemble.
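
As one concrete example of these tricks, a minimal sketch of 10-crop test-time evaluation with torchvision; the resize/crop sizes (256/224) and the averaging over logits are assumptions, not the authors' exact settings.

```python
import torch
from torchvision import transforms

# 10-crop TTA: 4 corners + center, each also horizontally flipped.
ten_crop = transforms.Compose([
    transforms.Resize(256),
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])

@torch.no_grad()
def predict_10crop(model: torch.nn.Module, pil_image) -> torch.Tensor:
    crops = ten_crop(pil_image)   # (10, 3, 224, 224)
    logits = model(crops)         # (10, num_classes)
    return logits.mean(dim=0)     # average the 10 predictions
```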
