NorbertZheng / read-papers

My paper reading notes.

Sik-Ho Tang | Brief Review -- Distilling Visual Priors from Self-Supervised Learning. #143

Closed NorbertZheng closed 6 months ago

NorbertZheng commented 6 months ago

Sik-Ho Tang. Brief Review — Distilling Visual Priors from Self-Supervised Learning.

NorbertZheng commented 6 months ago

Overview

In the VIPriors Challenge, MoCo v2 is used to pretrain a teacher model, whose visual priors are then transferred to a student model via knowledge distillation.

image VIPriors Challenge (Image from 2020 ECCV Workshop VIPriors Challenge).

Distilling Visual Priors from Self-Supervised Learning (MoCo v2 + Distillation), by Tongji University and Megvii Research Nanjing, 2020 ECCV Workshop VIPriors Challenge.

NorbertZheng commented 6 months ago

Proposed Framework

image Proposed Framework.

There are 2 phases: Phase-1 pretrains the teacher with self-supervised learning (MoCo v2), and Phase-2 distills it into a student on the labeled data. A rough outline is sketched below.
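A minimal sketch of the two-phase setup, assuming a ResNet-50 backbone and that the student is simply initialized from the pretrained teacher; the helper name in the comments is a placeholder, not the authors' code.

```python
import torchvision.models as models


def build_two_phase_models(device="cpu"):
    """Outline of the two-phase pipeline (sketch; the backbone and the
    weight-sharing scheme are assumptions)."""
    # Phase-1: the teacher backbone is pretrained with MoCo v2 on the
    # challenge images only, e.g. something like:
    #   moco_v2_pretrain(teacher, unlabeled_loader)   # placeholder
    teacher = models.resnet50(weights=None).to(device)

    # Phase-2: the student starts from the pretrained teacher and is
    # fine-tuned on the labeled data with a feature-distillation term,
    # while the teacher stays frozen as the distillation target.
    student = models.resnet50(weights=None).to(device)
    student.load_state_dict(teacher.state_dict())
    for p in teacher.parameters():
        p.requires_grad = False
    teacher.eval()

    return teacher, student
```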

NorbertZheng commented 6 months ago

Phase-1: Teacher Pretraining with MoCo v2

In a data-deficient setting, the maximum size of the MoCo negative queue is limited, so the authors propose to replace the standard contrastive (InfoNCE) loss with a margin loss that is less sensitive to the number of negatives; a sketch follows.
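A rough sketch of a margin-style contrastive loss over a (query, key) pair and a queue of negatives, assuming a simple hinge formulation; the paper's exact margin loss and hyperparameters may differ.

```python
import torch
import torch.nn.functional as F


def margin_contrastive_loss(q, k, queue, margin=0.5):
    """Hinge/margin contrastive loss (assumed formulation, not the paper's
    exact equation): push each positive similarity above every negative
    similarity by at least `margin`.

    q:     (N, D) query embeddings
    k:     (N, D) key embeddings (positives for q)
    queue: (K, D) memory queue of negative embeddings
    """
    q, k, queue = (F.normalize(x, dim=1) for x in (q, k, queue))

    pos = (q * k).sum(dim=1, keepdim=True)  # (N, 1) positive similarities
    neg = q @ queue.t()                     # (N, K) negative similarities

    # Unlike the softmax-normalized InfoNCE loss, the hinge only penalizes
    # negatives that violate the margin, so it depends less on the queue
    # size K -- which matters when the dataset (and hence the queue) is small.
    return F.relu(neg - pos + margin).mean()
```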

NorbertZheng commented 6 months ago

Phase-2: Self-Distillation on Labeled Dataset

The distillation process can be seen as a regularization that preserves the visual priors captured by the self-supervised teacher while the student is fine-tuned on the labels.

Following OFD, the distillation loss is: image where the distance metric $d_{p}$ is the $l_{2}$-distance in this paper.
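A minimal sketch of the feature-distillation term as described here, using a plain $l_{2}$ (MSE) distance between matched teacher and student feature maps; the channel-aligning connector and margin ReLU of the original OFD formulation are assumed to have been applied already.

```python
import torch.nn.functional as F


def distillation_loss(f_student, f_teacher):
    """d_p between student and teacher features, taken here as the l2
    distance (sketch; which layers are distilled is an assumption).

    f_student, f_teacher: (N, C, H, W) feature maps from matching layers.
    """
    # The teacher is frozen, so gradients flow only into the student.
    return F.mse_loss(f_student, f_teacher.detach())
```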

Along with a cross-entropy loss for classification: image

The final loss function for the student model is: image with $\lambda=10^{-4}$. 100 epochs are used for fine-tuning.
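Putting the two terms together, a sketch of one fine-tuning step; it assumes $\lambda$ weights the distillation term and that the networks also expose the intermediate features being distilled.

```python
import torch
import torch.nn.functional as F

LAMBDA = 1e-4  # weight of the distillation term, as stated above


def student_training_step(student, teacher, images, labels, optimizer):
    """One Phase-2 fine-tuning step (sketch; a forward pass returning both
    logits and features is an assumption about the model interface)."""
    logits_s, feats_s = student(images)
    with torch.no_grad():
        _, feats_t = teacher(images)

    ce = F.cross_entropy(logits_s, labels)
    kd = distillation_loss(feats_s, feats_t)  # see the previous sketch
    loss = ce + LAMBDA * kd

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```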

NorbertZheng commented 6 months ago

Results

Dataset

There are still 1,000 classes but only 50 images per class in each of the train/val/test splits, i.e. 1,000 × 50 × 3 = 150,000 images in total.

Performance

image Proposed Framework.

Finally, by combining Phase-1 and Phase-2, the proposed pipeline achieves a 16.7% performance gain in top-1 accuracy over the supervised baseline.

image Linear Classifier.

The proposed margin loss is less sensitive to the number of negatives and can therefore be used in a data-deficient setting.
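The linear-classifier comparison above evaluates the frozen pretrained encoder with a linear probe; a minimal sketch of that protocol, with assumed hyperparameters and an assumed encoder interface returning pooled features:

```python
import torch
import torch.nn as nn


def train_linear_probe(encoder, loader, num_classes=1000, feat_dim=2048,
                       epochs=10, lr=0.1, device="cpu"):
    """Linear evaluation: freeze the encoder, train only a linear classifier
    on top of its features (sketch; all hyperparameters are assumptions)."""
    encoder.eval()
    for p in encoder.parameters():
        p.requires_grad = False

    probe = nn.Linear(feat_dim, num_classes).to(device)
    opt = torch.optim.SGD(probe.parameters(), lr=lr, momentum=0.9)

    for _ in range(epochs):
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            with torch.no_grad():
                feats = encoder(images)  # assumed shape: (N, feat_dim)
            loss = nn.functional.cross_entropy(probe(feats), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return probe
```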

image Bag of Tricks.

Several other tricks and stronger backbones are used for better performance: larger resolution, AutoAugment, ResNeXt-101, label smoothing (as in Inception-v3), 10-crop testing, and a 2-model ensemble; a sketch of the test-time part is given below.
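A sketch of two of the test-time tricks, 10-crop augmentation and a 2-model ensemble; the resolution, crop size, and averaging scheme are assumptions.

```python
import torch
import torchvision.transforms as T

# 10-crop test-time augmentation (resolution and crop size are assumed values
# illustrating the "larger resolution" trick, not the paper's exact numbers).
ten_crop = T.Compose([
    T.Resize(480),
    T.TenCrop(448),
    T.Lambda(lambda crops: torch.stack([T.ToTensor()(c) for c in crops])),
])


@torch.no_grad()
def ensemble_predict(model_list, pil_image):
    """Average softmax scores over 10 crops and over the ensemble members."""
    crops = ten_crop(pil_image)                  # (10, C, H, W)
    probs = []
    for m in model_list:
        m.eval()
        logits = m(crops)                        # (10, num_classes)
        probs.append(logits.softmax(dim=1).mean(dim=0))
    return torch.stack(probs).mean(dim=0)        # final class probabilities
```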

NorbertZheng commented 6 months ago

Reference

Sik-Ho Tang. Brief Review — Distilling Visual Priors from Self-Supervised Learning.