NorbertZheng / read-papers

My paper reading notes.
MIT License

Sik-Ho Tang | Review -- MoCo: Momentum Contrast for Unsupervised Visual Representation Learning. #141

Closed 10 months ago

NorbertZheng commented 10 months ago

Sik-Ho Tang. Review — MoCo: Momentum Contrast for Unsupervised Visual Representation Learning.

NorbertZheng commented 10 months ago

Overview

MoCo introduces a momentum update for the key encoder and outperforms Exemplar-CNN, Context Prediction, Jigsaw Puzzles, RotNet/Image Rotations, Colorization, DeepCluster, Instance Discrimination, CPCv1, CPCv2, and CMC.

Figure: Momentum Contrast (MoCo).

Momentum Contrast for Unsupervised Visual Representation Learning (MoCo), by Facebook AI Research (FAIR). 2020 CVPR, over 2400 citations. Topics: Self-Supervised Learning, Contrastive Learning, Image Classification, Object Detection, Segmentation.

NorbertZheng commented 10 months ago

This tackles the inconsistent-dictionary problem caused by the memory bank in Instance Discrimination.

NorbertZheng commented 10 months ago

Contrastive Learning

Figure: Left: contrastive learning without dictionary look-up. Right: contrastive learning with dictionary look-up.

(a) End-to-End: both the query and key encoders are updated by back-propagation, so the dictionary size is coupled to, and limited by, the mini-batch size.

(b) Memory Bank: keys for all samples are stored in a memory bank and sampled from it, so they come from encoder states at many different past steps and are less consistent.

NorbertZheng commented 10 months ago

Contrastive learning since DrLIM, and its recent developments, can be thought of as training an encoder for a dictionary look-up task.
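
As background, the contrastive objective adopted in the paper is InfoNCE. With an encoded query $q$, one positive key $k_{+}$, $K$ negative keys, and a temperature $\tau$ (the sum below runs over one positive and $K$ negatives):

$$\mathcal{L}_{q} = -\log\frac{\exp(q\cdot k_{+}/\tau)}{\sum_{i=0}^{K}\exp(q\cdot k_{i}/\tau)}$$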

NorbertZheng commented 10 months ago

Momentum Contrast (MoCo)

Figure: Momentum Contrast (MoCo).

The dictionary is dynamic in the sense that the keys are randomly sampled and the key encoder evolves during training.

The hypothesis is that good features can be learned by a large dictionary that covers a rich set of negative samples, while the encoder for the dictionary keys is kept as consistent as possible despite its evolution.

NorbertZheng commented 10 months ago

Dictionary as a Queue

The samples in the dictionary are progressively replaced. The current mini-batch is enqueued to the dictionary, and the oldest mini-batch in the queue is removed.
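
A minimal PyTorch sketch of this queue bookkeeping, assuming a 128-d feature space and a dictionary of 65,536 keys (the names `queue`, `queue_ptr`, and `dequeue_and_enqueue` are assumptions mirroring the spirit of the official implementation, not its exact code):

```python
import torch
import torch.nn.functional as F

feat_dim, queue_size = 128, 65536  # assumed sizes

# The dictionary: one L2-normalized key per column.
queue = F.normalize(torch.randn(feat_dim, queue_size), dim=0)
queue_ptr = 0

def dequeue_and_enqueue(keys: torch.Tensor) -> None:
    """Enqueue the current mini-batch of keys; the oldest batch is overwritten."""
    global queue_ptr
    batch_size = keys.shape[0]
    assert queue_size % batch_size == 0  # simplifying assumption
    queue[:, queue_ptr:queue_ptr + batch_size] = keys.T
    queue_ptr = (queue_ptr + batch_size) % queue_size
```

Overwriting a slice in place keeps the update cheap, and the queue decouples the dictionary size from the mini-batch size.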

NorbertZheng commented 10 months ago

Momentum Update

Formally, denoting the parameters of $f_{k}$ as $\theta_{k}$ and those of $f_{q}$ as $\theta_{q}$, $\theta_{k}$ is updated by:

$$\theta_{k} \leftarrow m\theta_{k} + (1-m)\theta_{q}$$

Here, $m\in [0,1)$ is a momentum coefficient. Only the parameters $\theta_{q}$ are updated by back-propagation.
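
A minimal sketch of this update, where `encoder_q`/`encoder_k` stand in for $f_{q}$/$f_{k}$ (the tiny `nn.Linear` is a placeholder, not the paper's backbone):

```python
import copy
import torch
import torch.nn as nn

m = 0.999  # momentum coefficient, the paper's default

encoder_q = nn.Linear(32, 128)        # stands in for f_q
encoder_k = copy.deepcopy(encoder_q)  # f_k is initialized from f_q
for p in encoder_k.parameters():
    p.requires_grad = False           # f_k receives no gradients

@torch.no_grad()
def momentum_update() -> None:
    """theta_k <- m * theta_k + (1 - m) * theta_q."""
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)
```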

NorbertZheng commented 10 months ago

So is this better than the Proximal Regularization used in Instance Discrimination?

NorbertZheng commented 10 months ago

Some Other Details

Figure: MoCo Algorithm (PyTorch-style pseudocode).
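
A rough Python transcription of the loss step in the algorithm, under the assumption that the features are already L2-normalized and the positive logit is placed at index 0, as in the paper's pseudocode:

```python
import torch
import torch.nn.functional as F

def moco_loss(q: torch.Tensor, k: torch.Tensor, queue: torch.Tensor,
              tau: float = 0.07) -> torch.Tensor:
    """InfoNCE loss as in MoCo's Algorithm 1 (sketch).

    q:     (N, C) L2-normalized query features
    k:     (N, C) L2-normalized key features (positives), already detached
    queue: (C, K) L2-normalized dictionary of negative keys
    """
    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)     # (N, 1)
    l_neg = torch.einsum("nc,ck->nk", q, queue)              # (N, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau          # (N, 1+K)
    labels = torch.zeros(logits.shape[0], dtype=torch.long)  # positive at index 0
    return F.cross_entropy(logits, labels)
```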

NorbertZheng commented 10 months ago

Shuffling BN

BN leaks intra-batch statistics that the model can exploit to "cheat" the pretext task. MoCo therefore shuffles the sample order of the current mini-batch across GPUs before encoding it with $f_{k}$ (and shuffles it back afterwards), so the query and its positive key see different batch statistics.
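
A single-process sketch of the shuffle/unshuffle bookkeeping. This is only illustrative: within one process BN statistics are permutation-invariant, and the actual benefit comes from shuffling across GPUs so each GPU's BN sees a different subset of samples:

```python
import torch
import torch.nn as nn

def forward_key_encoder(encoder_k: nn.Module, x: torch.Tensor) -> torch.Tensor:
    idx_shuffle = torch.randperm(x.shape[0])    # random permutation of the batch
    idx_unshuffle = torch.argsort(idx_shuffle)  # inverse permutation
    k = encoder_k(x[idx_shuffle])               # encode in shuffled order
    return k[idx_unshuffle]                     # restore the original sample order
```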

NorbertZheng commented 10 months ago

Ablation & ImageNet Results

Datasets

Unsupervised pre-training is conducted on ImageNet-1M (IN-1M) and Instagram-1B (IG-1B).

NorbertZheng commented 10 months ago

Ablation: Contrastive Loss Mechanisms

Figure: Comparison of three contrastive loss mechanisms under the ImageNet linear classification protocol.

NorbertZheng commented 10 months ago

Ablation: Momentum

Figure: Study of momentum $m$.

It performs reasonably well when $m$ is in $0.99\sim 0.9999$, showing that a slowly progressing (i.e., relatively large momentum) key encoder is beneficial.

When $m$ is too small (e.g., $0.9$), the accuracy drops considerably.

NorbertZheng commented 10 months ago

SOTA Comparison

Figure: Comparison under the linear classification protocol on ImageNet.

MoCo with R50 performs competitively, achieving 60.6% accuracy, better than all competitors of similar model size (~24M parameters).

MoCo benefits from larger models, achieving 68.6% accuracy with R50×4 and outperforming methods such as Exemplar-CNN, Relative Context Prediction, Jigsaw Puzzles, RotNet/Image Rotations, Colorization, DeepCluster, Instance Discrimination, LocalAgg, CPCv1, CPCv2, and CMC.

NorbertZheng commented 10 months ago

Transferring Features Results

PASCAL VOC Object Detection

Figure: Object detection fine-tuned on PASCAL VOC trainval07+12. Numbers in brackets are the gaps to the ImageNet supervised pre-training counterpart.

Figure: Comparison with previous methods on object detection fine-tuned on PASCAL VOC trainval07.

MoCo pre-trained on any of IN-1M, IN-14M (full ImageNet), YFCC-100M [55], and IG-1B can outperform the supervised baseline.

NorbertZheng commented 10 months ago

COCO Object Detection and Segmentation

Figure: Object detection and instance segmentation fine-tuned on COCO.

With the 2× schedule, MoCo is better than its ImageNet supervised counterpart in all metrics with both backbones (R50-FPN and R50-C4).

NorbertZheng commented 10 months ago

More Downstream Tasks

Figure: MoCo vs. ImageNet supervised pre-training, fine-tuned on various tasks.

In sum, MoCo can outperform its ImageNet supervised pre-training counterpart in 7 detection or segmentation tasks.

Remarkably, in all these tasks, MoCo pre-trained on IG-1B is consistently better than MoCo pre-trained on IN-1M. This shows that MoCo can perform well on this large-scale, relatively uncurated dataset, pointing towards a scenario of real-world unsupervised learning.

NorbertZheng commented 10 months ago

Reference

Sik-Ho Tang. Review — MoCo: Momentum Contrast for Unsupervised Visual Representation Learning.

He et al. Momentum Contrast for Unsupervised Visual Representation Learning. CVPR 2020.