NorbertZheng / read-papers

My paper reading notes.
MIT License

Sik-Ho Tsang | Review: Representation Learning with Contrastive Predictive Coding (CPC/CPCv1). #71

Closed NorbertZheng closed 1 year ago

NorbertZheng commented 1 year ago

Sik-Ho Tsang. Review: Representation Learning with Contrastive Predictive Coding (CPC/CPCv1).

NorbertZheng commented 1 year ago

Overview

Representation Learning Using InfoNCE Loss. In this story, Representation Learning with Contrastive Predictive Coding (CPC/CPCv1), by DeepMind, is reviewed.

This is a 2018 arXiv paper with over 1800 citations. It brings NCE and Negative Sampling, techniques from NLP, to representation learning/self-supervised learning.

NorbertZheng commented 1 year ago

Motivation and Intuition of Contrastive Predictive Coding (CPC)

image

NorbertZheng commented 1 year ago

Contrastive Predictive Coding (CPC): Overview

image

As argued in the previous section, we do not predict future observations $x_{t+k}$ directly with a generative model $p_{k}(x_{t+k}|c_{t})$. Instead we model a density ratio which preserves the mutual information between $x_{t+k}$ and $c_{t}$:

$$f_{k}(x_{t+k}, c_{t}) \propto \frac{p(x_{t+k}|c_{t})}{p(x_{t+k})}$$

$$f_{k}(x_{t+k}, c_{t}) = \exp\left(z_{t+k}^{T} W_{k} c_{t}\right)$$

A linear transformation $W_{k}^{T}c_{t}$ is used for the prediction with a different $W_{k}$ for every step $k$.

In the proposed model, either $z_{t}$ or $c_{t}$ can be used as the representation for downstream tasks.
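The log-bilinear score $f_{k}$ above can be sketched in a few lines of numpy; the dimensions and variable names here are illustrative assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
d_z, d_c = 64, 128        # hypothetical encoder / context dimensions
K = 4                     # number of future steps to predict

# One W_k per prediction step k, as in the log-bilinear model
# f_k(x_{t+k}, c_t) = exp(z_{t+k}^T W_k c_t).
W = rng.normal(scale=0.01, size=(K, d_z, d_c))

def score(z_future, c_t, k):
    """Unnormalized density-ratio score f_k(x_{t+k}, c_t)."""
    return np.exp(z_future @ (W[k] @ c_t))

c_t = rng.normal(size=d_c)     # context c_t from the autoregressive model
z_pos = rng.normal(size=d_z)   # encoding z_{t+k} of the true future sample
print(score(z_pos, c_t, k=0))  # strictly positive, but not a probability
```

Note that $f_{k}$ is only proportional to the density ratio; it never has to be normalized over all possible futures, which is exactly what makes the contrastive objective tractable.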

NorbertZheng commented 1 year ago

InfoNCE Loss and Mutual Information Estimation
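InfoNCE treats one positive pair against $N-1$ negatives as an $N$-way classification problem: $\mathcal{L}_{N} = -\mathbb{E}\left[\log \frac{f_{k}(x_{t+k}, c_{t})}{\sum_{x_{j} \in X} f_{k}(x_{j}, c_{t})}\right]$, and minimizing it maximizes a lower bound on the mutual information, $I(x_{t+k}; c_{t}) \geq \log N - \mathcal{L}_{N}$. A minimal numpy sketch, treating $\log f_{k}$ as the logit (which is what the log-bilinear model provides):

```python
import numpy as np

rng = np.random.default_rng(1)

def info_nce(log_f_pos, log_f_neg):
    """InfoNCE for one context: cross-entropy of identifying the positive
    among N samples, computed from log f_k values used as logits."""
    logits = np.concatenate([[log_f_pos], log_f_neg])
    logits = logits - logits.max()          # log-sum-exp stabilization
    return -(logits[0] - np.log(np.exp(logits).sum()))

# With uninformative (random) scores the loss cannot beat chance, so it
# stays at or above log N (here N = 8); training pushes it below that.
losses = [info_nce(rng.normal(), rng.normal(size=7)) for _ in range(2000)]
print(np.mean(losses), np.log(8))
```

When the positive's score dominates the negatives', the loss approaches 0, and the bound $\log N - \mathcal{L}_{N}$ approaches its maximum of $\log N$; this is why larger $N$ (more negatives) can tighten the mutual-information estimate.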

NorbertZheng commented 1 year ago

Experiments for Audio

Pretext Task

Downstream Task

image LibriSpeech phone and speaker classification results. For phone classification there are 41 possible classes and for speaker classification 251.

For phone classification, CPC obtains 64.6% accuracy with a linear classifier. When a single hidden layer is used instead, the accuracy increases from 64.6% to 72.5%, which is closer to the accuracy of the fully supervised model.

Interestingly, CPC representations capture both speaker identity and speech content, as demonstrated by the good accuracies attained with a simple linear classifier, which comes close to the oracle, fully supervised networks.
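The linear-probe evaluation used here can be sketched as follows: freeze the pretrained features and train only a multinomial logistic regression on top. This is a hedged numpy sketch with synthetic stand-in features and illustrative sizes, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, n_classes = 512, 256, 41          # hypothetical sizes; 41 phone classes

# Stand-in for frozen CPC features; in practice X would hold c_t vectors
# extracted from the pretrained, frozen model.
X = rng.normal(size=(n, d))
W_true = rng.normal(size=(d, n_classes))
y = np.argmax(X @ W_true, axis=1)       # labels a linear probe can recover

# Multinomial logistic regression trained by plain gradient descent;
# only this classifier is trained, the features stay fixed.
W = np.zeros((d, n_classes))
for _ in range(500):
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(n), y] -= 1.0           # d(cross-entropy)/d(logits)
    W -= 0.5 * (X.T @ p) / n

acc = (np.argmax(X @ W, axis=1) == y).mean()
print(acc)
```

The point of the protocol is that a linear classifier has no capacity to build new features of its own, so its accuracy directly measures how linearly accessible phone or speaker information already is in the learned representation.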

image LibriSpeech phone classification ablation experiments.

image t-SNE visualization of audio (speech) representations for a subset of 10 speakers (out of 251).

NorbertZheng commented 1 year ago

Experiments for Vision

Pretext Task

image Visualization of Contrastive Predictive Coding for images.

image Every row shows image patches that activate a certain neuron in the CPC architecture.

Downstream Task

image ImageNet top-1 unsupervised classification results.

image ImageNet top-5 unsupervised classification results.

NorbertZheng commented 1 year ago

Experiments for Natural Language

Pretext Task

Downstream Task

image Classification accuracy on five common NLP benchmarks ([40] is Doc2Vec).

The performance of CPC is very similar to the Skip-Thought vector model [26], with the advantage that it does not require a powerful LSTM as a word-level decoder, making it much faster to train.

NorbertZheng commented 1 year ago

Experiments for Reinforcement Learning

image Reinforcement Learning results for 5 DeepMind Lab tasks. Black: batched A2C baseline, Red: with auxiliary contrastive loss.

Five reinforcement learning tasks in 3D environments of DeepMind Lab [51] are tested: rooms_watermaze, explore_goal_locations_small, seekavoid_arena_01, lasertag_three_opponents_small and rooms_keys_doors_puzzle. The standard batched A2C [52] agent is used as the base model.

NorbertZheng commented 1 year ago

Later on, CPCv2 was published at ICLR 2020; I hope to have time to review it in the future.

NorbertZheng commented 1 year ago

Reference

[2018 arXiv] [CPC/CPCv1] Representation Learning with Contrastive Predictive Coding

NorbertZheng commented 1 year ago

Extended Readings

Self-Supervised Learning 2008–2010 [Stacked Denoising Autoencoders] 2014 [Exemplar-CNN] 2015 [Context Prediction] 2016 [Context Encoders] [Colorization] [Jigsaw Puzzles] 2017 [L³-Net] [Split-Brain Auto] [Mean Teacher] 2018 [RotNet/Image Rotations] [DeepCluster] [CPC/CPCv1]