NorbertZheng / read-papers

My paper reading notes.

Sik-Ho Tsang | Brief Review -- Learning Deep Representations by Mutual Information Estimation and Maximization. #70


NorbertZheng commented 1 year ago

Sik-Ho Tsang. Brief Review — Learning Deep Representations by Mutual Information Estimation and Maximization.

NorbertZheng commented 1 year ago

Overview

Deep InfoMax (DIM) with Global & Local Objectives.

NorbertZheng commented 1 year ago

Deep InfoMax (DIM)

[Figure: High-level overview of Deep InfoMax (DIM).]

NorbertZheng commented 1 year ago

Global DIM: DIM(G)

[Figure: Deep InfoMax (DIM) with a global $I(X;Y)$ objective.]

NorbertZheng commented 1 year ago

Mutual Information Network Estimation

Donsker-Varadhan (DV)

One of the approaches follows Mutual Information Neural Estimation (MINE) (Belghazi et al., 2018), which uses a lower bound on the MI based on the Donsker-Varadhan representation (DV, Donsker & Varadhan, 1983) of the KL-divergence:

$$\mathcal{I}(X; Y) \geq \widehat{\mathcal{I}}^{(DV)}_{\omega}(X; Y) := \mathbb{E}_{\mathbb{J}}\left[T_{\omega}(x, y)\right] - \log \mathbb{E}_{\mathbb{M}}\left[e^{T_{\omega}(x, y)}\right],$$

where $T_{\omega}: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is a discriminator function modeled by a neural network with parameters $\omega$, $\mathbb{J}$ is the joint distribution of $X$ and $Y$, and $\mathbb{M}$ is the product of their marginals.
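As a quick, unofficial sketch of this bound in PyTorch (the helper name and the assumption that discriminator scores have already been computed are mine, not the paper's code):

```python
import math

import torch

def dv_mi_lower_bound(t_joint: torch.Tensor, t_marginal: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan lower bound on MI, as used by MINE.

    t_joint:    1D tensor of scores T_w(x, y) on samples from the joint.
    t_marginal: 1D tensor of scores T_w(x, y) on samples from the product of marginals.
    """
    # E_J[T] - log E_M[e^T]; logsumexp - log(N) is a numerically stable log-mean-exp.
    log_mean_exp = torch.logsumexp(t_marginal, dim=0) - math.log(t_marginal.numel())
    return t_joint.mean() - log_mean_exp
```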

At a high level, the encoder $E_{\psi}$ is optimized by simultaneously estimating and maximizing $I(X; E_{\psi}(X))$:

$$(\hat{\omega}, \hat{\psi})_G = \underset{\omega, \psi}{\arg\max}\, \widehat{\mathcal{I}}_{\omega}(X; E_{\psi}(X)),$$

where the subscript $G$ denotes “global”.

Jensen-Shannon Divergence (JSD)

Jensen-Shannon MI estimator (following the formulation of Nowozin et al., 2016):

$$\widehat{\mathcal{I}}^{(JSD)}_{\omega, \psi}(X; E_{\psi}(X)) := \mathbb{E}_{\mathbb{P}}\left[-\mathrm{sp}\left(-T_{\psi, \omega}(x, E_{\psi}(x))\right)\right] - \mathbb{E}_{\mathbb{P} \times \tilde{\mathbb{P}}}\left[\mathrm{sp}\left(T_{\psi, \omega}(x', E_{\psi}(x))\right)\right],$$

where $x$ is an input sample, $x'$ is an input sampled from $\tilde{\mathbb{P}} = \mathbb{P}$, and $\mathrm{sp}(z) = \log(1 + e^{z})$ is the softplus function.
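A minimal PyTorch sketch of this estimator (hypothetical helper; it assumes paired scores `t_pos` for $(x, E_{\psi}(x))$ and unpaired scores `t_neg` for $(x', E_{\psi}(x))$ are already computed):

```python
import torch
import torch.nn.functional as F

def jsd_mi_estimate(t_pos: torch.Tensor, t_neg: torch.Tensor) -> torch.Tensor:
    """JSD MI estimator: E_P[-sp(-T(x, E(x)))] - E_{P x P~}[sp(T(x', E(x)))]."""
    # F.softplus(z) computes sp(z) = log(1 + e^z); this estimate is maximized.
    return (-F.softplus(-t_pos)).mean() - F.softplus(t_neg).mean()
```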

Noise-Contrastive Estimation (NCE)

Similar to NCE, or InfoNCE as used in CPC, a noise-contrastive loss can also be used with DIM by maximizing:

$$\widehat{\mathcal{I}}^{(infoNCE)}_{\omega, \psi}(X; E_{\psi}(X)) := \mathbb{E}_{\mathbb{P}}\left[T_{\psi, \omega}\left(x, E_{\psi}(x)\right) - \mathbb{E}_{\tilde{\mathbb{P}}}\left[\log \sum_{x'} e^{T_{\psi, \omega}\left(x', E_{\psi}(x)\right)}\right]\right].$$
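In a batch setting this reduces to a cross-entropy over a score matrix whose diagonal holds the positive pairs; a sketch (the score-matrix layout is my assumption):

```python
import torch
import torch.nn.functional as F

def infonce_mi_estimate(scores: torch.Tensor) -> torch.Tensor:
    """InfoNCE-style MI estimate.

    scores: (B, B) matrix with scores[i, j] = T(x_j, E(x_i)); the diagonal holds
    the positive pairs and each row's off-diagonal entries are negatives.
    """
    labels = torch.arange(scores.shape[0])
    # -cross_entropy == mean_i [ T(x_i, E(x_i)) - log sum_j exp T(x_j, E(x_i)) ]
    return -F.cross_entropy(scores, labels)
```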

NorbertZheng commented 1 year ago

Local DIM: DIM(L)

[Figure: Maximizing mutual information between local features and the global feature.]

$$(\hat{\omega}, \hat{\psi})_L = \underset{\omega, \psi}{\arg\max}\, \frac{1}{M^2} \sum_{i=1}^{M^2} \widehat{\mathcal{I}}_{\omega, \psi}\left(C^{(i)}_{\psi}(X); E_{\psi}(X)\right),$$

where $C^{(i)}_{\psi}(X)$ is the feature at the $i$-th location of an $M \times M$ intermediate feature map.
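One simple way to realize this is a dot-product critic between the global code and every spatial location of the feature map; a hedged sketch (shapes and the dot-product critic are my assumptions; the paper also describes concatenate-and-convolve critic architectures):

```python
import torch

def local_dim_scores(local_feats: torch.Tensor, global_feat: torch.Tensor) -> torch.Tensor:
    """Score the global feature against every local feature.

    local_feats: (B, C, M, M) intermediate feature map C_psi(x).
    global_feat: (B, C) global encoding E_psi(x).

    Returns (B, B, M, M); entry [i, j, u, v] scores E_psi(x_i) against location
    (u, v) of x_j, so the i == j slices are the positive pairs.
    """
    return torch.einsum('ic,jcuv->ijuv', global_feat, local_feats)
```

Feeding these scores into one of the estimators above and averaging over the $M^2$ locations gives the local objective.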

NorbertZheng commented 1 year ago

Prior Matching

Similar to adversarial autoencoders, the encoder output can additionally be pushed toward a prior $\mathbb{V}$ with a GAN-style objective:

$$(\hat{\omega}, \hat{\psi})_P = \underset{\psi}{\arg\min}\, \underset{\phi}{\arg\max}\, \widehat{\mathcal{D}}_{\phi}\left(\mathbb{V} \| \mathbb{U}_{\psi, \mathbb{P}}\right) = \mathbb{E}_{\mathbb{V}}\left[\log D_{\phi}(y)\right] + \mathbb{E}_{\mathbb{P}}\left[\log\left(1 - D_{\phi}(E_{\psi}(x))\right)\right],$$

where $\mathbb{U}_{\psi, \mathbb{P}}$ is the distribution of encoder outputs $E_{\psi}(x)$ for $x \sim \mathbb{P}$, and $D_{\phi}$ is a discriminator.
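A hedged GAN-style sketch of the two resulting losses (function and variable names are hypothetical; `d_prior` and `d_enc` are discriminator outputs in $(0, 1)$):

```python
import torch

def prior_matching_losses(d_prior: torch.Tensor, d_enc: torch.Tensor):
    """d_prior: D(y) for y sampled from the prior V; d_enc: D(E(x)) for real inputs."""
    eps = 1e-6  # numerical safety for the logs
    # Discriminator maximizes E_V[log D(y)] + E_P[log(1 - D(E(x)))].
    disc_loss = -(torch.log(d_prior + eps).mean() + torch.log(1.0 - d_enc + eps).mean())
    # Encoder is trained to make E(x) indistinguishable from prior samples.
    enc_loss = -torch.log(d_enc + eps).mean()
    return disc_loss, enc_loss
```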

NorbertZheng commented 1 year ago

Complete Objective

All three objectives (global MI maximization, local MI maximization, and prior matching) can be used together as the complete objective for Deep InfoMax (DIM):

$$\max_{\omega_1, \omega_2, \psi}\left(\alpha \widehat{\mathcal{I}}_{\omega_1, \psi}\left(X; E_{\psi}(X)\right) + \frac{\beta}{M^2} \sum_{i=1}^{M^2} \widehat{\mathcal{I}}_{\omega_2, \psi}\left(X^{(i)}; E_{\psi}(X)\right)\right) + \underset{\psi}{\arg\min}\, \underset{\phi}{\arg\max}\, \gamma\, \widehat{\mathcal{D}}_{\phi}\left(\mathbb{V} \| \mathbb{U}_{\psi, \mathbb{P}}\right),$$

where $\alpha$, $\beta$, and $\gamma$ are hyperparameters weighting the global, local, and prior terms (DIM(G) sets $\beta = 0$; DIM(L) sets $\alpha = 0$).
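Putting the pieces together, a sketch of the weighted total loss for the encoder update (the inputs come from the sketches above; the default weights are illustrative, not the paper's tuned values):

```python
def dim_total_loss(global_mi, local_mi, prior_enc_loss,
                   alpha: float = 0.5, beta: float = 1.0, gamma: float = 0.1):
    """Negative of the complete DIM objective, to be minimized by the encoder.

    DIM(G) corresponds to beta = 0; DIM(L) to alpha = 0 (weights illustrative).
    """
    return -(alpha * global_mi + beta * local_mi) + gamma * prior_enc_loss
```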

NorbertZheng commented 1 year ago

Results

[Table: Classification accuracy (top 1) results on CIFAR10 and CIFAR100.]

[Table: Classification accuracy (top 1) results on Tiny ImageNet and STL-10.]

In general, DIM with the local objective, DIM(L), outperformed all other models presented here by a significant margin on all datasets.

Among the DV, JSD, and InfoNCE estimators, InfoNCE tends to perform best.

[Table: Comparisons of DIM with Contrastive Predictive Coding (CPC).]

When using InfoNCE, DIM(L) is competitive with CPC.

NorbertZheng commented 1 year ago

This is an early paper on self-supervised learning, and many ideas are explored in it.

NorbertZheng commented 1 year ago

Reference

[2019 ICLR] [Deep InfoMax (DIM)] Learning Deep Representations by Mutual Information Estimation and Maximization

NorbertZheng commented 1 year ago

Extended Reading

Unsupervised/Self-Supervised Learning 1993 … 2021 [MoCo v3] [SimSiam] [DINO] [Exemplar-v1, Exemplar-v2] [Barlow Twins] [W-MSE] [SimSiam+AL] [BYOL+LP] 2022 [BEiT] [BEiT V2] [Masked Autoencoders (MAE)]