Deep InfoMax (DIM) with Global & Local Objectives.
Learning Deep Representations by Mutual Information Estimation and Maximization, Deep InfoMax (DIM), by Microsoft Research, 2019 ICLR, Over 1800 Citations. Self-Supervised Learning, Contrastive Learning, Image Classification.
Self-supervised representation learning is based on maximizing mutual information between features extracted from multiple views of a shared context.
While multiple views could be produced by, for example, different augmentations or sensory modalities of the same context, DIM uses the global summary vector and the local feature map of a single image as its views.
This is a paper from Prof. Bengio's research group.
The encoder $E_{\psi}$ should be trained such that the mutual information between the input and its representation is maximized:

$$\max_{\psi} I(X; E_{\psi}(X))$$
Depending on the end-goal, this maximization can be done over the complete input, $X$, or some structured or “local” subset.
Deep InfoMax (DIM) with a global $MI(X; Y)$ objective.
One of the approaches follows Mutual Information Neural Estimation (MINE) (Belghazi et al., 2018), which uses a lower bound on the MI based on the Donsker-Varadhan representation (DV, Donsker & Varadhan, 1983) of the KL-divergence:

$$I(X; Y) \geq \widehat{I}^{(DV)}_{\omega}(X; Y) := \mathbb{E}_{\mathbb{J}}[T_{\omega}(x, y)] - \log \mathbb{E}_{\mathbb{M}}[e^{T_{\omega}(x, y)}],$$

where $\mathbb{J}$ is the joint distribution, $\mathbb{M}$ is the product of marginals, and $T_{\omega}: \mathcal{X} \times \mathcal{Y} \to \mathbb{R}$ is a discriminator function modeled by a neural network with parameters $\omega$.
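As a concrete illustration, here is a minimal PyTorch sketch of the DV bound, assuming the discriminator has already been evaluated on paired (joint) and mismatched (marginal) samples; the function name and tensor shapes are my own, not from the paper:

```python
import math

import torch

def dv_mi_lower_bound(t_joint: torch.Tensor, t_marginal: torch.Tensor) -> torch.Tensor:
    """Donsker-Varadhan bound: E_J[T] - log E_M[exp(T)].

    t_joint:    discriminator scores T_w(x, E(x)) on paired samples, shape (N,).
    t_marginal: scores T_w(x', E(x)) on mismatched samples, shape (N,).
    """
    # Stable log-mean-exp over the marginal scores.
    log_mean_exp = torch.logsumexp(t_marginal, dim=0) - math.log(t_marginal.numel())
    return t_joint.mean() - log_mean_exp
```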
At a high level, the encoder $E_{\psi}$ is optimized by simultaneously estimating and maximizing $I(X; E_{\psi}(X))$:

$$(\hat{\omega}, \hat{\psi})_{G} = \arg\max_{\omega, \psi} \widehat{I}_{\omega}(X; E_{\psi}(X)),$$

where the subscript $G$ denotes “global”.
Since the exact MI value is not needed for representation learning, DIM can instead use a Jensen-Shannon MI estimator (following the formulation of Nowozin et al., 2016):

$$\widehat{I}^{(JSD)}_{\omega,\psi}(X; E_{\psi}(X)) := \mathbb{E}_{\mathbb{P}}\big[-\mathrm{sp}(-T_{\psi,\omega}(x, E_{\psi}(x)))\big] - \mathbb{E}_{\mathbb{P} \times \tilde{\mathbb{P}}}\big[\mathrm{sp}(T_{\psi,\omega}(x', E_{\psi}(x)))\big],$$

where $x$ is an input sample, $x'$ is an input sampled from $\tilde{\mathbb{P}} = \mathbb{P}$, and $\mathrm{sp}(z) = \log(1 + e^{z})$ is the softplus function.
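A matching sketch of the JSD estimator, under the same assumed score tensors as above:

```python
import torch
import torch.nn.functional as F

def jsd_mi_estimate(t_joint: torch.Tensor, t_marginal: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon MI estimator: E_P[-sp(-T)] - E_{P x P~}[sp(T)]."""
    # F.softplus(z) computes sp(z) = log(1 + e^z).
    return (-F.softplus(-t_joint)).mean() - F.softplus(t_marginal).mean()
```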
Similar to NCE or InfoNCE in CPC, a noise-contrastive loss can also be used with DIM by maximizing:

$$\widehat{I}^{(infoNCE)}_{\omega,\psi}(X; E_{\psi}(X)) := \mathbb{E}_{\mathbb{P}}\left[T_{\psi,\omega}(x, E_{\psi}(x)) - \mathbb{E}_{\tilde{\mathbb{P}}}\left[\log \sum_{x'} e^{T_{\psi,\omega}(x', E_{\psi}(x))}\right]\right].$$
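In batch form this reduces to a softmax over candidate pairs; a sketch assuming an $(N, N)$ score matrix whose diagonal holds the positive pairs (again, my own naming):

```python
import torch

def infonce_objective(scores: torch.Tensor) -> torch.Tensor:
    """InfoNCE objective to maximize, from an (N, N) score matrix.

    scores[i, j] = T(x_j, E(x_i)); the diagonal holds positive pairs and
    each row's off-diagonal entries act as the negatives x'.
    """
    positive = scores.diagonal()               # T(x, E(x))
    log_norm = torch.logsumexp(scores, dim=1)  # log sum_{x'} exp(T(x', E(x)))
    return (positive - log_norm).mean()
```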
Maximizing mutual information between local features and global features: instead of the full input $X$, the global summary $E_{\psi}(X)$ is encouraged to have high MI with every entry $C^{(i)}_{\psi}(X)$ of an $M \times M$ local feature map:

$$(\hat{\omega}, \hat{\psi})_{L} = \arg\max_{\omega, \psi} \frac{1}{M^{2}} \sum_{i=1}^{M^{2}} \widehat{I}_{\omega,\psi}(C^{(i)}_{\psi}(X); E_{\psi}(X)).$$
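A minimal sketch of how local-global pairs can be scored with a dot-product discriminator, in the spirit of the paper's encode-and-dot-product architecture; the class name, projection sizes, and einsum layout are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LocalDIMScore(nn.Module):
    """Scores every (local location, global vector) pair via a dot product."""

    def __init__(self, local_dim: int, global_dim: int, embed_dim: int = 128):
        super().__init__()
        self.local_proj = nn.Conv2d(local_dim, embed_dim, kernel_size=1)
        self.global_proj = nn.Linear(global_dim, embed_dim)

    def forward(self, local_map: torch.Tensor, global_vec: torch.Tensor) -> torch.Tensor:
        # local_map: (B, C, M, M) feature maps; global_vec: (B, D) summaries.
        l = self.local_proj(local_map)    # (B, E, M, M)
        g = self.global_proj(global_vec)  # (B, E)
        # scores[k, b, i, j] = <g[k], l[b, :, i, j]>; entries with k == b are
        # positive (same-image) pairs, while k != b supplies the negatives.
        return torch.einsum("ke,beij->kbij", g, l)
```

The resulting scores can then be fed to any of the estimators above (DV, JSD, or InfoNCE).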
All three objectives, global MI maximization, local MI maximization, and prior matching (which adversarially pushes the encoder's output distribution toward a prior $\mathbb{V}$, as in adversarial autoencoders), can be used together as the complete objective for Deep InfoMax (DIM):

$$\max_{\omega_{1}, \omega_{2}, \psi} \left( \alpha \widehat{I}_{\omega_{1},\psi}(X; E_{\psi}(X)) + \frac{\beta}{M^{2}} \sum_{i=1}^{M^{2}} \widehat{I}_{\omega_{2},\psi}(X^{(i)}; E_{\psi}(X)) \right) + \gamma \min_{\psi} \max_{\phi} \widehat{D}_{\phi}(\mathbb{V} \,\|\, \mathbb{U}_{\psi,\mathbb{P}}),$$

where $\omega_{1}$ and $\omega_{2}$ are the discriminator parameters of the global and local terms, $\mathbb{U}_{\psi,\mathbb{P}}$ is the distribution of encoder outputs, and $\alpha$, $\beta$, $\gamma$ weight the three terms.
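Putting it together, a sketch of the weighted combination; the weight values below are illustrative placeholders (in the paper, DIM(G) and DIM(L) correspond to different $(\alpha, \beta, \gamma)$ settings):

```python
import torch

def dim_total_objective(global_mi: torch.Tensor,
                        local_mi_per_location: torch.Tensor,
                        prior_term: torch.Tensor,
                        alpha: float = 0.5, beta: float = 1.0,
                        gamma: float = 0.1) -> torch.Tensor:
    """Weighted sum of the three DIM terms (weights are illustrative).

    global_mi:             scalar global MI estimate.
    local_mi_per_location: per-location MI estimates, shape (M*M,).
    prior_term:            scalar prior-matching term (the encoder's side
                           of the adversarial game).
    """
    return alpha * global_mi + beta * local_mi_per_location.mean() + gamma * prior_term
```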
Classification accuracy (top 1) results on CIFAR10 and CIFAR100.
Classification accuracy (top 1) results on Tiny ImageNet and STL-10.
In general, DIM with the local objective, DIM(L), outperformed all models presented here by a significant margin on all datasets.
Among DV, JSD & InfoNCE, InfoNCE tends to perform best.
Comparisons of DIM with Contrastive Predictive Coding (CPC).
DIM(L) is competitive with CPC using InfoNCE.
This is an early paper on self-supervised learning; many ideas are explored here.
[2019 ICLR] [Deep InfoMax (DIM)] Learning Deep Representations by Mutual Information Estimation and Maximization
Sik-Ho Tsang. Brief Review — Learning Deep Representations by Mutual Information Estimation and Maximization.