Split-Brain Auto for Self-Supervised Learning, Outperforms Jigsaw Puzzles, Context Prediction, ALI/BiGAN, L³-Net, Context Encoders, etc.
Proposed Split-Brain Auto (Bottom) vs Traditional Autoencoder, e.g. Stacked Denoising Autoencoder (Top).
In this paper, Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction (Split-Brain Auto), by the Berkeley AI Research (BAIR) Laboratory, University of California, Berkeley, is reviewed. In this paper, the network is split into two disjoint sub-networks, each trained to predict one subset of the data channels from the other, and the concatenation of their features forms the learned representation.
This is a paper in 2017 CVPR with over 400 citations.
Split-Brain Autoencoders applied to various domains.
By performing this pretext task of predicting $X_{2}$ from $X_{1}$ through a network $F_{1}$, we hope to obtain a representation $F_{1}(X_{1})$ that contains high-level abstractions or semantics.
Similarly, $X_{2}$ goes through a network $F_{2}$ to predict $X_{1}$, yielding the representation $F_{2}(X_{2})$.
An $\ell_{2}$ loss can be used to train the regression.
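A sketch of this regression objective (notation assumed, with $h$ and $w$ indexing spatial locations):

$\ell_{2}(F_{1}(X_{1}), X_{2}) = \frac{1}{2} \sum_{h,w} \| X_{2}^{h,w} - F_{1}(X_{1})^{h,w} \|_{2}^{2}$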
Interesting!!! The cross-entropy loss is better than the $\ell_{2}$ loss.
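A sketch of this classification variant, assuming the target channels are quantized into $Q$ bins by an encoding $H$ (the symbols $H$ and $Q$ are assumed names) and the network outputs a distribution over those bins:

$\ell_{cl}(F_{1}(X_{1}), X_{2}) = -\sum_{h,w} \sum_{q} H(X_{2})^{h,w,q} \log F_{1}(X_{1})^{h,w,q}$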
Multiple cross-channel encoders, $F_{1}$ and $F_{2}$, are trained on opposite prediction problems, with loss functions $L_{1}$ and $L_{2}$, respectively:
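Roughly, each sub-network is optimized on its own cross-channel loss, and the full representation is the concatenation of the two (a sketch, with the exact notation assumed):

$F_{1}^{*} = \arg\min_{F_{1}} L_{1}(F_{1}(X_{1}), X_{2}), \quad F_{2}^{*} = \arg\min_{F_{2}} L_{2}(F_{2}(X_{2}), X_{1}), \quad F(X) = \{F_{1}(X_{1}), F_{2}(X_{2})\}$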
Example split-brain autoencoders in the image and RGB-D domains are shown in the above figure (a) and (b), respectively.
If $F$ is a CNN of a desired fixed size, e.g., AlexNet, we can design the sub-networks $F_{1}$ and $F_{2}$ by splitting each layer of the network $F$ in half, along the channel dimension.
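As an illustration only (not the authors' implementation), here is a minimal PyTorch-style sketch in which two tiny convolutional branches stand in for the two halves; the channel counts, layer sizes, and the use of an $\ell_{2}$ loss are assumptions for a Lab-like input:

```python
import torch
import torch.nn as nn


def make_branch(in_ch, out_ch, width=32):
    # A tiny fully convolutional branch standing in for one half of the split network.
    return nn.Sequential(
        nn.Conv2d(in_ch, width, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(width, out_ch, kernel_size=1),
    )


# F1 predicts ab (2 channels) from L (1 channel); F2 predicts the opposite direction.
F1 = make_branch(in_ch=1, out_ch=2)
F2 = make_branch(in_ch=2, out_ch=1)

x = torch.randn(8, 3, 64, 64)   # stand-in Lab batch: channel 0 = L, channels 1:3 = ab
x1, x2 = x[:, :1], x[:, 1:]     # split the input along the channel dimension

# Each sub-network is trained on its own cross-channel prediction loss.
loss = nn.functional.mse_loss(F1(x1), x2) + nn.functional.mse_loss(F2(x2), x1)
loss.backward()

# The learned representation is the channel-wise concatenation of the two branches.
feats = torch.cat([F1(x1), F2(x2)], dim=1)
```

In the paper itself, each branch is half of AlexNet (every layer split along the channel dimension), and the classification loss above is used rather than $\ell_{2}$.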
One alternative, used as a baseline: the same representation $F$ can be trained to perform both mappings simultaneously:
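A sketch of this joint objective (exact notation assumed), where a single network $F$ is trained on both directions:

$F^{*} = \arg\min_{F} \; L_{1}(F(X_{1}), X_{2}) + L_{2}(F(X_{2}), X_{1})$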
Or even the full input tensor $X$ can be used as the input.
Splitting the training into the two separate sub-tasks is better!!! It reduces the difficulty of the training task!!!
Task Generalization on ImageNet Classification.
Model: AlexNet.
Dataset: ImageNet.
In brief, different autoencoder variants are tried as baselines.
Split-Brain Auto (cl, cl), where cl means using the classification loss, outperforms all of these variants and other self-supervised learning approaches such as Jigsaw Puzzles [30], Context Prediction [7], ALI [8]/BiGAN, Context Encoders [34], and Colorization [47].
Dataset & Task Generalization on Places Classification.
Places is a different dataset and task from the one used for pretraining (ImageNet).
Similar results are obtained for Places classification: it outperforms approaches such as Jigsaw Puzzles [30], Context Prediction [7], L³-Net [45], Context Encoders [34], and Colorization [47].
Task and Dataset Generalization on PASCAL VOC.
To further test generalization, classification, detection and segmentation performance is evaluated on PASCAL VOC.
The proposed method, Split-Brain Auto (cl, cl), achieves state-of-the-art performance on almost all established self-supervision benchmarks.
There are still other results in the paper; if interested, please feel free to read it. I hope to write a story about Jigsaw Puzzles in the near future.
Sik-Ho Tsang. Review — Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction.