Split-Brain Auto for Self-Supervised Learning, Outperforms Jigsaw Puzzles, Context Prediction, ALI/BiGAN, L³-Net, Context Encoders, etc.
Proposed Split-Brain Auto (Bottom) vs Traditional Autoencoder, e.g. Stacked Denoising Autoencoder (Top).
In this paper, Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction (Split-Brain Auto), by the Berkeley AI Research (BAIR) Laboratory, University of California, Berkeley, is reviewed. In this paper, the network is split into two disjoint sub-networks, each trained to predict one subset of the data channels from the other, and the concatenation of their features forms the learned representation.
This is a paper in 2017 CVPR with over 400 citations.
Split-Brain Autoencoders applied to various domains.
By performing this pretext task of predicting $X_{2}$ from $X_{1}$ through a network $F_{1}$, we hope to obtain a representation $F_{1}(X_{1})$ that contains high-level abstractions or semantics.
Similarly, $X_{2}$ goes through a network $F_{2}$ to predict $X_{1}$, yielding the representation $F_{2}(X_{2})$.
An $\ell_{2}$ loss can be used to train the regression.
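A sketch of this regression objective (notation assumed, with $h$ and $w$ indexing spatial locations):

$\ell_{2}(F_{1}(X_{1}), X_{2}) = \frac{1}{2} \sum_{h,w} \| X_{2}^{h,w} - F_{1}(X_{1})^{h,w} \|_{2}^{2}$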
Interesting!!! The cross-entropy loss is better than the $\ell_{2}$ loss.
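A sketch of this classification variant, assuming the target channels are quantized into $Q$ bins by an encoding $H$ (the symbols $H$ and $Q$ are assumed names) and the network outputs a distribution over those bins:

$\ell_{cl}(F_{1}(X_{1}), X_{2}) = -\sum_{h,w} \sum_{q} H(X_{2})^{h,w,q} \log F_{1}(X_{1})^{h,w,q}$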
Multiple cross-channel encoders, $F_{1}$ and $F_{2}$, are trained on opposite prediction problems, with loss functions $L_{1}$ and $L_{2}$, respectively:
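Roughly, each sub-network is optimized on its own cross-channel loss, and the full representation is the concatenation of the two (a sketch, with the exact notation assumed):

$F_{1}^{*} = \arg\min_{F_{1}} L_{1}(F_{1}(X_{1}), X_{2}), \quad F_{2}^{*} = \arg\min_{F_{2}} L_{2}(F_{2}(X_{2}), X_{1}), \quad F(X) = \{F_{1}(X_{1}), F_{2}(X_{2})\}$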
Example split-brain autoencoders in the image and RGB-D domains are shown in the above figure (a) and (b), respectively.
If $F$ is a CNN of a desired fixed size, e.g., AlexNet, we can design the sub-networks $F_{1}$ and $F_{2}$ by splitting each layer of the network $F$ in half, along the channel dimension.
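As an illustration only (not the authors' implementation), here is a minimal PyTorch-style sketch in which two tiny convolutional branches stand in for the two halves; the channel counts, layer sizes, and the use of an $\ell_{2}$ loss are assumptions for a Lab-like input:

```python
import torch
import torch.nn as nn


def make_branch(in_ch, out_ch, width=32):
    # A tiny fully convolutional branch standing in for one half of the split network.
    return nn.Sequential(
        nn.Conv2d(in_ch, width, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(width, out_ch, kernel_size=1),
    )


# F1 predicts ab (2 channels) from L (1 channel); F2 predicts the opposite direction.
F1 = make_branch(in_ch=1, out_ch=2)
F2 = make_branch(in_ch=2, out_ch=1)

x = torch.randn(8, 3, 64, 64)   # stand-in Lab batch: channel 0 = L, channels 1:3 = ab
x1, x2 = x[:, :1], x[:, 1:]     # split the input along the channel dimension

# Each sub-network is trained on its own cross-channel prediction loss.
loss = nn.functional.mse_loss(F1(x1), x2) + nn.functional.mse_loss(F2(x2), x1)
loss.backward()

# The learned representation is the channel-wise concatenation of the two branches.
feats = torch.cat([F1(x1), F2(x2)], dim=1)
```

In the paper itself, each branch is half of AlexNet (every layer split along the channel dimension), and the classification loss above is used rather than $\ell_{2}$.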
One alternative, used as a baseline: the same representation $F$ can be trained to perform both mappings simultaneously:
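A sketch of this joint objective (exact notation assumed), where a single network $F$ is trained on both directions:

$F^{*} = \arg\min_{F} \; L_{1}(F(X_{1}), X_{2}) + L_{2}(F(X_{2}), X_{1})$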
Or even the full input tensor $X$ can be used as the input.
Splitting the training into the two separate sub-tasks is better!!! It reduces the difficulty of the training task!!!
Task Generalization on ImageNet Classification.
Model: AlexNet.
Dataset: ImageNet.
In brief, different autoencoder variants are tried as baselines.
Split-Brain Auto (cl, cl), where cl means using the classification loss, outperforms all of these variants and other self-supervised learning approaches such as Jigsaw Puzzles [30], Context Prediction [7], ALI [8]/BiGAN, Context Encoders [34], and Colorization [47].
Dataset & Task Generalization on Places Classification.
Places is a different dataset and task from the one used for pretraining (ImageNet).
Similar results are obtained for Places classification: it outperforms approaches such as Jigsaw Puzzles [30], Context Prediction [7], L³-Net [45], Context Encoders [34], and Colorization [47].
Task and Dataset Generalization on PASCAL VOC.
To further test generalization, classification, detection and segmentation performance is evaluated on PASCAL VOC.
The proposed method, Split-Brain Auto (cl, cl), achieves state-of-the-art performance on almost all established self-supervision benchmarks.
There are still other results in the paper; if interested, please feel free to read it. I hope to write a story about Jigsaw Puzzles in the near future.
Sik-Ho Tsang. Review — Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction.