NorbertZheng / read-papers

My paper reading notes.
MIT License

Sik-Ho Tang | Review -- Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction. #126

NorbertZheng closed this issue 1 year ago

NorbertZheng commented 1 year ago

Sik-Ho Tang. Review — Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction.

NorbertZheng commented 1 year ago

Overview

Split-Brain Auto for Self-Supervised Learning, Outperforms Jigsaw Puzzles, Context Prediction, ALI/BiGAN, L³-Net, Context Encoders, etc.

Figure: Proposed Split-Brain Auto (bottom) vs. a traditional autoencoder, e.g. the Stacked Denoising Autoencoder (top).

In this paper, Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction (Split-Brain Auto), by the Berkeley AI Research (BAIR) Laboratory, University of California, Berkeley, is reviewed. In this paper:

- The network is split into two disjoint sub-networks, each trained on a cross-channel prediction task: predicting one subset of the input channels from the other.
- Concatenating the two sub-networks yields the full representation, which is evaluated by transfer to classification, detection, and segmentation tasks.

This is a paper in 2017 CVPR with over 400 citations.

NorbertZheng commented 1 year ago

Split-Brain Autoencoders (Split-Brain Auto)

Figure: Split-Brain Autoencoders applied to various domains.

Cross-Channel Encoders

By performing this pretext task of predicting $X_{2}$ from $X_{1}$, we hope to achieve a representation $F_{1}(X_{1})$ which contains high-level abstractions or semantics.

Similarly, $X_{2}$ goes through a second network $F_{2}$ to predict $X_{1}$.

An $\ell_{2}$ loss can be used for this regression problem:

$$L_{\ell_{2}}(F_{1}) = \frac{1}{2}\sum_{h,w} \left\| X_{2}^{h,w} - F_{1}(X_{1})^{h,w} \right\|_{2}^{2}$$
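A minimal sketch of one cross-channel encoder trained with the $\ell_{2}$ regression loss, assuming PyTorch; the layer sizes, channel split (one grayscale-like channel predicting two color-like channels), and class name are illustrative, not the paper's AlexNet configuration:

```python
import torch
import torch.nn as nn

class CrossChannelEncoder(nn.Module):
    """Predicts one channel subset (X2) from the other (X1)."""
    def __init__(self, in_channels: int, out_channels: int, feat_dim: int = 64):
        super().__init__()
        # F1(X1): representation of the observed channel subset.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        # Prediction head mapping the representation to an estimate of X2.
        self.head = nn.Conv2d(feat_dim, out_channels, 1)

    def forward(self, x1: torch.Tensor) -> torch.Tensor:
        return self.head(self.encoder(x1))

# Regression with an l2 (MSE) loss: X1 = 1 observed channel, X2 = 2 target channels.
f1 = CrossChannelEncoder(in_channels=1, out_channels=2)
x1 = torch.randn(8, 1, 32, 32)   # observed subset X1
x2 = torch.randn(8, 2, 32, 32)   # target subset X2
loss = nn.functional.mse_loss(f1(x1), x2)
loss.backward()
```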

NorbertZheng commented 1 year ago

Interesting!!! Cross-entropy loss (on a quantized target space) is better than $\ell_{2}$ loss.
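A rough sketch of the classification-loss variant, assuming the continuous target channel is quantized into discrete bins and trained with per-pixel cross-entropy (as in Colorization [47]); the bin count, the $[-1, 1]$ value range, and the single-target-channel setup below are illustrative assumptions:

```python
import torch
import torch.nn as nn

NUM_BINS = 32  # illustrative bin count for quantizing the target values

def quantize(x: torch.Tensor, num_bins: int = NUM_BINS) -> torch.Tensor:
    """Map continuous values assumed to lie in [-1, 1] to integer bin indices."""
    x = x.clamp(-1.0, 1.0)
    return ((x + 1.0) / 2.0 * (num_bins - 1)).long()

# The prediction head now outputs a distribution over bins per pixel
# instead of regressing a continuous value.
head = nn.Conv2d(64, NUM_BINS, 1)            # assumes 64-dim features, 1 target channel

features = torch.randn(8, 64, 32, 32)        # F1(X1) from the encoder above
target = torch.rand(8, 1, 32, 32) * 2 - 1    # continuous X2 channel in [-1, 1]
logits = head(features)                      # (B, NUM_BINS, H, W)
labels = quantize(target).squeeze(1)         # (B, H, W) integer bin indices
loss = nn.functional.cross_entropy(logits, labels)
```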

NorbertZheng commented 1 year ago

Split-Brain Autoencoders as Aggregated Cross-Channel Encoders

Multiple cross-channel encoders, $F_{1}$ and $F_{2}$, are trained on opposite prediction problems, with loss functions $L_{1}$ and $L_{2}$, respectively. The overall objective is the sum of the two losses, and the full representation concatenates the two sub-networks:

$$F(X) = \{F_{1}(X_{1}), F_{2}(X_{2})\}, \qquad L = L_{1}(F_{1}) + L_{2}(F_{2})$$

Example split-brain autoencoders in the image and RGB-D domains are shown in the above figure (a) and (b), respectively.
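As a concrete illustration of the channel split in the image domain, an RGB image can be decomposed into its grayscale L channel and its ab color channels in the Lab color space (the split used for the colorization-style cross-channel tasks); the scikit-image conversion below is my own sketch, not the paper's code:

```python
from skimage import color, data

# Split an RGB image into grayscale (L) and color (ab) channel subsets.
rgb = data.astronaut() / 255.0   # example RGB image scaled to [0, 1]
lab = color.rgb2lab(rgb)         # convert to Lab color space
x1 = lab[..., :1]                # X1: L (lightness) channel
x2 = lab[..., 1:]                # X2: ab (color) channels
print(x1.shape, x2.shape)        # (512, 512, 1) (512, 512, 2)
```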

If $F$ is a CNN of a desired fixed size, e.g., AlexNet, we can design the sub-networks $F_{1}$, $F_{2}$ by splitting each layer of the network $F$ in half, along the channel dimension.
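A rough sketch of that channel split, assuming PyTorch and a toy fully convolutional network in place of AlexNet; each sub-network gets half of the full network's feature channels, and the full representation is the concatenation of the two halves:

```python
import torch
import torch.nn as nn

def half_width_branch(in_ch: int, full_width: int) -> nn.Sequential:
    """One sub-network whose layers have half the channels of the full network F."""
    half = full_width // 2
    return nn.Sequential(
        nn.Conv2d(in_ch, half, 3, padding=1), nn.ReLU(),
        nn.Conv2d(half, half, 3, padding=1), nn.ReLU(),
    )

# The full network F would see 1 + 2 = 3 input channels and produce 128 feature channels;
# F1 sees only X1 (1 channel), F2 sees only X2 (2 channels), each producing 64 channels.
f1 = half_width_branch(in_ch=1, full_width=128)
f2 = half_width_branch(in_ch=2, full_width=128)

x = torch.randn(8, 3, 32, 32)
x1, x2 = x[:, :1], x[:, 1:]                     # split the input along the channel dimension
features = torch.cat([f1(x1), f2(x2)], dim=1)   # F(X) = {F1(X1), F2(X2)}: 128 channels
```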

NorbertZheng commented 1 year ago

Alternative Aggregation Technique

One alternative, used as a baseline: the same representation $F$ can be trained to perform both mappings simultaneously, i.e., the two losses are applied to a single shared network.

Another alternative even considers the full input tensor $X$.
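A rough sketch of these baselines, assuming a single shared network that sees the full channel count and has the unobserved subset zero-filled so the same weights can be trained on both directions; the zero-filling and layer sizes are my assumptions for illustration:

```python
import torch
import torch.nn as nn

# One shared network F that takes the full 3-channel input and predicts all 3 channels.
f = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 3, 1),
)

x = torch.randn(8, 3, 32, 32)
x1_only = x.clone()
x1_only[:, 1:] = 0   # observe X1, hide X2
x2_only = x.clone()
x2_only[:, :1] = 0   # observe X2, hide X1

# Both mappings share the same representation F; the two losses are summed.
# Training on the full input tensor X instead reduces to a plain autoencoder objective.
loss = (nn.functional.mse_loss(f(x1_only)[:, 1:], x[:, 1:]) +
        nn.functional.mse_loss(f(x2_only)[:, :1], x[:, :1]))
loss.backward()
```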

NorbertZheng commented 1 year ago

Decomposing training into separate sub-tasks is better!!! It reduces the difficulty of the training task!!!

NorbertZheng commented 1 year ago

Experimental Results

ImageNet

Figure: Task generalization on ImageNet classification.

Model: AlexNet.

Dataset: ImageNet (used without labels for pretraining).

In brief, different autoencoder variants are compared.

Split-Brain Auto (cl, cl), where cl denotes the classification loss, outperforms all autoencoder variants and all other self-supervised learning approaches, such as Jigsaw Puzzles [30], Context Prediction [7], ALI [8]/BiGAN, Context Encoders [34], and Colorization [47].

NorbertZheng commented 1 year ago

Places

Figure: Dataset & task generalization on Places classification.

Places is a different dataset (and task) from the one used for pretraining (ImageNet).

Similar results are obtained for Places classification: Split-Brain Auto outperforms approaches such as Jigsaw Puzzles [30], Context Prediction [7], L³-Net [45], Context Encoders [34], and Colorization [47].

NorbertZheng commented 1 year ago

PASCAL VOC

Figure: Task and dataset generalization on PASCAL VOC.

To further test generalization, classification, detection, and segmentation performance is evaluated on PASCAL VOC.

The proposed method, Split-Brain Auto (cl, cl), achieves state-of-the-art performance on almost all established self-supervision benchmarks.

NorbertZheng commented 1 year ago

There are more results in the paper; if interested, please feel free to read it. I hope to write a story about Jigsaw Puzzles in the near future.

NorbertZheng commented 1 year ago

Reference

Sik-Ho Tang. Review — Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction.