NorbertZheng / read-papers

My paper reading notes.

Sik-Ho Tang | Review -- Exemplar-CNN: Discriminative Unsupervised Feature Learning with Convolutional Neural Networks (Self-Supervised Learning). #119

Closed. NorbertZheng closed this issue 1 year ago.

NorbertZheng commented 1 year ago

Sik-Ho Tang. Review — Exemplar-CNN: Discriminative Unsupervised Feature Learning with Convolutional Neural Networks (Self-Supervised Learning).

NorbertZheng commented 1 year ago

Overview

Exemplar-CNN: Trained on Unlabeled Data Using Surrogate Classes Created by Data Transformation.

Figure: Surrogate classes are generated by data transformation using unlabeled data.

In this story, Discriminative Unsupervised Feature Learning with Convolutional Neural Networks (Exemplar-CNN), by the University of Freiburg, is reviewed. This is a paper in 2014 NIPS with over 600 citations.

NorbertZheng commented 1 year ago

Creating Surrogate Training Data & Learning Algorithm

Random transformations are applied to patches. All transformed patches derived from the same original “seed” image are assigned the same surrogate class as that “seed” image.

If there are 8000 “seed” images, then there are 8000 surrogate classes.
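As a minimal sketch of this labeling rule (the numbers and variable names here are illustrative):

```python
# Surrogate classes: every transformed version of seed patch i gets label i.
n_seeds, k = 8000, 100          # 8000 "seed" images, K transformed copies each
labels = [i for i in range(n_seeds) for _ in range(k)]

assert len(set(labels)) == n_seeds   # 8000 seeds -> 8000 surrogate classes
```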

NorbertZheng commented 1 year ago

Data Augmentation in SimCLR???

NorbertZheng commented 1 year ago

Creating Surrogate Training Data

$N\in [50,32000]$ patches of size $32\times 32$ pixels are randomly sampled from different images at varying positions and scales, forming the initial training set $X=\{x_{1}, \dots, x_{N}\}$.

We are interested in patches containing objects or parts of objects, hence we sample only from regions containing considerable gradients (a kind of prior!!!).
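A hedged NumPy sketch of gradient-biased sampling (my own simplification: it weights candidate patch locations by gradient magnitude and omits the scale variation):

```python
import numpy as np

def sample_patch(image, patch_size=32, rng=None):
    """Sample one patch, biased toward regions with considerable gradients."""
    rng = rng or np.random.default_rng()
    gray = image.mean(axis=2)                 # (H, W, 3) -> (H, W)
    gy, gx = np.gradient(gray)
    mag = np.hypot(gx, gy)                    # gradient magnitude
    h, w = gray.shape
    ys, xs = np.mgrid[0:h - patch_size, 0:w - patch_size]
    # Weight each candidate top-left corner by the gradient at the patch centre.
    weights = mag[ys + patch_size // 2, xs + patch_size // 2].ravel()
    idx = rng.choice(weights.size, p=weights / weights.sum())
    y, x = np.unravel_index(idx, ys.shape)
    return image[y:y + patch_size, x:x + patch_size]
```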

NorbertZheng commented 1 year ago

A family of transformations $\{T_{\alpha} \mid \alpha\in A\}$ is defined, parameterized by vectors $\alpha\in A$, where $A$ is the set of all possible parameter vectors. Each transformation $T_{\alpha}$ is a composition of elementary transformations: scaling, rotation, translation, color variation, and contrast variation.
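A sketch of such a composition (parameter ranges are illustrative, not the paper's; scaling is omitted so the patch size stays fixed):

```python
import numpy as np
from scipy import ndimage

def random_transformation(rng=None):
    """Sample a parameter vector alpha and return the transformation T_alpha."""
    rng = rng or np.random.default_rng()
    angle = rng.uniform(-20, 20)              # rotation in degrees (illustrative)
    shift = rng.uniform(-4, 4, size=2)        # translation in pixels (illustrative)
    gain = rng.uniform(0.5, 2.0, size=3)      # per-channel color/contrast gain

    def T(patch):                             # patch: (H, W, 3) floats in [0, 1]
        out = ndimage.rotate(patch, angle, reshape=False, mode="reflect")
        out = ndimage.shift(out, (*shift, 0), mode="reflect")
        return np.clip(out * gain, 0.0, 1.0)

    return T
```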

NorbertZheng commented 1 year ago

For each initial patch $x_{i}\in X$, $K\in [1,300]$ random parameter vectors $\{\alpha_{i}^{1},\dots,\alpha_{i}^{K}\}$ are sampled.

The corresponding transformations $\{T_{\alpha_{i}^{1}},\dots,T_{\alpha_{i}^{K}}\}$ are then applied to the patch $x_{i}$ (i.e., in brief, random transformations are applied to each patch).

This yields the set of its transformed versions $S_{x_{i}}=\mathcal{T}_{i}x_{i}=\{Tx_{i} \mid T\in \mathcal{T}_{i}\}$, where $\mathcal{T}_{i}=\{T_{\alpha_{i}^{1}},\dots,T_{\alpha_{i}^{K}}\}$.

Afterwards, the mean of each pixel over the whole resulting dataset is subtracted; no other preprocessing is applied.
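Putting it together (reusing the hypothetical `random_transformation` helper from the sketch above):

```python
import numpy as np

def surrogate_set(x_i, k=100, rng=None):
    """S_{x_i}: K randomly parameterized transformations applied to seed patch x_i."""
    return np.stack([random_transformation(rng)(x_i) for _ in range(k)])

# Assemble the whole dataset, then subtract the per-pixel mean -- the only preprocessing:
# data = np.concatenate([surrogate_set(x) for x in seeds]).astype(np.float32)
# data -= data.mean(axis=0)
```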

Figure: Exemplary patches sampled from the STL unlabeled dataset, which are later augmented by various transformations to obtain surrogate data for the CNN training.

Figure: Several random transformations applied to one of the patches extracted from the STL unlabeled dataset. The original (“seed”) patch is in the top left corner.

NorbertZheng commented 1 year ago

Just like in SimCLR, the augmentation techniques are pre-defined; such augmentations are assumed to be irrelevant to the true semantic meaning.

This supports multi-view!!!

NorbertZheng commented 1 year ago

Learning Algorithm

With the surrogate classes generated, the CNN can be trained.

Intuitively, the classification problem described above serves to ensure that the learned features are invariant to the applied transformations while remaining discriminative between different “seed” patches.

After training the CNN on the unlabeled dataset, the pooled CNN features are used to train a linear SVM on the target dataset, which will be described in more detail below.
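A hedged outline of this two-stage procedure in PyTorch (structure and names are mine; training details such as the optimizer are illustrative):

```python
import torch
import torch.nn as nn

def train_on_surrogate_classes(model, loader, epochs=10, lr=1e-2):
    """Stage 1: standard softmax classification, but the labels are surrogate classes."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for patches, surrogate_labels in loader:   # no human labels involved
            opt.zero_grad()
            loss_fn(model(patches), surrogate_labels).backward()
            opt.step()
    return model
# Stage 2: freeze the CNN, pool its features, and fit a linear SVM (see below).
```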

NorbertZheng commented 1 year ago

CNN Architectures & Experimental Setup

Unlabeled Dataset for Surrogate Class

Two CNNs

Two networks are used: one small and one large.

All convolutional layers use $5\times 5$ filters. $2\times 2$ max pooling is applied after the first and second convolutional layers. Dropout is applied to the fully connected layer.
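A sketch of the small network in PyTorch; the 64-64-128 layer sizes reflect my reading of the paper's 64c5-64c5-128f layout and should be treated as an assumption:

```python
import torch.nn as nn

# Assumed input: 32x32x3 surrogate patches; 8000 surrogate classes as in the text.
small_net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(2),                        # 2x2 max pooling after conv1
    nn.Conv2d(64, 64, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(2),                        # 2x2 max pooling after conv2
    nn.Flatten(),
    nn.Linear(64 * 5 * 5, 128), nn.ReLU(),  # fully connected layer
    nn.Dropout(0.5),                        # dropout on the fully connected layer
    nn.Linear(128, 8000),                   # one output per surrogate class
)
```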

NorbertZheng commented 1 year ago

A really shallow convolutional network!!! Just like the architecture we designed in naive_cnn.

NorbertZheng commented 1 year ago

Pooled-Features for Linear SVM
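The protocol here is to freeze the trained CNN, pool its convolutional feature maps, and train a linear SVM on those pooled features using the target dataset's labels. A sketch (the 2x2 quadrant max pooling and the helper names are my assumptions):

```python
import torch
import torch.nn.functional as F
from sklearn.svm import LinearSVC

@torch.no_grad()
def pooled_features(trunk, images):
    """Extract conv feature maps from the frozen trunk and pool them spatially."""
    fmap = trunk(images)                       # (B, C, H, W)
    pooled = F.adaptive_max_pool2d(fmap, 2)    # pool each map over 2x2 quadrants
    return pooled.flatten(1).cpu().numpy()     # (B, C * 4) feature vectors

# With labeled target data:
# svm = LinearSVC().fit(pooled_features(trunk, x_train), y_train)
# accuracy = svm.score(pooled_features(trunk, x_test), y_test)
```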

NorbertZheng commented 1 year ago

Experimental Results

SOTA Comparison

Figure: Classification accuracies on several datasets.

The features extracted from the larger network match or outperform the best prior result on all datasets.

This is despite the fact that the network was trained entirely without labels.

NorbertZheng commented 1 year ago

Number of Surrogate Classes

Figure: Influence of the number of surrogate training classes.

The number $N$ of surrogate classes is varied between 50 and 32000.

The classification accuracy increases with the number of surrogate classes until it reaches an optimum at about 8000 surrogate classes, after which it plateaus or even decreases.

NorbertZheng commented 1 year ago

Number of Samples per Surrogate Class

Figure: Classification performance on STL for different numbers of samples per class.

The performance improves with more samples per surrogate class and saturates at around 100 samples.

NorbertZheng commented 1 year ago

Types of Transformations

Figure: Influence of removing groups of transformations during generation of the surrogate training data.

The value “0” corresponds to applying random compositions of all elementary transformations: scaling, rotation, translation, color variation, and contrast variation.

Different columns of the plot show the change in classification accuracy when some types of elementary transformations are discarded.

NorbertZheng commented 1 year ago

Reference

Sik-Ho Tang. Review — Exemplar-CNN: Discriminative Unsupervised Feature Learning with Convolutional Neural Networks (Self-Supervised Learning).