Exemplar-CNN: Trained on Unlabeled Data Using Surrogate Classes Generated by Data Transformation. Surrogate classes are created from unlabeled data by applying data transformations.
In this story, Discriminative Unsupervised Feature Learning with Convolutional Neural Networks (Exemplar-CNN), by the University of Freiburg, is reviewed. This is a 2014 NIPS paper with over 600 citations. In this paper:
Random transformations are applied to patches. All transformed patches from the same original “seed” image share the same surrogate class as that “seed” image.
If there are 8000 “seed” images, then there are 8000 surrogate classes.
Data Augmentation in SimCLR???
$N\in [50, 32000]$ patches of size $32\times 32$ pixels are randomly sampled from different images at varying positions and scales, forming the initial training set $X=\{x_{1}, \dots, x_{N}\}$.
We are interested in patches containing objects or parts of objects, hence we sample only from regions containing considerable gradients (a kind of prior!!!).
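As a rough illustration, here is a minimal NumPy sketch of such gradient-biased patch sampling. The threshold `grad_thresh` and the rejection-sampling scheme are assumptions for illustration; the paper only states that patches are sampled from regions with considerable gradients (scale variation is omitted for brevity).

```python
import numpy as np

def sample_patch(image, patch_size=32, grad_thresh=10.0, max_tries=100):
    """Sample a patch from a region with considerable image gradients.

    Illustrative sketch: reject candidate positions until the mean
    gradient magnitude inside the window exceeds `grad_thresh`.
    """
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    gy, gx = np.gradient(gray)
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    h, w = gray.shape
    for _ in range(max_tries):
        y = np.random.randint(0, h - patch_size + 1)
        x = np.random.randint(0, w - patch_size + 1)
        if grad_mag[y:y + patch_size, x:x + patch_size].mean() > grad_thresh:
            break
    return image[y:y + patch_size, x:x + patch_size]
```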
A family of transformations $\{T_{\alpha} \mid \alpha\in A\}$ is defined, parameterized by vectors $\alpha\in A$, where $A$ is the set of all possible parameter vectors. Each transformation $T_{\alpha}$ is a composition of elementary transformations: scaling, rotation, translation, color variation, and contrast variation.
For each initial patch $x_{i}\in X$, $K\in [1, 300]$ random parameter vectors $\{\alpha_{i}^{1}, \dots, \alpha_{i}^{K}\}$ are sampled,
and the corresponding transformations $\mathcal{T}_{i}=\{T_{\alpha_{i}^{1}}, \dots, T_{\alpha_{i}^{K}}\}$ are applied to the patch $x_{i}$ (in brief, random transformations are applied to each patch).
This yields the set of its transformed versions $S_{x_{i}}=\mathcal{T}_{i}x_{i}=\{Tx_{i} \mid T\in \mathcal{T}_{i}\}$.
Afterwards, the mean of each pixel over the whole resulting dataset is subtracted; no other preprocessing is applied.
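A minimal PyTorch/torchvision sketch of the surrogate-data construction described above. The elementary transformations are approximated with `RandomAffine` (scaling, rotation, translation) and `ColorJitter` (color and contrast variation); all parameter ranges here are assumptions, not the paper's values.

```python
import torch
from torchvision import transforms

# Random composition of elementary transformations (parameter ranges
# are illustrative assumptions, not the paper's exact magnitudes).
random_transform = transforms.Compose([
    transforms.RandomAffine(degrees=20, translate=(0.2, 0.2), scale=(0.7, 1.4)),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])

def make_surrogate_dataset(seed_patches, K=100):
    """Apply K random transformations to each seed patch x_i.

    All K transformed versions of patch i receive surrogate label i,
    so N seed patches yield N surrogate classes.
    """
    samples, labels = [], []
    for i, patch in enumerate(seed_patches):   # patch: 32x32 PIL image
        for _ in range(K):
            samples.append(random_transform(patch))
            labels.append(i)
    data = torch.stack(samples)
    # Subtract the per-pixel mean over the whole resulting dataset.
    data = data - data.mean(dim=0, keepdim=True)
    return data, torch.tensor(labels)
```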
Exemplary patches sampled from the STL unlabeled dataset which are later augmented by various transformations to obtain surrogate data for the CNN training.
Several random transformations applied to one of the patches extracted from the STL unlabeled dataset. The original (’seed’) patch is in the top left corner.
Just like SimCLR, the augmentation techniques are pre-defined; these augmentations are assumed not to change the true semantic content of the patch.
This supports multi-view!!!
With the surrogate classes generated, a CNN can be trained.
Each of these sets is declared to be a class by assigning label $i$ to the set $S_{x_{i}}$. Formally, the following loss function is minimized:

$$L(X)=\sum_{x_{i}\in X}\sum_{T\in \mathcal{T}_{i}}l(i, Tx_{i}),$$

where $l(i, Tx_{i})$ is the loss on the transformed sample $Tx_{i}$ with (surrogate) true label $i$.
Intuitively, the classification problem described above serves to ensure that different input patches can be distinguished from one another, while at the same time enforcing invariance to the applied transformations.
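Since the surrogate labels turn this into an ordinary $N$-way classification problem, training reduces to standard softmax cross-entropy. A minimal sketch, assuming `model` maps a batch of transformed patches to $N$ logits:

```python
import torch
import torch.nn.functional as F

def train_exemplar_cnn(model, loader, epochs=10, lr=0.01):
    """Minimize L(X) = sum_i sum_{T in T_i} l(i, T x_i):
    plain cross-entropy with the seed index i as surrogate label."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for _ in range(epochs):
        for x, y in loader:   # x: transformed patches, y: seed indices i
            loss = F.cross_entropy(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```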
After training the CNN on the unlabeled dataset, the CNN features are pooled and used to train a linear SVM on the target dataset, as described in more detail below.
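A sketch of this evaluation protocol. The global max pooling, the `features` submodule (see the architecture sketch below), and `target_train_loader` are illustrative assumptions; scikit-learn's `LinearSVC` stands in for the linear SVM.

```python
import torch
from sklearn.svm import LinearSVC

@torch.no_grad()
def extract_pooled_features(model, loader):
    """Pool the frozen CNN feature maps into fixed-length vectors.
    Global max pooling is an illustrative choice of pooling scheme."""
    feats, labels = [], []
    for x, y in loader:
        fmap = model.features(x)             # (B, C, H, W) feature maps
        feats.append(fmap.amax(dim=(2, 3)))  # (B, C) pooled vectors
        labels.append(y)
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

# Train a linear SVM on the labeled target dataset using frozen features
# (`model` and `target_train_loader` are assumed to exist).
X_train, y_train = extract_pooled_features(model, target_train_loader)
svm = LinearSVC(C=1.0).fit(X_train, y_train)
```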
Two networks are used: one small and one large.
All convolutions use $5\times 5$ filters; $2\times 2$ max pooling is applied after the first and second convolutions, and dropout is applied to the fully connected layer.
Really shallow convolutional network!!! Just like the architecture we designed in naive_cnn.
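A sketch of the small variant under this description (two $5\times 5$ conv layers, each followed by $2\times 2$ max pooling, then a fully connected layer with dropout). The channel and unit counts (64, 64, 128) follow the paper's small "64c5-64c5-128f" configuration; for $32\times 32$ inputs the flattened feature map is $64\times 5\times 5$.

```python
import torch.nn as nn

class SmallExemplarCNN(nn.Module):
    """Small network: 5x5 convs with 2x2 max pooling after conv1 and
    conv2, then a fully connected layer with dropout."""
    def __init__(self, num_surrogate_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 5 * 5, 128), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(128, num_surrogate_classes),  # N surrogate classes
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```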
Classification accuracies on several datasets
The features extracted from the larger network match or outperform the best prior result on all datasets.
This is despite the fact that the network is trained purely on unlabeled data, without any manual annotation.
Influence of the number of surrogate training classes.
The number $N$ of surrogate classes is varied between 50 and 32000.
The classification accuracy increases with the number of surrogate classes until it reaches an optimum at about 8000 surrogate classes, after which it plateaus or even decreases.
Classification performance on STL for different numbers of samples per class.
The performance improves with more samples per surrogate class and saturates at around 100 samples.
Influence of removing groups of transformations during generation of the surrogate training data.
The value “0” corresponds to applying random compositions of all elementary transformations: scaling, rotation, translation, color variation, and contrast variation.
Different columns of the plot show the change in classification accuracy when certain types of elementary transformations are discarded.
Sik-Ho Tang. Review — Exemplar-CNN: Discriminative Unsupervised Feature Learning with Convolutional Neural Networks (Self-Supervised Learning).