Cutout applied to images from the CIFAR-10 dataset.
In this story, Improved Regularization of Convolutional Neural Networks with Cutout (Cutout), by University of Guelph, Canadian Institute for Advanced Research, and Vector Institute, is briefly presented.
This is a 2017 arXiv paper with over 500 citations.
Data Augmentation!!!
By generating new images which simulate occluded examples, we not only better prepare the model for encounters with occlusions in the real world, but the model also learns to take more of the image context into consideration when making decisions.
This method, cutout, can be interpreted as applying a spatial prior to dropout in input space, much in the same way that convolutional neural networks leverage information about spatial structure in order to improve performance over that of feed-forward networks.
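Although cutout applies noise in a similar fashion to dropout, there are two important distinctions: units are dropped only at the input layer of the CNN rather than in the intermediate feature layers, and contiguous sections of the input are dropped rather than individual pixels.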
A spatial prior to dropout!!!
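To make the idea concrete, here is a minimal NumPy sketch of masking one contiguous square in the input image. The function name `cutout`, the `(H, W, C)` layout, the zero fill value (which assumes the image is already normalized), and the choice to pick the patch centre uniformly and clip at the border are assumptions made for illustration, not necessarily the authors' exact implementation.

```python
import numpy as np

def cutout(image, size, rng=None):
    """Return a copy of `image` with one size x size square set to zero.

    image: array of shape (H, W, C), assumed already normalized so that
           0 is a neutral fill value (an assumption of this sketch).
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    # Pick the patch centre anywhere in the image, so part of the patch
    # may fall outside the border and get clipped.
    cy = int(rng.integers(0, h))
    cx = int(rng.integers(0, w))
    top, left = cy - size // 2, cx - size // 2
    y1, y2 = max(top, 0), min(top + size, h)
    x1, x2 = max(left, 0), min(left + size, w)
    out = image.copy()
    out[y1:y2, x1:x2] = 0.0  # contiguous region, all channels
    return out

# Usage on a random stand-in for a normalized 32x32 RGB image.
rng = np.random.default_rng(0)
image = rng.normal(size=(32, 32, 3)).astype(np.float32)
augmented = cutout(image, size=16, rng=rng)
```

Unlike dropout, the mask here is applied only to the input and removes a spatially contiguous block, which is what the "spatial prior" refers to.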
Cutout patch length with respect to validation accuracy with 95% confidence intervals (average of five runs).
The above figures depict the grid searches conducted on CIFAR-10 and CIFAR-100 respectively.
Based on these validation results we select a cutout size of 16×16 pixels to use on CIFAR-10 and a cutout size of 8×8 pixels for CIFAR-100 when training on the full datasets.
Similarly, for SVHN, to find the optimal size for the cutout region we conduct a grid search using 10% of the training set for validation and ultimately select a cutout size of 20×20 pixels.
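The selection procedure described above (hold out 10% of the training set, try several patch sizes, average a few runs) could be sketched as follows. Everything named here is hypothetical: `split_train_val`, `mean_and_ci95`, `grid_search`, the candidate sizes, and in particular `toy_evaluate`, which returns synthetic placeholder numbers rather than real accuracies. The Student-t interval is also an assumption, since the notes do not say how the 95% intervals were computed.

```python
import numpy as np

def split_train_val(n_samples, val_fraction=0.1, seed=0):
    """Shuffle indices and hold out a fraction for validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_val = int(round(n_samples * val_fraction))
    return perm[n_val:], perm[:n_val]

def mean_and_ci95(values):
    """Mean and half-width of a 95% Student-t interval (2 to 5 runs)."""
    values = np.asarray(values, dtype=float)
    n = values.size
    # Two-sided 95% t critical values for n - 1 degrees of freedom.
    t_crit = {2: 12.706, 3: 4.303, 4: 3.182, 5: 2.776}[n]
    sem = values.std(ddof=1) / np.sqrt(n)
    return values.mean(), t_crit * sem

def grid_search(sizes, evaluate, n_runs=5):
    """Evaluate each size over n_runs seeds; return the best mean."""
    results = {}
    for size in sizes:
        accs = [evaluate(size, seed) for seed in range(n_runs)]
        results[size] = mean_and_ci95(accs)
    best = max(results, key=lambda s: results[s][0])
    return best, results

# Placeholder for "train with this cutout size, report validation accuracy".
# The numbers are synthetic and only exercise the code path.
def toy_evaluate(size, seed):
    return 95.0 - 0.01 * (size - 16) ** 2 + 0.01 * seed

train_idx, val_idx = split_train_val(50000, val_fraction=0.1)
best, results = grid_search([4, 8, 12, 16, 20, 24], toy_evaluate)
```

In a real run, `toy_evaluate` would be replaced by a function that trains on `train_idx` with the given cutout size and returns accuracy on `val_idx`.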
Test error rates (%) on CIFAR (C10, C100) and SVHN datasets
Cutout improves ResNet, WRN, and Shake-Shake.
Adding cutout to the current state-of-the-art Shake-Shake regularization models improves performance by 0.3 and 0.6 percentage points on CIFAR-10 and CIFAR-100 respectively, yielding new state-of-the-art results of 2.56% and 15.20% test error.
On SVHN, adding cutout to WRN-16-8 reduces test error by an average of 0.3 percentage points, resulting in a new state-of-the-art performance of 1.30% test error.
Test error rates on STL-10 dataset. “+” indicates standard data augmentation (mirror + crop). Results averaged over five runs on full training set.
For this reason, the unlabeled portion of the dataset is discarded and only the labeled training set is used.
A grid search over the cutout size, using 10% of the training images as a validation set, selects a square size of 24×24 pixels for the no data augmentation case and 32×32 pixels for training STL-10 with data augmentation.
Training the model using these values yields a reduction in test error of 2.7 percentage points in the no data augmentation case, and 1.5 percentage points when also using data augmentation.
Magnitude of feature activations, sorted by descending value, and averaged over all test samples.
The shallow layers of the network experience a general increase in activation strength, while in deeper layers, we see more activations in the tail end of the distribution.
The latter observation illustrates that cutout is indeed encouraging the network to take into account a wider variety of features when making predictions, rather than relying on the presence of a smaller number of features.
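The curve described in the caption above can be computed along these lines. This is a sketch under assumptions: `sorted_activation_profile` is a hypothetical helper, "magnitude" is taken to mean absolute value, and the activations of one layer are assumed to be available as an array with the test samples along the first axis.

```python
import numpy as np

def sorted_activation_profile(activations):
    """Average sorted activation magnitudes over samples.

    activations: array of shape (N, ...) holding one layer's outputs
                 for N test samples.
    Returns a 1-D array: entry k is the mean of the k-th largest
    magnitude across samples.
    """
    flat = np.abs(activations.reshape(activations.shape[0], -1))
    # Sort each sample's magnitudes in descending order.
    sorted_desc = -np.sort(-flat, axis=1)
    # Average rank-by-rank over all samples.
    return sorted_desc.mean(axis=0)

# Tiny worked example with two samples and three features each.
acts = np.array([[1.0, -3.0, 2.0],
                 [0.0, 5.0, -4.0]])
profile = sorted_activation_profile(acts)  # array([4. , 3. , 0.5])
```

Comparing this profile for a baseline model and a cutout-trained model, layer by layer, is what would reveal a heavier tail in the deeper layers.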
Sik-Ho Tang. Reading: Cutout — Improved Regularization of Convolutional Neural Networks (Image Classification).