
Sik-Ho Tang | Reading: Cutout -- Improved Regularization of Convolutional Neural Networks (Image Classification). #116

Closed NorbertZheng closed 1 year ago

NorbertZheng commented 1 year ago

Sik-Ho Tang. Reading: Cutout — Improved Regularization of Convolutional Neural Networks (Image Classification).

NorbertZheng commented 1 year ago

Overview

Figure: Cutout applied to images from the CIFAR-10 dataset.

In this story, Improved Regularization of Convolutional Neural Networks with Cutout (Cutout), by the University of Guelph, the Canadian Institute for Advanced Research, and the Vector Institute, is briefly presented.

This is a 2017 arXiv paper with over 500 citations.

NorbertZheng commented 1 year ago

Data Augmentation!!!

NorbertZheng commented 1 year ago

Motivation

By generating new images that simulate occluded examples, we not only better prepare the model for encounters with occlusions in the real world, but the model also learns to take more of the image context into consideration when making decisions.

This method, cutout, can be interpreted as applying a spatial prior to dropout in input space, much in the same way that convolutional neural networks leverage information about spatial structure in order to improve performance over that of feed-forward networks.
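
As a concrete illustration, here is a minimal NumPy sketch of the cutout operation. The function name, signature, and RNG handling are illustrative, not the paper's released code; the zero fill and the border clipping follow the paper's description.

```python
import numpy as np

def cutout(image: np.ndarray, length: int, rng: np.random.Generator) -> np.ndarray:
    """Zero out a single square patch of side `length` at a random location.

    The patch center is sampled uniformly over the image, so patches near
    the border get clipped, leaving a partial (or occasionally empty) hole.
    """
    h, w = image.shape[:2]
    cy = int(rng.integers(h))  # random patch center
    cx = int(rng.integers(w))
    y1, y2 = max(cy - length // 2, 0), min(cy + length // 2, h)
    x1, x2 = max(cx - length // 2, 0), min(cx + length // 2, w)
    out = image.copy()
    out[y1:y2, x1:x2] = 0  # zero fill inside the patch
    return out
```

This would be applied once per image during training only; at test time images are left untouched.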

NorbertZheng commented 1 year ago

Differences from Dropout

Considering applying noise in a similar fashion to dropout, there are two important distinctions (contrasted in the sketch after this list):

- Units are dropped only at the input layer of the network, rather than in the intermediate feature layers.
- Contiguous regions of the input are dropped, rather than individual pixels selected independently at random.
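
To make the contrast concrete, here is a toy sketch of the two noise patterns on an 8×8 single-channel input (the drop rate, hole size, and hole location are illustrative values only):

```python
import numpy as np

rng = np.random.default_rng(0)
h = w = 8

# Dropout-style noise: every unit is kept or dropped independently, and in
# a CNN this is typically applied to intermediate feature maps.
dropout_mask = rng.random((h, w)) > 0.5  # ~50% of pixels dropped, scattered

# Cutout-style noise: one contiguous square is dropped, and only at the input.
cutout_mask = np.ones((h, w), dtype=bool)
cutout_mask[2:6, 2:6] = False  # a single 4x4 hole at a (here fixed) location
```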

NorbertZheng commented 1 year ago

A spatial prior to dropout!!!

NorbertZheng commented 1 year ago

Experimental Results

CIFAR-10, CIFAR-100, SVHN

Figure: Cutout patch length with respect to validation accuracy, with 95% confidence intervals (average of five runs).

The above figures depict the grid searches conducted on CIFAR-10 and CIFAR-100 respectively.

Based on these validation results we select a cutout size of 16×16 pixels to use on CIFAR-10 and a cutout size of 8×8 pixels for CIFAR-100 when training on the full datasets.

Similarly, for SVHN, to find the optimal size of the cutout region, a grid search is conducted using 10% of the training set for validation, and a cutout size of 20×20 pixels is ultimately selected.
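
A hypothetical sketch of this selection procedure is below; `train_and_evaluate` is a stand-in for the full training pipeline and is not a function from the paper's code.

```python
def select_cutout_length(train_set, candidate_lengths, train_and_evaluate, n_runs=5):
    """Pick the cutout side length with the best mean validation accuracy.

    Mirrors the procedure described above: hold out 10% of the training
    set for validation and average each setting over several runs.
    """
    n_val = len(train_set) // 10
    val_set, train_subset = train_set[:n_val], train_set[n_val:]
    mean_acc = {}
    for length in candidate_lengths:
        accs = [train_and_evaluate(train_subset, val_set, cutout_length=length)
                for _ in range(n_runs)]
        mean_acc[length] = sum(accs) / len(accs)
    return max(mean_acc, key=mean_acc.get)
```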

Table: Test error rates (%) on CIFAR (C10, C100) and SVHN datasets.

Cutout improves ResNet, WRN, and Shake-Shake.

Adding cutout to the current state-of-the-art Shake-Shake regularization models improves performance by 0.3 and 0.6 percentage points on CIFAR-10 and CIFAR-100 respectively, yielding new state-of-the-art results of 2.56% and 15.20% test error.

Adding cutout to WRN-16–8 on SVHN yields an average reduction in test error of 0.3 percentage points, resulting in a new state-of-the-art performance of 1.30% test error.

NorbertZheng commented 1 year ago

STL-10

Table: Test error rates on STL-10 dataset. “+” indicates standard data augmentation (mirror + crop). Results averaged over five runs on full training set.

To evaluate how well cutout performs in the low-data regime, the unlabeled portion of the dataset is discarded and only the labeled training set is used.

A grid search is performed over the cutout size parameter, using 10% of the training images as a validation set, and a square size of 24×24 pixels is selected for the no data augmentation case and 32×32 pixels for training STL-10 with data augmentation.

Training the model using these values yields a reduction in test error of 2.7 percentage points in the no data augmentation case, and 1.5 percentage points when also using data augmentation.

NorbertZheng commented 1 year ago

Analysis of Cutout’s Effect on Activations

Figure: Magnitude of feature activations, sorted by descending value, and averaged over all test samples.

The shallow layers of the network experience a general increase in activation strength, while in deeper layers, we see more activations in the tail end of the distribution.

The latter observation illustrates that cutout is indeed encouraging the network to take into account a wider variety of features when making predictions, rather than relying on the presence of a smaller number of features.
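
The plot described above can be reproduced with a simple computation. The sketch below assumes the layer activations have already been collected into an (n_samples, n_units) array; the model hooks that collect them are omitted.

```python
import numpy as np

def activation_profile(activations: np.ndarray) -> np.ndarray:
    """Mean activation magnitude per rank, as in the figure above.

    For each test sample, sort its activation magnitudes in descending
    order, then average rank-wise across all samples.
    """
    sorted_per_sample = np.sort(np.abs(activations), axis=1)[:, ::-1]
    return sorted_per_sample.mean(axis=0)

# Comparing the profiles of models trained with and without cutout would
# show heavier tails (more active units) in deeper layers for cutout.
```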

NorbertZheng commented 1 year ago

Reference

DeVries, T., & Taylor, G. W. (2017). Improved Regularization of Convolutional Neural Networks with Cutout. arXiv:1708.04552.