NorbertZheng / read-papers

My paper reading notes.
MIT License

Sik-Ho Tang | Review: Shake-Shake Regularization (Image Classification). #115

Closed NorbertZheng closed 1 year ago

NorbertZheng commented 1 year ago

Sik-Ho Tang. Review: Shake-Shake Regularization (Image Classification).

NorbertZheng commented 1 year ago

Overview

In this story, Shake-Shake Regularization (Shake-Shake), by Xavier Gastaldi from London Business School, is briefly reviewed. The motivation of this paper is that data augmentation is usually applied only at the input image; Shake-Shake extends the idea to internal representations by replacing the standard summation of two parallel residual branches with a stochastic affine combination.

It was also found in prior art that adding noise to the gradient during training helps the training and generalization of complicated neural networks, and Shake-Shake injects such noise by using a different random combination in the backward pass.

This is a paper in the 2017 ICLR Workshop with over 10 citations, and the long version on 2017 arXiv has got over 100 citations.

NorbertZheng commented 1 year ago

Shake-Shake Regularization

image Left: Forward training pass. Center: Backward training pass. Right: At test time.

In a standard 2-branch residual block, $x_{i+1} = x_i + F(x_i, W_i^{(1)}) + F(x_i, W_i^{(2)})$. With Shake-Shake Regularization, a random variable $\alpha_i$, uniform in $[0, 1]$, is added to scale the two branches:

$x_{i+1} = x_i + \alpha_i F(x_i, W_i^{(1)}) + (1 - \alpha_i) F(x_i, W_i^{(2)})$

Before each backward pass, $\alpha_i$ is replaced by a new independent random variable $\beta_i$, so the forward and backward passes use different scaling coefficients.

$\alpha$ is set to $0.5$ during test time, just like Dropout.
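Below is a minimal sketch of how such a block could be implemented, assuming PyTorch; `ShakeShake` and the branch names `f1`/`f2` are illustrative, not the author's code. The forward pass mixes the two branches with a random $\alpha$, the backward pass rescales the gradients with an independent $\beta$, and evaluation uses the expected value $0.5$.

```python
import torch


class ShakeShake(torch.autograd.Function):
    """Random affine combination of two residual branches (sketch)."""

    @staticmethod
    def forward(ctx, branch1, branch2, training):
        ctx.training = training
        if training:
            # One coefficient per image ("Image" level), broadcast over C, H, W.
            alpha = torch.rand(branch1.size(0), 1, 1, 1, device=branch1.device)
        else:
            # Test time: use the expected value 0.5.
            alpha = branch1.new_full((branch1.size(0), 1, 1, 1), 0.5)
        return alpha * branch1 + (1.0 - alpha) * branch2

    @staticmethod
    def backward(ctx, grad_output):
        if ctx.training:
            # An independent coefficient for the backward pass.
            beta = torch.rand(grad_output.size(0), 1, 1, 1, device=grad_output.device)
        else:
            beta = grad_output.new_full((grad_output.size(0), 1, 1, 1), 0.5)
        # Gradients w.r.t. branch1, branch2, and the (non-tensor) training flag.
        return beta * grad_output, (1.0 - beta) * grad_output, None


# Usage inside a residual block: x_next = x + ShakeShake.apply(f1(x), f2(x), self.training)
```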

NorbertZheng commented 1 year ago

Experimental Results

CIFAR-10

A 26 2×32d ResNet is used, i.e. the network has a depth of 26, 2 residual branches, and the first residual block has a width of 32 channels.
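As a rough illustration of this naming, here is a tiny sketch; the stage layout below is my assumption following the usual 3-stage CIFAR ResNet recipe, not taken verbatim from the paper:

```python
def shake_shake_widths(base_width=32, n_stages=3):
    """Channel width of each stage; the width doubles at every downsampling stage."""
    return [base_width * (2 ** i) for i in range(n_stages)]


print(shake_shake_widths(32))  # [32, 64, 128] for the 2x32d model
# Depth 26 = 1 input conv + 3 stages * 4 blocks * 2 convs per branch + 1 fc layer.
```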

image Error Rates of CIFAR-10.

Shake-Shake-Image (S-S-I) obtains the best result for the 26 2×64d and 26 2×96d ResNets.
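For reference, the three letters denote the forward behavior, the backward behavior (Shake = new random coefficient, Even = $0.5$, Keep = reuse the forward coefficient), and the level at which coefficients are drawn (Batch or Image). A minimal sketch of the two levels, assuming PyTorch; `sample_alpha` is an illustrative helper, not from the paper:

```python
import torch


def sample_alpha(x, level="Image"):
    """Draw the mixing coefficient at the chosen granularity.

    "Batch": one scalar shared by all images in the mini-batch.
    "Image": one coefficient per image, broadcast over channels and spatial dims.
    """
    if level == "Batch":
        return torch.rand(1, 1, 1, 1, device=x.device)
    return torch.rand(x.size(0), 1, 1, 1, device=x.device)
```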

NorbertZheng commented 1 year ago

CIFAR-100

image Error Rates of CIFAR-100.

Using Shake in the forward pass again improves the performance.

In particular, Shake-Even-Image (S-E-I), i.e. shaking in the forward pass while using the even coefficient $0.5$ in the backward pass, performs best.

NorbertZheng commented 1 year ago

Comparison with State-of-the-art Approaches

image Test error (%) and Model Size on CIFAR.

On CIFAR-10, S-S-I outperforms WRN, ResNeXt and DenseNet.

On CIFAR-100, S-E-I outperforms WRN, ResNeXt and DenseNet as well.

NorbertZheng commented 1 year ago

Further Evaluation

Correlation Between Residual Branches

image Correlation results on E-E-B and S-S-I models.

image Layer-wise correlation between the first 3 layers of each residual block.

The summation at the end of the residual blocks forces an alignment of the layers on the left and right residual branches.

The correlation between the two residual branches is reduced by the regularization, which supports the view that the regularization forces the branches to learn different representations.
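One simple way to quantify such a correlation is sketched below, assuming PyTorch; `branch_correlation` is an illustrative helper and not necessarily the paper's exact procedure:

```python
import torch


def branch_correlation(out1, out2):
    """Pearson correlation between the flattened activations of two residual branches."""
    a = out1.flatten().float()
    b = out2.flatten().float()
    a = a - a.mean()
    b = b - b.mean()
    return (a * b).sum() / (a.norm() * b.norm() + 1e-8)
```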

NorbertZheng commented 1 year ago

Regularization Strength

image Update Rules for $\beta$.

image Left: Training curves (dark) and test curves (light) of models M1 to M5. Right: Illustration of the different methods in the above Table.

The further away $\beta$ is from $\alpha$, the stronger the regularization effect.
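One way to see why this distance matters (a back-of-the-envelope reading, not a quote from the paper): in the forward pass each branch output is scaled by $\alpha_i$ (or $1 - \alpha_i$), while in the backward pass its gradient is scaled by $\beta_i$ (or $1 - \beta_i$), e.g. $\frac{\partial L}{\partial F^{(1)}} = \beta_i \frac{\partial L}{\partial x_{i+1}}$. The mismatch between what a branch contributed and the gradient it receives therefore grows with $|\beta_i - \alpha_i|$, which acts as gradient noise and strengthens the regularization.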

NorbertZheng commented 1 year ago

Removing Skip Connection / Batch Normalization

image Error Rates of CIFAR-10.

NorbertZheng commented 1 year ago

With this simple yet novel idea and, of course, the positive results, the paper was published in the 2017 ICLR Workshop, which is very encouraging.

NorbertZheng commented 1 year ago

References