NorbertZheng / read-papers

My paper reading notes.

Sik-Ho Tang | Review -- Striving for Simplicity: The All Convolutional Net. #97


NorbertZheng commented 1 year ago

Sik-Ho Tang. Review — Striving for Simplicity: The All Convolutional Net.

NorbertZheng commented 1 year ago

Overview

Striving for Simplicity: The All Convolutional Net (All-CNN), by University of Freiburg. 2015 ICLR Workshop, over 3800 citations.

Convolutional Neural Network, CNN, Image Classification.

NorbertZheng commented 1 year ago

Proposed All-CNN

Two Means to Replace Pooling

Image: Convolution With Stride Larger Than 1 to Reduce Spatial Size More Aggressively.

There are two means suggested to replace pooling for spatial dimensionality reduction (i.e. along the width and height axes, instead of the filter axis):

1. Remove each pooling layer and increase the stride of the convolutional layer that precedes it.
2. Replace each pooling layer by a normal convolution with stride larger than 1 (with the number of output channels equal to the number of input channels).

The second option results in an increase of overall network parameters, yet without loss in accuracy. Both options are sketched below.
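A minimal PyTorch sketch of both options (the channel counts, kernel sizes, and padding here are illustrative assumptions, not the paper's exact settings):

```python
import torch
import torch.nn as nn

# A conventional block: stride-1 conv followed by max-pooling.
baseline = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)

# Option 1: remove the pool and fold its stride into the existing conv.
option1 = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)

# Option 2: keep the stride-1 conv and swap the pool for a stride-2 conv
# with matching channels, which adds 96*96*3*3 = 82,944 extra weights.
option2 = nn.Sequential(
    nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1), nn.ReLU(),
    nn.Conv2d(96, 96, kernel_size=3, stride=2, padding=1), nn.ReLU(),
)

x = torch.randn(1, 96, 32, 32)
for m in (baseline, option1, option2):
    print(m(x).shape)  # all produce torch.Size([1, 96, 16, 16])
```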

NorbertZheng commented 1 year ago

Global Average Pooling to Replace Fully Connected Layer

Image: Global Average Pooling in NIN to Replace Fully Connected Layer (Image from NIN).

Using global average pooling to replace the fully connected layer removes a large number of parameters.
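As a rough illustration of the savings, a hedged PyTorch comparison (the 192-channel 8×8 feature map and 10 classes are assumed sizes):

```python
import torch
import torch.nn as nn

feat = torch.randn(1, 192, 8, 8)  # final conv feature map (assumed size)

# Fully connected head: flatten then project; 192*8*8*10 = 122,880 weights.
fc_head = nn.Sequential(nn.Flatten(), nn.Linear(192 * 8 * 8, 10))

# All-conv head: a 1x1 conv to 10 class maps (192*10 = 1,920 weights),
# then parameter-free global average pooling over the spatial axes.
gap_head = nn.Sequential(
    nn.Conv2d(192, 10, kernel_size=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

print(fc_head(feat).shape, gap_head(feat).shape)  # both: torch.Size([1, 10])
```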

NorbertZheng commented 1 year ago

Three Base Models

Image: The three base networks used for classification on CIFAR-10 and CIFAR-100.

Overall, three base models (A, B, and C) are suggested, each consisting only of convolutional layers with rectified linear non-linearities, plus an averaging + softmax layer to produce predictions over the whole image.
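For concreteness, a minimal PyTorch sketch of base model C (the layer widths follow the paper; the padding and pooling details are my assumptions):

```python
import torch
import torch.nn as nn

def conv(cin, cout, k, s=1):
    # conv + ReLU; padding keeps the spatial size at stride 1
    return nn.Sequential(nn.Conv2d(cin, cout, k, stride=s, padding=k // 2), nn.ReLU())

# Base model C (CIFAR-10): only convs + ReLU, two 3x3 max-pools, and an
# averaging head; softmax is applied by the loss during training.
model_c = nn.Sequential(
    conv(3, 96, 3), conv(96, 96, 3),
    nn.MaxPool2d(3, stride=2, padding=1),
    conv(96, 192, 3), conv(192, 192, 3),
    nn.MaxPool2d(3, stride=2, padding=1),
    conv(192, 192, 3), conv(192, 192, 1), conv(192, 10, 1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

print(model_c(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```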

NorbertZheng commented 1 year ago

Three Derived Models from Base Models

Image: Model description of the three networks derived from base model C, used for evaluating the importance of pooling for classification on CIFAR-10 and CIFAR-100.

Three further models are derived from each base model: Strided-CNN-C removes pooling and increases the stride of the preceding conv layer, ConvPool-CNN-C places an additional dense conv layer before each pooling layer, and All-CNN-C replaces each pooling layer with a stride-2 conv layer. The derived models for base models A and B are built analogously but are not shown in the above table.

Compared with base model A, the 5×5 convolutions are replaced by two consecutive 3×3 convolutions in model C.
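Putting the table into code, a hedged sketch of All-CNN-C, where each max-pool of model C becomes a stride-2 convolution of matching width (the conv helper is as in the previous sketch; padding choices are again assumptions):

```python
import torch
import torch.nn as nn

def conv(cin, cout, k, s=1):
    # conv + ReLU; padding keeps the spatial size at stride 1
    return nn.Sequential(nn.Conv2d(cin, cout, k, stride=s, padding=k // 2), nn.ReLU())

all_cnn_c = nn.Sequential(
    conv(3, 96, 3), conv(96, 96, 3),
    conv(96, 96, 3, s=2),    # replaces the first 3x3 max-pool
    conv(96, 192, 3), conv(192, 192, 3),
    conv(192, 192, 3, s=2),  # replaces the second 3x3 max-pool
    conv(192, 192, 3), conv(192, 192, 1), conv(192, 10, 1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

print(all_cnn_c(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10])
```

Here AdaptiveAvgPool2d(1) plays the role of the paper's global averaging layer regardless of the exact spatial resolution reached by the last conv.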

NorbertZheng commented 1 year ago

Detailed Architecture

Image: Architecture of the Large All-CNN network for CIFAR-10.

The above shows the detailed architecture for CIFAR-10.

Image: Architecture of the ImageNet network.

The above shows the detailed architecture for ImageNet.

NorbertZheng commented 1 year ago

Experimental Results

Ablation Study

Image: All-CNN-C has the best performance.

SOTA Comparison

Image: Test error on CIFAR-10 and CIFAR-100 for the All-CNN compared to the state of the art from the literature.

On CIFAR-10, the reported All-CNN is All-CNN-C. It outperforms Maxout, NIN, and others. On CIFAR-100, All-CNN-C obtains competitive performance.

ImageNet

For ImageNet, an upscaled version of the All-CNN-B network with 12 convolutional layers is trained.

This network achieves a Top-1 validation error of 41.2% on ILSVRC-2012 when evaluating only on the center 224×224 patch, which is comparable to the 40.7% Top-1 error reported for AlexNet.

NorbertZheng commented 1 year ago

(There are also sections visualizing the feature map responses using deconvolution, similar to ZFNet; please feel free to read the paper.)

NorbertZheng commented 1 year ago

Reference

Sik-Ho Tang. Review — Striving for Simplicity: The All Convolutional Net.