NorbertZheng / read-papers

My paper reading notes.

Sik-Ho Tang | Review: ResNet -- Winner of ILSVRC 2015 (Image Classification, Localization, Detection). #101

Closed NorbertZheng closed 1 year ago

NorbertZheng commented 1 year ago

Sik-Ho Tang. Review: ResNet — Winner of ILSVRC 2015 (Image Classification, Localization, Detection).

NorbertZheng commented 1 year ago

Overview

In this story, ResNet [1] is reviewed. ResNet can have a very deep network of up to 152 layers by learning the residual representation functions instead of learning the signal representation directly.

ResNet introduces the skip connection (or shortcut connection), which feeds the input from a previous layer to a later layer without any modification. Skip connections enable much deeper networks, and ResNet finally became the winner of ILSVRC 2015 in image classification, detection, and localization, as well as the winner of MS COCO 2015 detection and segmentation. This is a 2016 CVPR paper with more than 19,000 citations.

image ILSVRC 2015 Image Classification Ranking.

NorbertZheng commented 1 year ago

Dataset

ImageNet is a dataset of over 15 million labeled high-resolution images in around 22,000 categories. ILSVRC uses a subset of ImageNet with around 1,000 images in each of 1,000 categories. In all, there are roughly 1.2 million training images, 50,000 validation images, and 100,000 testing images.

NorbertZheng commented 1 year ago

Problems of Plain Network

Conventional deep learning networks such as AlexNet, ZFNet, and VGGNet usually have conv layers followed by fully connected (FC) layers for the classification task, without any skip / shortcut connections. We call them plain networks here.

Vanishing / Exploding Gradients

During backpropagation, the partial derivative of the error function with respect to a weight in an early layer is, by the chain rule, a product of the local derivatives of all the later layers in each iteration of training. This has the effect of multiplying many such factors together:

When the network is deep and these factors are small, the product of $n$ of these small numbers tends to zero (vanishing gradients).

When the network is deep and these factors are large, the product of $n$ of these large numbers becomes too large (exploding gradients).
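
As a quick numeric sketch of this effect (my own illustration, not from the paper), suppose each of $n$ layers contributes a local gradient factor of roughly 0.9 or 1.1:

```python
# Toy illustration of vanishing / exploding gradients:
# the gradient reaching an early layer is roughly a product of n per-layer factors.
n = 50  # assumed depth, for illustration only

print(0.9 ** n)  # ~0.005 -> the gradient effectively vanishes
print(1.1 ** n)  # ~117   -> the gradient explodes
```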

We expect a deeper network to achieve at least the accuracy of a shallower one.

However, the example below shows that a 20-layer plain network obtains lower training error and test error than a 56-layer plain network: a degradation problem occurs due to vanishing gradients.

image Plain Networks for CIFAR-10 Dataset.

NorbertZheng commented 1 year ago

Skip / Shortcut Connection in Residual Network (ResNet)

To solve the problem of vanishing/exploding gradients, a skip / shortcut connection is added, which adds the input $x$ to the output after a few weight layers, as below: image A Building Block of Residual Network.

Hence, the output is $H(x) = F(x) + x$. The weight layers actually learn a kind of residual mapping: $F(x) = H(x) - x$. Even if the gradient through the weight layers becomes small, the identity shortcut still passes the gradient of $x$ through unchanged.
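
A minimal sketch of such a building block (my own PyTorch code, assuming 3×3 conv + batch norm weight layers; not the authors' original implementation):

```python
import torch
import torch.nn as nn

class BasicResidualBlock(nn.Module):
    """Minimal sketch of a ResNet basic block: H(x) = F(x) + x."""
    def __init__(self, channels: int):
        super().__init__()
        # Two 3x3 conv weight layers learn the residual mapping F(x) = H(x) - x.
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # Identity skip connection: add the unmodified input x to F(x).
        return self.relu(residual + x)
```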

NorbertZheng commented 1 year ago

ResNet Architecture

image 34-layer ResNet with Skip / Shortcut Connection (Top), 34-layer Plain Network (Middle), 19-layer VGG-19 (Bottom).

The above figure shows the ResNet architecture.

For ResNet, there are 3 types of skip / shortcut connections when the input dimensions are smaller than the output dimensions:

(A) The shortcut performs identity mapping, with extra zeros padded onto the increased dimensions; no extra parameters are introduced.

(B) A projection shortcut is used only when dimensions increase; the other shortcuts are identity.

(C) All shortcuts are projections.
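
A minimal sketch of the projection shortcut used in options (B)/(C) (my own PyTorch code; a strided 1×1 conv is one common way to match dimensions, not necessarily the paper's exact setup):

```python
import torch.nn as nn

def projection_shortcut(in_channels: int, out_channels: int, stride: int = 2) -> nn.Module:
    """Projection shortcut: a strided 1x1 conv (plus batch norm) matches the
    spatial size and channel count when the block changes dimensions."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
        nn.BatchNorm2d(out_channels),
    )
```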

NorbertZheng commented 1 year ago

Bottleneck Design

Since the network is very deep now, the time complexity is high. A bottleneck design is used to reduce the complexity as follows: image The Basic Block (Left) and The Proposed Bottleneck Design (Right).

It turns out that 1×1 conv can reduce the number of connections (parameters) while not degrading the performance of the network so much. (Please visit my review if interested.)

With the bottleneck design, the 34-layer ResNet becomes a 50-layer ResNet. And there are deeper networks with the bottleneck design: ResNet-101 and ResNet-152. The overall architecture for all networks is as below: image The overall architecture for all networks.

It is noted that VGG-16/19 has 15.3/19.6 billion FLOPs, so ResNet-152 (11.3 billion FLOPs) still has lower complexity than VGG-16/19!
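
A minimal sketch of the bottleneck block (my own PyTorch code; batch norm omitted for brevity, and the channel sizes are the 256/64 example from the figure):

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """Sketch of the bottleneck design: 1x1 reduce -> 3x3 -> 1x1 expand, plus the shortcut.

    Weight count with 256 -> 64 -> 256 channels: 256*64 + 9*64*64 + 64*256 ≈ 70k,
    versus two plain 3x3 convs at 256 channels: 2 * 9 * 256 * 256 ≈ 1.18M.
    """
    def __init__(self, channels: int = 256, bottleneck: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(channels, bottleneck, kernel_size=1, bias=False)
        self.conv3x3 = nn.Conv2d(bottleneck, bottleneck, kernel_size=3, padding=1, bias=False)
        self.expand = nn.Conv2d(bottleneck, channels, kernel_size=1, bias=False)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.relu(self.reduce(x))
        out = self.relu(self.conv3x3(out))
        out = self.expand(out)
        return self.relu(out + x)  # identity shortcut
```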

NorbertZheng commented 1 year ago

Ablation Study

Plain Network vs. ResNet

image Validation Error: 18-Layer and 34-Layer Plain Network (Left), 18-Layer and 34-Layer ResNet (right).

image Top-1 Error Using 10-Crop Testing.

When plain networks are used, the 18-layer network is better than the 34-layer one, due to the vanishing gradient problem.

When ResNet is used, the 34-layer network is better than the 18-layer one; the vanishing gradient problem has been solved by skip connections.

If we compare the 18-layer plain network and the 18-layer ResNet, there is not much difference. This is because the vanishing gradient problem does not appear for such a shallow network.

NorbertZheng commented 1 year ago

Other Settings

These are some techniques used in previous deep learning works. If interested, please also feel free to read my reviews.

NorbertZheng commented 1 year ago

Comparison with State-of-the-art Approaches (Image Classification)

ILSVRC

image 10-Crop Testing Results.

Comparing ResNet-34 A, B, and C: B is slightly better than A, and C is marginally better than B, because extra parameters are introduced by the projection shortcuts; all obtain around a 7% error rate.

By increasing the network depth to 152 layers, a 5.71% top-5 error rate is obtained, which is much better than VGG-16 #90, GoogLeNet (Inception-v1) #95, and PReLU-Net #92.

image 10-Crop Testing + Fully Conv with Multiple Scale Results.

With 10-crop testing + fully convolutional testing at multiple scales, ResNet-152 can obtain a 4.49% error rate.

image 10-Crop Testing + Fully Conv with Multiple Scale + 6-Model Ensemble Results.

Adding the 6-model ensemble technique, the error rate drops to 3.57%.
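
A rough sketch of how such multi-crop and ensemble predictions can be averaged (my own code; `models` and `crops` are hypothetical placeholders, not the authors' evaluation pipeline):

```python
import torch

def ensemble_multi_crop_predict(models, crops):
    """Average softmax outputs over all crops and all models in the ensemble."""
    probs = []
    with torch.no_grad():
        for model in models:
            model.eval()
            for crop in crops:  # e.g. 10 crops: 4 corners + center, each with a horizontal flip
                probs.append(torch.softmax(model(crop.unsqueeze(0)), dim=1))
    # Averaging over crops and models gives the final class probabilities.
    return torch.stack(probs).mean(dim=0)
```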

NorbertZheng commented 1 year ago

CIFAR-10

image CIFAR-10 Results.

With skip connections, we can go deeper. However, when the number of layers goes from 110 to 1202, the error rate increases from 6.43% to 7.93%; this is left as an open question in the paper (possibly due to overfitting on this small dataset).

NorbertZheng commented 1 year ago

Comparison with State-of-the-art Approaches (Object Detection)

image PASCAL VOC 2007/2012 mAP (%).

image MS COCO mAP (%).

And ResNet finally won 1st place in ImageNet Detection, ImageNet Localization, COCO Detection, and COCO Segmentation!

NorbertZheng commented 1 year ago

References

[1] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun. Deep Residual Learning for Image Recognition. CVPR 2016.