NorbertZheng / read-papers

My paper reading notes.

Sik-Ho Tang | Review: WRNs -- Wide Residual Networks (Image Classification). #105


NorbertZheng commented 1 year ago

Sik-Ho Tang. Review: WRNs — Wide Residual Networks (Image Classification).

NorbertZheng commented 1 year ago

Overview

This time, WRNs (Wide Residual Networks) are presented. By widening the Residual Network (ResNet), the network can be made shallower while matching or even improving accuracy. A shallower network means fewer layers and a shorter training time.

A better way of applying dropout is also investigated. This is a 2016 BMVC paper with more than 700 citations. Although it was published in 2016, the authors kept updating it as late as June 2017.

Figure: Various ResNet Blocks.

NorbertZheng commented 1 year ago

Problems on Residual Network (ResNet)

Circuit Complexity Theory

The circuit complexity theory literature shows that shallow circuits can require exponentially more components than deeper circuits.

Therefore, the authors of residual networks tried to make them as thin as possible in favor of increasing their depth and having fewer parameters, and even introduced a "bottleneck" block which makes ResNet blocks even thinner.
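For context, the "bottleneck" block keeps ResNet thin by reducing the channel count with a 1×1 convolution, applying the 3×3 convolution in that reduced space, and expanding back with another 1×1 convolution. A minimal PyTorch sketch (the reduction factor of 4 follows the original ResNet convention; the class name is just for illustration):

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """Thin "bottleneck" ResNet block: 1x1 reduce -> 3x3 -> 1x1 expand."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction  # the thin middle of the block
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))  # identity shortcut around the thin body
```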

Diminishing Feature Reuse

However, as the gradient flows through the network, there is nothing forcing it to go through the residual block weights, so a block can avoid learning anything useful during training. As a result, it is possible that only a few blocks learn useful representations, or that many blocks share very little information and contribute little to the final goal. This problem is called diminishing feature reuse.

NorbertZheng commented 1 year ago

WRNs (Wide Residual Networks)

In WRNs, several design parameters are examined, such as the type of ResNet block, how deep the block is (deepening factor $l$, the number of convolutions per block), and how wide it is (widening factor $k$).

When $k=1$, the network has the same width as the original ResNet; when $k>1$, it is $k$ times wider than ResNet.

WRN-d-k denotes a WRN with depth $d$ and widening factor $k$.
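As a concrete reading of this naming, here is a minimal sketch, assuming the CIFAR-style configuration from the paper, where a WRN built from B(3,3) blocks has depth $d = 6n + 4$ with $n$ blocks per group, and per-group widths $16k$, $32k$, $64k$ on top of a 16-channel stem (the function name is just for illustration):

```python
def wrn_config(depth: int, k: int):
    """Per-group block count and channel widths of WRN-depth-k
    (CIFAR-style WRN built from B(3,3) blocks)."""
    assert (depth - 4) % 6 == 0, "depth must be of the form 6n + 4"
    n = (depth - 4) // 6                   # residual blocks per group
    widths = [16, 16 * k, 32 * k, 64 * k]  # stem width + three group widths
    return n, widths

# Example: WRN-28-10 -> 4 blocks per group, widths [16, 160, 320, 640].
print(wrn_config(28, 10))
```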

NorbertZheng commented 1 year ago

The design of ResNet block

Figure: WRN-d-2 (k=2), error rate (%) on the CIFAR-10 dataset.

B(3,3) has the smallest error rate (5.73%).

Note: the depths (numbers of layers) differ in order to keep the number of parameters close across block types.
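In the paper's notation, B(k1, k2, ...) lists the kernel sizes of the convolutions inside one residual block, e.g. B(3,1) is a 3×3 convolution followed by a 1×1 convolution. A minimal sketch of building such a block body from a kernel-size list, assuming the pre-activation (BN-ReLU-conv) ordering used by WRNs (the helper name is my own, not the authors' code):

```python
import torch.nn as nn

def make_block_body(channels: int, kernels=(3, 3)):
    """Convolutional body of a residual block B(k1, k2, ...):
    each entry of `kernels` is the kernel size of one convolution."""
    layers = []
    for ksz in kernels:
        layers += [
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=ksz,
                      padding=ksz // 2, bias=False),  # 'same' padding for odd kernels
        ]
    return nn.Sequential(*layers)

# B(3,3), B(3,1), B(1,3,1), ... as in the comparison above.
body = make_block_body(160, kernels=(3, 3))
```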

NorbertZheng commented 1 year ago

Number of Convolutional Layers Within ResNet block

Figure: WRN-40-2 with different $l$, error rate (%) on the CIFAR-10 dataset.

Two 3×3 convolutions, i.e. B(3,3), gives the smallest error rate. Because all networks are kept at roughly the same number of parameters, increasing the number of convolutions per block ($l$) leaves room for fewer residual blocks (and thus fewer residual connections), which degrades performance.

Thus, B(3,3) is chosen and used in the following experiments.

NorbertZheng commented 1 year ago

Width of ResNet Blocks

Figure: Different widths ($k$) and depths on CIFAR-10 and CIFAR-100.

NorbertZheng commented 1 year ago

Results

CIFAR-10 & CIFAR-100

Figure: Results on CIFAR-10 & CIFAR-100.

NorbertZheng commented 1 year ago

Dropout

Figure: Dropout in the original ResNet (left) and dropout in WRNs (right).

Figure: Dropout is better.
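As the figure shows, dropout is placed inside the residual block, between the convolutions and after ReLU, rather than on the identity path. A minimal sketch of a wide B(3,3) block with this placement (a reimplementation for illustration, not the authors' code; 0.3 is the dropout rate the paper uses on CIFAR):

```python
import torch.nn as nn

class WideBasicBlock(nn.Module):
    """Pre-activation B(3,3) block with dropout between the two 3x3 convolutions."""
    def __init__(self, channels: int, dropout: float = 0.3):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Dropout(p=dropout),  # dropout between the convolutions, after ReLU
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
        )

    def forward(self, x):
        return x + self.body(x)  # identity shortcut stays dropout-free
```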

NorbertZheng commented 1 year ago

ImageNet & COCO

Figure: Single-crop, single-model validation error on ImageNet.

Figure: Results on COCO.

NorbertZheng commented 1 year ago

Training Time

Figure: Training time per batch (batch size 32) on CIFAR-10.

NorbertZheng commented 1 year ago

Training takes a lot of time; it can take days or even weeks. As training sets grow larger and larger, a better way to train is needed. Indeed, much recent research still focuses on reducing the training time or the number of epochs required for training.

WRNs reduce the training time, but at the expense of an increased number of parameters due to the widening of the network.

NorbertZheng commented 1 year ago

References