In this story, SqueezeNet, by DeepScale, UC Berkeley, and Stanford University, is reviewed. With equivalent accuracy, smaller CNN architectures offer at least three advantages:

- Smaller CNNs require less communication across servers during distributed training.
- Smaller CNNs require less bandwidth to export a new model from the cloud to a client, e.g. for over-the-air updates to an autonomous car.
- Smaller CNNs are more feasible to deploy on FPGAs and other hardware with limited memory.
This is a 2016 arXiv technical report with over 1100 citations.
Given a budget of a certain number of convolution filters, we can choose to make the majority of these filters 1×1, since a 1×1 filter has 9× fewer parameters than a 3×3 filter.
Consider a convolution layer composed entirely of 3×3 filters. The total quantity of parameters in this layer is (number of input channels) × (number of filters) × (3×3).
We can decrease the number of input channels to the 3×3 filters using squeeze layers, described in the next section.
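As a quick sanity check on these two strategies, here is a minimal Python sketch of the parameter-count formula above; the channel and filter counts (64, 128, 16) are arbitrary illustrative values, not taken from the paper:

```python
def conv_params(in_channels: int, n_filters: int, k: int) -> int:
    """Weight count of a k-by-k convolution layer (biases ignored)."""
    return in_channels * n_filters * k * k

# Strategy 1: a 1x1 filter has 9x fewer parameters than a 3x3 filter.
print(conv_params(64, 128, 3))  # 73728
print(conv_params(64, 128, 1))  # 8192, i.e. 9x fewer

# Strategy 2: squeezing the input channels from 64 down to 16
# shrinks the 3x3 layer's parameter count proportionally (4x here).
print(conv_params(16, 128, 3))  # 18432
```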
The intuition is that large activation maps (due to delayed downsampling) can lead to higher classification accuracy.
Fire Module with hyperparameters s1x1 = 3, e1x1 = 4, and e3x3 = 4, where s1x1 is the number of 1×1 filters in the squeeze layer, and e1x1 and e3x3 are the numbers of 1×1 and 3×3 filters in the expand layer.
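For concreteness, below is a minimal PyTorch sketch of a Fire module as described in the paper: a squeeze layer of 1×1 filters followed by an expand layer that concatenates 1×1 and 3×3 outputs. The class name and the choice of in_channels = 8 are illustrative assumptions, not the paper's official code:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Fire module: squeeze with 1x1 filters, then expand with a
    channel-wise concatenation of 1x1 and 3x3 filter outputs."""
    def __init__(self, in_channels, s1x1, e1x1, e3x3):
        super().__init__()
        # Keeping s1x1 < e1x1 + e3x3 limits the number of input
        # channels the 3x3 expand filters see (Strategy 2).
        self.squeeze = nn.Conv2d(in_channels, s1x1, kernel_size=1)
        self.expand1x1 = nn.Conv2d(s1x1, e1x1, kernel_size=1)
        self.expand3x3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# The hyperparameters from the figure: s1x1 = 3, e1x1 = 4, e3x3 = 4.
fire = Fire(in_channels=8, s1x1=3, e1x1=4, e3x3=4)
out = fire(torch.randn(1, 8, 32, 32))  # shape: (1, 8, 32, 32)
```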
SqueezeNet (Left), SqueezeNet with simple bypass (Middle), SqueezeNet with complex bypass (Right).
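The simple bypass adds an identity (element-wise addition) connection around a Fire module, in the spirit of ResNet; the complex bypass instead places a 1×1 convolution on the bypass path so that modules with mismatched channel counts can also be bypassed. Below is a minimal sketch of the simple case, reusing the Fire class above (the class name is my own):

```python
class FireWithBypass(nn.Module):
    """Simple bypass: y = Fire(x) + x. Only valid when the Fire
    module preserves the channel count (in_channels == e1x1 + e3x3)."""
    def __init__(self, in_channels, s1x1, e1x1, e3x3):
        super().__init__()
        assert in_channels == e1x1 + e3x3, "simple bypass needs matching shapes"
        self.fire = Fire(in_channels, s1x1, e1x1, e3x3)

    def forward(self, x):
        return self.fire(x) + x
```

In the paper, the simple bypass is applied only around the Fire modules whose input and output channel counts already match; the complex bypass variant handles the remaining modules.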
Details of SqueezeNet Architecture.
Comparing SqueezeNet to model compression approaches.
Different Hyperparameter Values for SqueezeNet.
SqueezeNet accuracy and model size using different macroarchitecture configurations.
Sik-Ho Tsang. Review: SqueezeNet (Image Classification).