In this story, SqueezeNet, by DeepScale, UC Berkeley, and Stanford University, is reviewed. With equivalent accuracy, smaller CNN architectures offer at least three advantages:

- Smaller CNNs require less communication across servers during distributed training.
- Smaller CNNs require less bandwidth to export a new model from the cloud to a client, e.g. for over-the-air updates to an autonomous car.
- Smaller CNNs are more feasible to deploy on FPGAs and other hardware with limited memory.
This is a 2016 arXiv technical report with over 1100 citations.
Given a budget of a certain number of convolution filters, we can choose to make the majority of these filters 1×1, since a 1×1 filter has 9× fewer parameters than a 3×3 filter.
Consider a convolution layer composed entirely of 3×3 filters. The total quantity of parameters in this layer is (number of input channels) × (number of filters) × (3×3).
We can decrease the number of input channels to the 3×3 filters using squeeze layers, described in the next section.
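As a quick sanity check on these two strategies, here is a minimal Python sketch of the parameter-count formula above; the channel and filter counts (64, 128, 16) are arbitrary illustrative values, not taken from the paper:

```python
def conv_params(in_channels: int, n_filters: int, k: int) -> int:
    """Weight count of a k-by-k convolution layer (biases ignored)."""
    return in_channels * n_filters * k * k

# Strategy 1: a 1x1 filter has 9x fewer parameters than a 3x3 filter.
print(conv_params(64, 128, 3))  # 73728
print(conv_params(64, 128, 1))  # 8192, i.e. 9x fewer

# Strategy 2: squeezing the input channels from 64 down to 16
# shrinks the 3x3 layer's parameter count proportionally (4x here).
print(conv_params(16, 128, 3))  # 18432
```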
The intuition is that large activation maps (due to delayed downsampling) can lead to higher classification accuracy.
Fire Module with hyperparameters s1x1 = 3, e1x1 = 4, and e3x3 = 4, where s1x1 is the number of 1×1 filters in the squeeze layer, and e1x1 and e3x3 are the numbers of 1×1 and 3×3 filters in the expand layer.
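For concreteness, below is a minimal PyTorch sketch of a Fire module as described in the paper: a squeeze layer of 1×1 filters followed by an expand layer that concatenates 1×1 and 3×3 outputs. The class name and the choice of in_channels = 8 are illustrative assumptions, not the paper's official code:

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Fire module: squeeze with 1x1 filters, then expand with a
    channel-wise concatenation of 1x1 and 3x3 filter outputs."""
    def __init__(self, in_channels, s1x1, e1x1, e3x3):
        super().__init__()
        # Keeping s1x1 < e1x1 + e3x3 limits the number of input
        # channels the 3x3 expand filters see (Strategy 2).
        self.squeeze = nn.Conv2d(in_channels, s1x1, kernel_size=1)
        self.expand1x1 = nn.Conv2d(s1x1, e1x1, kernel_size=1)
        self.expand3x3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# The hyperparameters from the figure: s1x1 = 3, e1x1 = 4, e3x3 = 4.
fire = Fire(in_channels=8, s1x1=3, e1x1=4, e3x3=4)
out = fire(torch.randn(1, 8, 32, 32))  # shape: (1, 8, 32, 32)
```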
SqueezeNet (Left), SqueezeNet with simple bypass (Middle), SqueezeNet with complex bypass (Right).
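The simple bypass adds an identity (element-wise addition) connection around a Fire module, in the spirit of ResNet; the complex bypass instead places a 1×1 convolution on the bypass path so that modules with mismatched channel counts can also be bypassed. Below is a minimal sketch of the simple case, reusing the Fire class above (the class name is my own):

```python
class FireWithBypass(nn.Module):
    """Simple bypass: y = Fire(x) + x. Only valid when the Fire
    module preserves the channel count (in_channels == e1x1 + e3x3)."""
    def __init__(self, in_channels, s1x1, e1x1, e3x3):
        super().__init__()
        assert in_channels == e1x1 + e3x3, "simple bypass needs matching shapes"
        self.fire = Fire(in_channels, s1x1, e1x1, e3x3)

    def forward(self, x):
        return self.fire(x) + x
```

In the paper, the simple bypass is applied only around the Fire modules whose input and output channel counts already match; the complex bypass variant handles the remaining modules.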
Details of SqueezeNet Architecture.
Comparing SqueezeNet to model compression approaches.
Different Hyperparameter Values for SqueezeNet.
SqueezeNet accuracy and model size using different macroarchitecture configurations.
Sik-Ho Tsang. Review: SqueezeNet (Image Classification).