NorbertZheng / read-papers

My paper reading notes.

Sik-Ho Tang | Review: ZFNet -- Winner of ILSVRC 2013 (Image Classification). #87

NorbertZheng commented 1 year ago

Sik-Ho Tang. Review: ZFNet — Winner of ILSVRC 2013 (Image Classification).

Overview

In this story, ZFNet [1] is reviewed. ZFNet is regarded as the winner of ILSVRC (ImageNet Large Scale Visual Recognition Challenge) 2013, an image classification competition, and it significantly improves over AlexNet [2], the winner of ILSVRC 2012.

image ILSVRC Ranking.

Some people and articles claim that ZFNet is not the winner. This conclusion may come from the ILSVRC ranking shown above. However, Clarifai is the company founded by Zeiler, the first author of ZFNet. In addition, the paper "ImageNet Large Scale Visual Recognition Challenge" mentions:

The reference (Zeiler and Fergus, 2013) cited in the above passage is ZFNet. Thus, ZFNet is officially recognized as the winner.

This is a 2014 ECCV paper with more than 4000 citations at the time of writing.

ImageNet is a dataset of over 15 million labeled high-resolution images in around 22,000 categories. ILSVRC uses a subset of ImageNet with around 1000 images in each of 1000 categories. In all, there are roughly 1.3 million training images, 50,000 validation images and 100,000 testing images.

image 15 million images.

Some Facts about Ranking

image ILSVRC2013 Ranking [3].

In 2013, ZFNet was invented by Dr. Rob Fergus and his PhD student at the time, Dr. Matthew D. Zeiler, at NYU. (Prof. Yann LeCun, the inventor of LeNet, is also from NYU, and the authors thank him for discussions in the acknowledgements of the paper.) The name ZFNet comes from their surnames, Zeiler and Fergus, and the paper, “Visualizing and Understanding Convolutional Networks” [1], appeared at ECCV 2014. Strictly speaking, ZFNet is not the winner of ILSVRC 2013. Instead, Clarifai, a new start-up at the time, is the winner of ILSVRC 2013 for image classification. Zeiler is also the founder and CEO of Clarifai.

As in the figure above,

Clarifai has only a small improvement over ZFNet. (For more details about the ranking, please go to [3].) Nevertheless, when we talk about the deep learning network that won ILSVRC 2013, we usually mean ZFNet [1].

What We’ll Cover

How and why convolutional networks perform so well has long been a mystery. Most of the time, we can only reason through intuitive explanations or empirical experiments. In this story, I will cover how ZFNet visualizes the convolutional network. Guided by these visualizations, ZFNet became the winner of ILSVRC 2013 in image classification by fine-tuning the AlexNet architecture introduced in 2012. Hence, the sections to be covered:

Deconvnet Techniques for Visualization

image As we know, a standard stage in a deep learning framework is a series of convolution, rectification (activation function), and pooling operations.

To visualize a deep-layer feature, we need a set of deconvnet techniques that reverse the above operations so that the feature can be visualized in the pixel domain.

Unpooling

image Unpooling.

The max pooling operation is non-invertible. However, we can obtain an approximate inverse by recording the locations of the maxima within each pooling region, as in the figure above.
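To make the idea of recorded maxima locations (the "switches") concrete, here is a minimal pure-Python sketch. The function names `max_pool_with_switches` and `unpool`, the 2x2 non-overlapping pooling, and the toy 4x4 input are my own assumptions for illustration, not code from the paper.

```python
def max_pool_with_switches(x, size=2):
    """Non-overlapping max pooling over a 2D list; also records where each max came from."""
    rows, cols = len(x) // size, len(x[0]) // size
    pooled = [[0.0] * cols for _ in range(rows)]
    switches = [[(0, 0)] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # All (row, col) positions inside this pooling region.
            region = [(r, c)
                      for r in range(i * size, (i + 1) * size)
                      for c in range(j * size, (j + 1) * size)]
            best = max(region, key=lambda rc: x[rc[0]][rc[1]])
            pooled[i][j] = x[best[0]][best[1]]
            switches[i][j] = best  # remember the location of the maximum
    return pooled, switches


def unpool(pooled, switches, shape):
    """Approximate inverse: place each pooled value at its recorded location, zeros elsewhere."""
    out = [[0.0] * shape[1] for _ in range(shape[0])]
    for i, row in enumerate(pooled):
        for j, value in enumerate(row):
            r, c = switches[i][j]
            out[r][c] = value
    return out


# Toy 4x4 feature map (hypothetical values).
x = [[1.0, 3.0, 2.0, 0.0],
     [4.0, 2.0, 1.0, 5.0],
     [0.0, 1.0, 7.0, 2.0],
     [6.0, 2.0, 3.0, 1.0]]
pooled, switches = max_pool_with_switches(x)
restored = unpool(pooled, switches, (4, 4))
```

Only the maxima survive the round trip; every other position in `restored` is zero, which is why the inverse is only approximate.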

Rectification (Activation Function)

ReLU is used as the activation function; it keeps positive values and sets negative values to zero. In the reverse operation, we just need to apply ReLU again.
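As a tiny sketch of this step, assuming the reconstructed signal at some layer is just a flat list of numbers (the values below are made up), the reverse pass applies the same ReLU so that the reconstruction stays non-negative:

```python
def relu(values):
    """Keep positive values, set negative values to zero."""
    return [max(0.0, v) for v in values]


# Hypothetical reconstructed signal on the way back to pixel space.
reconstruction = [-0.5, 1.2, 0.0, -2.0, 3.4]
rectified = relu(reconstruction)
```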

Deconv

image Conv (Blue is input, cyan is output).

image Deconv (Blue is input, cyan is output).

The deconv operation is, in fact, a transposed version of conv.
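One way to see what "transposed" means is to write a convolution as a matrix multiplication and then multiply by the transpose of that matrix. The sketch below does this in 1D with pure Python; the helper names, the kernel, and the input are assumptions for illustration, and the "valid" cross-correlation form is chosen only to keep the example small.

```python
def conv_matrix(kernel, input_len):
    """Matrix C such that C applied to x is the 'valid' 1D conv (cross-correlation form) of x."""
    out_len = input_len - len(kernel) + 1
    rows = []
    for i in range(out_len):
        row = [0.0] * input_len
        row[i:i + len(kernel)] = kernel  # the kernel slides one step per row
        rows.append(row)
    return rows


def matvec(m, v):
    """Matrix-vector product on plain lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


kernel = [1.0, 2.0, 3.0]
C = conv_matrix(kernel, input_len=5)
x = [1.0, 0.0, 2.0, 0.0, 1.0]
y = matvec(C, x)                    # conv: 5 inputs -> 3 outputs
x_back = matvec(transpose(C), y)    # transposed conv: 3 values -> 5 positions
```

Note that `x_back` has the shape of the input but is not equal to `x`: the transposed operation projects activations back to the input space with the same filter weights, rather than inverting the convolution exactly.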

Visualization for Each Layer

image Layer 1 and Layer 2.

Using the deconvnet techniques, the top 9 activated patterns in randomly selected feature maps are shown for each layer. Two problems are observed in layer 1 and layer 2.
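Selecting the "top 9" can be sketched as a simple ranking step. Here I assume we already have one peak activation value per image for a single feature map; the function name `top_k_images` and the numbers are hypothetical.

```python
import heapq


def top_k_images(activations, k=9):
    """Return indices of the k images with the strongest activation, strongest first."""
    return heapq.nlargest(k, range(len(activations)), key=lambda i: activations[i])


# Hypothetical peak activations of one feature map over 12 validation images.
activations = [0.1, 2.3, 0.7, 5.1, 0.0, 3.3, 4.8, 1.9, 0.4, 2.9, 6.0, 1.1]
top9 = top_k_images(activations)
```

Each of the selected activations would then be projected back to pixel space with the deconvnet steps described earlier.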

image Layer 3.

Let us observe 3 more layers.

image Layer 4 and Layer 5.

Modifications of AlexNet Based on Visualization Results

image ZFNet.

ZFNet is redrawn in the same style as AlexNet for ease of comparison. To solve the two problems observed in layer 1 and layer 2, ZFNet makes two changes. (To read the AlexNet review, please visit [4].)

image Layer 1: (a) More mid-frequencies in ZFNet, (b) Extremely low and high frequencies in AlexNet.

image Layer 2: (c) Aliasing artifacts in AlexNet and (d) much cleaner features in ZFNet.
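The ZFNet paper reports reducing the first-layer filter size from 11x11 to 7x7 and the stride from 4 to 2. A quick way to see the effect on the layer 1 feature map is the standard output-size formula, sketched below. The input sizes (227 and 224) and the padding of 1 are my assumptions, chosen so that the results match the commonly drawn 55x55 and 110x110 maps.

```python
def conv_output_size(input_size, kernel, stride, padding=0):
    """Spatial output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (input_size + 2 * padding - kernel) // stride + 1


# AlexNet-style first layer: 11x11 filters, stride 4 (227x227 input assumed).
alexnet_l1 = conv_output_size(227, kernel=11, stride=4)
# ZFNet-style first layer: 7x7 filters, stride 2 (224x224 input and padding 1 assumed).
zfnet_l1 = conv_output_size(224, kernel=7, stride=2, padding=1)
```

Under these assumptions the first layer goes from 55x55 to 110x110, so the smaller filter and stride sample the image more densely.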

Experimental Results

The Modified ZFNet based on Ablation Study

image Ablation Study.

image The Modified ZFNet based on Ablation Study.

There is also an ablation study on removing or adjusting layers. The modified ZFNet obtains a 16.0% top-5 validation error.
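For readers unfamiliar with the metric, top-5 error counts a prediction as wrong only when the true label is not among the 5 highest-scoring classes. A minimal sketch follows; the function name and the toy scores are hypothetical and unrelated to the paper's numbers.

```python
def top5_error(scores, labels):
    """Fraction of samples whose true label is not among the 5 highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by score, highest first; keep the first five.
        top5 = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(labels)


# Hypothetical scores for 2 samples over 6 classes.
scores = [[0.1, 0.3, 0.05, 0.2, 0.25, 0.1],
          [0.4, 0.2, 0.15, 0.1, 0.1, 0.05]]
labels = [1, 5]
error = top5_error(scores, labels)
```

In this toy case the first sample is a hit and the second is a miss, so the top-5 error is one half.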

Comparison with State-of-the-art Approaches

image Error Rate (%).

Other relatively small datasets are also tested

image Caltech 101 (83.8 to 86.5 mean accuracy).

image Caltech 256 (65.7 to 74.2 mean accuracy).

image PASCAL 2012 (79.0 mean accuracy).

From the above tables, we can see that the accuracy without pre-training ZFNet on ImageNet images, i.e. training ZFNet from scratch, is low. With training (fine-tuning) on top of the pre-trained ZFNet, the accuracy is much higher. That means

Particularly on the Caltech 101 and Caltech 256 datasets, ZFNet achieves overwhelmingly strong results.

For PASCAL 2012, the images can contain multiple objects and are quite different in nature from those in ImageNet. Thus, the accuracy is a bit lower but still competitive with state-of-the-art approaches.

Conclusions

While previously only shallow-layer features could be observed, this paper provides an interesting approach to observe deep features in the pixel domain.

By visualizing the convolutional network layer by layer, ZFNet adjusts layer hyperparameters of AlexNet, such as filter size and stride, and successfully reduces the error rates.

References