datalass1 / fastai

This repo contains code and notes covered during the fastai course.

Look at more resources to understand CNNs #14

Closed datalass1 closed 5 years ago

datalass1 commented 5 years ago

The key piece of a convolutional neural network is convolution. A good example is http://setosa.io/ev/image-kernels

Most kernels in deep learning are 3x3. A really fantastic interactive book: http://neuralnetworksanddeeplearning.com/chap4.html
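As a rough illustration, here is a minimal numpy sketch of what a 3x3 kernel does: slide it over the image and take a weighted sum at each position. The function name and the random stand-in image are just for this example; note that deep-learning "convolution" is usually implemented without flipping the kernel (strictly, cross-correlation).

```python
import numpy as np

def conv2d(image, kernel):
    # Naive 'valid'-mode sliding window: the output shrinks by (kernel size - 1).
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Weighted sum of the 3x3 patch under the kernel.
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

# The 'outline' (edge detection) kernel from the setosa.io page.
outline = np.array([[-1, -1, -1],
                    [-1,  8, -1],
                    [-1, -1, -1]])

image = np.random.rand(8, 8)  # stand-in for a grayscale image
print(conv2d(image, outline).shape)  # (6, 6)
```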

A great paper about what the convolutional layers are learning: Visualizing and Understanding Convolutional Networks, Matthew D. Zeiler, Rob Fergus: https://arxiv.org/pdf/1311.2901.pdf

datalass1 commented 5 years ago

Image Kernels, Explained Visually

Victor Powell

An image kernel is a small matrix used to apply effects like the ones you might find in Photoshop or Gimp, such as blurring, sharpening, outlining or embossing. They're also used in machine learning for 'feature extraction', a technique for determining the most important portions of an image. In this context the process is referred to more generally as "convolution". See how they work at http://setosa.io/ev/image-kernels.
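A short sketch of applying a few of these kernels with scipy (assuming scipy is available; `ndimage.correlate` slides the kernel without flipping it, which matches how the setosa.io page applies them):

```python
import numpy as np
from scipy import ndimage

# 3x3 kernels for classic Photoshop/GIMP-style effects.
kernels = {
    "blur":    np.full((3, 3), 1 / 9),
    "sharpen": np.array([[ 0, -1,  0],
                         [-1,  5, -1],
                         [ 0, -1,  0]]),
    "emboss":  np.array([[-2, -1, 0],
                         [-1,  1, 1],
                         [ 0,  1, 2]]),
}

image = np.random.rand(64, 64)  # stand-in for a grayscale image
for name, kernel in kernels.items():
    out = ndimage.correlate(image, kernel, mode="nearest")
    print(name, out.shape)  # output is the same size as the input
```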

datalass1 commented 5 years ago

Visualizing and Understanding Convolutional Networks

Matthew D. Zeiler, Rob Fergus

Introduction

Since their introduction by LeCun et al. (1989) in the early 1990s, Convolutional Networks (convnets) have demonstrated excellent performance at tasks such as hand-written digit classification and face detection.

Renewed interest in convnets:

  1. the availability of much larger training sets, with millions of labeled examples
  2. powerful GPU implementations, making the training of very large models practical
  3. better model regularization strategies, such as Dropout

We introduce a visualization technique that reveals the input stimuli that excite individual feature maps at any layer in the model. It also allows us to observe the evolution of features during training and to diagnose potential problems with the model. The technique uses a multi-layered Deconvolutional Network (deconvnet), as proposed by (Zeiler et al., 2011), to project the feature activations back to the input pixel space.

We also perform a sensitivity analysis of the classifier output by occluding portions of the input image, revealing which parts of the scene are important for classification.
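A minimal PyTorch sketch of this occlusion experiment (the `model`, patch size and grey value here are illustrative assumptions; the paper's idea is to slide a grey square over the image and watch the classifier output):

```python
import torch

def occlusion_map(model, image, target_class, patch=16, stride=8):
    # Slide a grey square over the image; where the target-class probability
    # drops the most, that region matters most for the classification.
    model.eval()
    _, h, w = image.shape  # image is (channels, height, width)
    rows = (h - patch) // stride + 1
    cols = (w - patch) // stride + 1
    heat = torch.zeros(rows, cols)
    with torch.no_grad():
        for i in range(rows):
            for j in range(cols):
                occluded = image.clone()
                y, x = i * stride, j * stride
                occluded[:, y:y + patch, x:x + patch] = 0.5  # grey patch
                logits = model(occluded.unsqueeze(0))
                heat[i, j] = logits.softmax(dim=1)[0, target_class]
    return heat  # low values mark regions important for the class
```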

Invariance to abstract input variables is a highly desirable property of features for many detection and classification tasks, such as object recognition. The concept of invariance implies a selectivity for complex, high-level features of the input and yet a robustness to irrelevant input transformations. (Measuring Invariances in Deep Networks)

The problem is that for higher layers the invariances are extremely complex, so they are poorly captured by a simple quadratic approximation.

Our approach provides a non-parametric view of invariance, showing which patterns from the training set activate a given feature map. The visualizations identify the patches within a dataset that are responsible for strong activations at higher layers in the model.

Visualization with a Deconvnet

We present a novel way to map these activities back to the input pixel space, showing what input pattern originally caused a given activation in the feature maps. We perform this mapping with a Deconvolutional Network (deconvnet) (Zeiler et al., 2011). A deconvnet can be thought of as a convnet model that uses the same components (filtering, pooling) but in reverse: instead of mapping pixels to features, it maps features back to pixels.

The deconvnet inverts each layer of the convnet with three operations:

  1. Unpooling: record the locations of the maxima within each pooling region in a set of switch variables, then use these switches to place reconstructed activations back in their original positions.
  2. Rectification: pass the reconstructed signal through a ReLU (rectified linear unit) non-linearity. ReLU is one of the most widely used activation functions.
  3. Filtering: the convnet uses learned filters to convolve the feature maps from the previous layer; to invert this, the deconvnet uses transposed versions of the same filters, applied to the rectified maps rather than to the output of the layer beneath.
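These three inverse operations map directly onto PyTorch primitives. A sketch for a single conv, ReLU, max-pool block (the filter shapes and random inputs are illustrative; the key details are that pooling must return its argmax "switches" and that `conv_transpose2d` reuses the same weight tensor):

```python
import torch
import torch.nn.functional as F

weight = torch.randn(16, 3, 3, 3)  # 16 learned 3x3 filters over 3 channels
x = torch.randn(1, 3, 32, 32)      # stand-in input image batch

# Forward pass, keeping the switch variables (locations of the pooled maxima).
feat = F.relu(F.conv2d(x, weight, padding=1))
pooled, switches = F.max_pool2d(feat, kernel_size=2, return_indices=True)

# Deconvnet pass back toward pixel space:
unpooled = F.max_unpool2d(pooled, switches, kernel_size=2)         # 1. unpooling
rectified = F.relu(unpooled)                                       # 2. rectification
reconstruction = F.conv_transpose2d(rectified, weight, padding=1)  # 3. filtering
print(reconstruction.shape)  # torch.Size([1, 3, 32, 32]), back in pixel space
```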

Training the model

We stopped training after 70 epochs, which took around 12 days on a single GTX580 GPU, using an implementation based on (Krizhevsky et al., 2012).

Discussion

Features show many intuitively desirable properties such as compositionality, increasing invariance and class discrimination as we ascend the layers.

Our convnet model generalized less well to the PASCAL data, perhaps suffering from dataset bias (Torralba & Efros, 2011), although it was still within 3.2% of the best reported result, despite no tuning for the task.


The general consensus is that in a well-trained convolutional network, the filters in the earliest layers (closest to the image) become sensitive to basic edges and patterns, while the filters in deeper layers become sensitive to progressively higher-order shapes and patterns.
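One way to see this for yourself is to plot the first-layer filters of a pretrained network (a sketch assuming a recent torchvision; after training, the conv1 filters typically look like oriented edge and colour-blob detectors):

```python
import matplotlib.pyplot as plt
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
filters = model.conv1.weight.detach()  # shape (64, 3, 7, 7)

# Rescale to [0, 1] so the filters display as RGB images.
filters = (filters - filters.min()) / (filters.max() - filters.min())

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for ax, f in zip(axes.flat, filters):
    ax.imshow(f.permute(1, 2, 0))  # CHW to HWC for imshow
    ax.axis("off")
plt.show()
```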