How is convolution computed over channels in Convolutional Neural Nets (CNNs)?

GilLevi commented 9 years ago

Hi,

I have a very simple theoretical question about the way convolution works in Caffe.

Let's take the third convolutional layer (conv3) in the ImageNet network. The input to this layer is an array (blob) of size 256x13x13, meaning an image with 256 channels and a width and height of 13 pixels. The convolutional layer itself consists of 384 filters of size 256x3x3.

Does this mean that, for each filter, the layer convolves each of the 256 channels separately and then merges the results in some fashion (averaging, for example)? Or does it instead perform a 3D convolution of a 256x3x3 filter with the 256x13x13 input?
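For concreteness, here is the shape and parameter arithmetic implied by those numbers. It is a small sketch; `pad = 1` and `stride = 1` are assumptions taken from the usual reference-model settings for this layer (they are not stated above) and are only used for the output size.

```python
# Shapes from the question (Caffe blob order: channels x height x width).
c_in, h, w = 256, 13, 13      # input blob
num_filters, k = 384, 3       # 384 filters, each c_in x k x k
pad, stride = 1, 1            # assumption: typical settings for this layer

weights_per_filter = c_in * k * k               # each filter spans all 256 input channels
total_weights = num_filters * weights_per_filter
total_params = total_weights + num_filters      # plus one bias per filter
h_out = (h + 2 * pad - k) // stride + 1         # standard output-size formula

print(weights_per_filter)           # 2304
print(total_params)                 # 885120
print((num_filters, h_out, h_out))  # (384, 13, 13)
```

Each filter carries 2304 weights because its depth matches the 256 input channels, and each filter yields exactly one output channel.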

Thanks in advance,

Gil

shelhamer commented 9 years ago

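In the context of CNNs the convolution is 3D: each filter produces one output channel, computed from all of the input channels. The filter's depth equals the number of input channels, so it slides only over the two spatial dimensions. A filter therefore has a separate coefficient for every position in its 3x3 window and every input channel (256x3x3 weights here), and the same weights are applied at every spatial location of the input. Equivalently, each filter convolves every input channel with its own 3x3 slice and sums (rather than averages) the 256 results, plus a bias. This is in contrast to the conventional sense of convolution in image processing, which is usually single-channel.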
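To make the "3D convolution" versus "per-channel then merge" question concrete, here is a minimal NumPy sketch (not Caffe's code; Caffe itself lowers convolution to a matrix multiply via im2col). For output channel o at position (i, j) the layer computes y[o, i, j] = b[o] + sum over c, u, v of w[o, c, u, v] * x[c, i+u, j+v], i.e. cross-correlation without a kernel flip, with stride 1 and optional zero padding. The function names `conv_layer` and `conv_layer_per_channel` are hypothetical; the sketch shows that sliding one 3D filter gives the same result as 2D-filtering each channel separately and then summing.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv_layer(x, w, b, pad=0):
    """Stride-1 conv layer written as a 3D filter slid over 2D positions.

    x: (C_in, H, W) input; w: (C_out, C_in, K, K) filters; b: (C_out,) biases.
    Cross-correlation (no kernel flip), as CNN libraries compute it.
    """
    c_out, c_in, k, _ = w.shape
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding on H and W only
    h_out, w_out = x.shape[1] - k + 1, x.shape[2] - k + 1
    y = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]  # (C_in, K, K): every input channel at once
            # each output value is a dot product over the whole C_in x K x K volume
            y[:, i, j] = np.tensordot(w, patch, axes=3) + b
    return y


def conv_layer_per_channel(x, w, b, pad=0):
    """Same layer: 2D-filter each input channel with its own K x K slice, then SUM."""
    c_out, c_in, k, _ = w.shape
    x = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    windows = sliding_window_view(x, (k, k), axis=(1, 2))  # (C_in, H_out, W_out, K, K)
    # per_channel[o, c] is the single-channel response of input channel c to slice w[o, c]
    per_channel = np.einsum("cijuv,ocuv->ocij", windows, w)
    return per_channel.sum(axis=1) + b[:, None, None]  # summed, not averaged


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 7, 7))
w = rng.standard_normal((4, 8, 3, 3))
b = rng.standard_normal(4)
print(np.allclose(conv_layer(x, w, b, pad=1), conv_layer_per_channel(x, w, b, pad=1)))  # True

# Shapes from the question; pad=1 is an assumption that keeps the 13x13 size.
x = rng.standard_normal((256, 13, 13))
w = rng.standard_normal((384, 256, 3, 3))
print(conv_layer(x, w, np.zeros(384), pad=1).shape)  # (384, 13, 13)
```

The two functions agree because summing over channels commutes with the spatial sums, so "3D convolution" and "per-channel 2D convolution followed by a sum" describe the same operation; only averaging instead of summing would change the result (by a constant factor of 1/256 on the weighted sum).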

Please ask questions on the caffe-users mailing list. As of the latest release we prefer to keep issues reserved for Caffe development. Thanks!

GilLevi commented 9 years ago

Thanks for your help, Evan.

Gil.

BVLC / caffe, issue #953