PiLab-CAU / ComputerVision-2401

Computer Vision Course 2024-01
Apache License 2.0

[Lecture9][0614] Dimension of fully connected layer #46

Closed 423data closed 3 months ago

423data commented 3 months ago

[Attached image: KakaoTalk_20240614_225751556 (slide screenshot)]

First, I checked issue #39. I'm curious about the dimension of W. In (3x32x32)x(#hidden=10)=30K, what does the 3 stand for, and what are the first 32 and the second 32? I've never studied neural networks, so please understand that I lack the relevant background.

yewonkim01 commented 3 months ago

In this slide, you can think of the input image as a color image with RGB channels and a 32(height) x 32(width) image size.

  1. 3 : the number of channels in the input image. I think most images we see in this class are in RGB format, so 3 is the number of channels: red, green, and blue.

  2. 32 : height of the input image

  3. 32 : width of the input image

  4. 10 : the number of neurons in the hidden layer

In a "Fully Connected" layer, as the name suggests, each input node is connected to "every node" in the next layer. Since we are dealing with a 2D image but need to feed it into the fully connected layer, we have to "flatten" the image into a 1D vector. This is why we multiply the number of channels by the height and width of the image (3 x 32 x 32).

The dimension of weight matrix W is (3 x 32 x 32) x 10. Here, (3 x 32 x 32) input features are connected to each of the 10 neurons in the hidden layer.
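To make the flatten-then-multiply step concrete, here is a minimal NumPy sketch (the random values and variable names are just for illustration, not from the slide):

```python
import numpy as np

# Hypothetical input: one RGB image, 3 channels, 32 x 32 pixels
x = np.random.rand(3, 32, 32)

# Flatten the 2D (multi-channel) image into a 1D vector of 3*32*32 = 3072 features
x_flat = x.reshape(-1)            # shape: (3072,)

# Weight matrix connecting all 3072 input features to 10 hidden neurons
W = np.random.rand(10, 3 * 32 * 32)   # shape: (10, 3072) -> ~30K weights
b = np.zeros(10)                       # one bias per hidden neuron

h = W @ x_flat + b                # hidden-layer pre-activations, shape: (10,)
print(x_flat.shape, W.size, h.shape)
```

Each of the 10 rows of W holds the 3072 weights of one hidden neuron, which is exactly the (3 x 32 x 32) x 10 = 30,720 count from the slide.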

I think this picture may help your understanding. Just think of it as a single-channel 3 x 3 image with 4 hidden neurons.
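The same miniature case can be checked numerically; this toy sketch (my own values, just mirroring the single-channel 3 x 3 / 4-neuron picture) shows the weight count is 9 x 4 = 36:

```python
import numpy as np

# Toy single-channel 3x3 "image" with pixel values 0..8
x = np.arange(9.0).reshape(3, 3)

x_flat = x.flatten()              # 9 input features
W = np.ones((4, 9))               # 4 hidden neurons, 9 weights each -> 36 weights
h = W @ x_flat                    # shape: (4,)
print(W.size, h.shape)            # 36 (4,)
```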

If there's any part you don't understand, feel free to ask:)

423data commented 3 months ago

Thank you so much for your response! I understood the concept well because you even attached an image that I can understand easily!!