NVIDIA / DIGITS

Deep Learning GPU Training System
https://developer.nvidia.com/digits
BSD 3-Clause "New" or "Revised" License

shape mismatch using pretrained alexnet model #913

Closed: paras42 closed this issue 8 years ago

paras42 commented 8 years ago

I was trying to use a pretrained AlexNet model for the first time, but encountered this error message:

"ERROR: Cannot copy param 0 weights from layer 'conv1'; shape mismatch. Source param shape is 96 3 11 11 (34848); target param shape is 96 1 11 11 (11616). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer."

I was using the BVLC_alexnet.caffemodel.

This was my train_val.prototxt - the only change I made was renaming "fc8" to "fc9". Any help would be appreciated. Thanks.


name: "AlexNet" layer { name: "data" type: "Data" top: "data" top: "label" include { phase: TRAIN } transform_param { mirror: true crop_size: 227 } data_param { batch_size: 100 } } layer { name: "data" type: "Data" top: "data" top: "label" include { phase: TEST } transform_param { crop_size: 227 } data_param { batch_size: 100 } } layer { name: "conv1" type: "Convolution" bottom: "data" top: "conv1" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 96 kernel_size: 11 stride: 4 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" } layer { name: "norm1" type: "LRN" bottom: "conv1" top: "norm1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "pool1" type: "Pooling" bottom: "norm1" top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "conv2" type: "Convolution" bottom: "pool1" top: "conv2" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 2 kernel_size: 5 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" } layer { name: "norm2" type: "LRN" bottom: "conv2" top: "norm2" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "pool2" type: "Pooling" bottom: "norm2" top: "pool2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "conv3" type: "Convolution" bottom: "pool2" top: "conv3" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" } layer { name: "conv4" type: "Convolution" bottom: "conv3" top: "conv4" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 384 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" } layer { name: "conv5" type: "Convolution" bottom: "conv4" top: "conv5" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 256 pad: 1 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu5" type: "ReLU" bottom: "conv5" top: "conv5" } layer { name: "pool5" type: "Pooling" bottom: "conv5" top: "pool5" pooling_param { pool: MAX kernel_size: 3 stride: 2 } } layer { name: "fc6" type: "InnerProduct" bottom: "pool5" top: "fc6" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu6" type: "ReLU" bottom: "fc6" top: "fc6" } layer { name: "drop6" type: "Dropout" bottom: "fc6" top: "fc6" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc7" type: "InnerProduct" bottom: "fc6" top: "fc7" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 4096 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } } } layer { name: "relu7" type: "ReLU" bottom: "fc7" top: "fc7" } layer { name: "drop7" type: "Dropout" 
bottom: "fc7" top: "fc7" dropout_param { dropout_ratio: 0.5 } } layer { name: "fc9" type: "InnerProduct" bottom: "fc7" top: "fc9" param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 } inner_product_param { weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "accuracy" type: "Accuracy" bottom: "fc9" bottom: "label" top: "accuracy" include { phase: TEST } } layer { name: "loss" type: "SoftmaxWithLoss" bottom: "fc9" bottom: "label" top: "loss" }

TimZaman commented 8 years ago

It seems like you are using single-channel (grayscale) images. Try using a color (3-channel) dataset. Alternatively, you could use a grayscale (1-channel) pretrained model.
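The error message itself points at a third option: rename conv1 so that Caffe initializes its weights from scratch for 1-channel input instead of copying the 3-channel pretrained weights. A minimal sketch, keeping everything else from the train_val.prototxt above (the name conv1_gray is arbitrary):

layer {
  name: "conv1_gray"  # no layer of this name exists in BVLC_alexnet.caffemodel, so nothing is copied
  type: "Convolution"
  bottom: "data"
  top: "conv1"        # keep the old top name so downstream layers stay untouched
  param { lr_mult: 1 decay_mult: 1 }
  param { lr_mult: 2 decay_mult: 0 }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler { type: "gaussian" std: 0.01 }
    bias_filler { type: "constant" value: 0 }
  }
}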

IsaacYangSLA commented 8 years ago

@TimZaman , thanks. @paras42 , Tim pointed out the issue. The pre-trained weights tell Caffe that conv1 should be constructed to handle 3 input channels (the 3 in 96, 3, 11, 11), but your data is single-channel (the 1 in 96, 1, 11, 11); hence 96 x 3 x 11 x 11 = 34848 weights in the source versus 96 x 1 x 11 x 11 = 11616 in the target. The 96 is num_output and 11 is the kernel_size, and those match between your train_val.prototxt and the pre-trained weights. The only mismatch is the number of channels.
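To map those dimensions back to the prototxt, here is an annotated excerpt of the conv1 definition above; note that the channel count is written nowhere in the layer, it is inferred from the bottom blob:

convolution_param {
  num_output: 96   # first dimension of the weight blob
  kernel_size: 11  # last two dimensions (11 x 11)
  stride: 4
}
# weight blob shape = num_output x input_channels x kernel_h x kernel_w
# input_channels comes from the "data" blob: 1 for a grayscale dataset,
# 3 for the color data the BVLC model was trained on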

paras42 commented 8 years ago

Great, Thanks for the explanation. Very helpful.


IsaacYangSLA commented 8 years ago

@paras42, you are welcome. I will close this issue, but feel free to add more comments if the issue happens again.

mvab commented 8 years ago

@paras42 I'm having the same problem. I was wondering what you ended up doing - did you find a good grayscale model?

TimZaman commented 8 years ago

@marinavab I'm afraid the easiest solution is pretraining your own grayscale AlexNet with 1 input channel instead of 3. You'd have to download something like ImageNet and let your grayscale AlexNet train for a weekend, depending on your GPU(s). In DIGITS, you can import the ImageNet dataset with the 'Grayscale' flag set, so all images are imported as grayscale. If you are in an immediate rush, you could instead expand your 1-channel input into 3 channels inside the network (sketched below), but that will probably give fairly poor performance, since the pretrained network is used to seeing things in full color.
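A minimal sketch of that channel-expansion trick, assuming the grayscale input blob is named "data"; the layer name gray_to_color and the top data_3ch are made up, and conv1's bottom would have to be changed to data_3ch:

layer {
  name: "gray_to_color"
  type: "Concat"
  bottom: "data"
  bottom: "data"
  bottom: "data"            # repeat the single channel three times
  top: "data_3ch"
  concat_param { axis: 1 }  # axis 1 is the channel axis of an N x C x H x W blob
}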

masaff commented 7 years ago

Is there any Caffe pre-trained model that we can use on grayscale images? I changed the number of channels in the deploy.prototxt file from 3 to 1 and also added force_gray: true to my data_param, but I still get the same error. Can you please help me?
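For what it's worth, in stock Caffe force_gray is a transform_param field rather than a data_param one, and it only changes how input images are decoded; it does not alter the saved weights, so the pretrained 96 x 3 x 11 x 11 conv1 blob still mismatches a 1-channel network. A sketch of the placement, assuming an LMDB-backed Data layer:

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    force_gray: true  # decode every image as single-channel
    crop_size: 227
  }
  data_param { batch_size: 100 }
}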

medhani commented 7 years ago

I was trying to test a convolution-deconvolution network, but I got this error:

Cannot copy param 0 weights from layer 'deconv3_4'; shape mismatch. Source param shape is 256 256 3 3 (589824); target param shape is 512 256 3 3 (1179648). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer

I didn't change anything in the network structure. I only replaced the data layer, as follows.

layer { name: "data" type: "Input" top: "data" input_param { shape: { dim: 20 dim: 3 dim: 256 dim: 256 } } }

TimZaman commented 7 years ago

Seems the shape discrepancy is 512 x 256 versus 256 x 256. Are you sure your input data format is one or the other? Are you pretraining?

medhani commented 7 years ago

I trained the network and now I want to load the weights. I used 256 x 256 images for training.

ghost commented 5 years ago

@medhani Could you tell me exactly where you had to add the data layer in the prototxt?