chengtaipu / lowrankcnn

Low-rank convolutional neural networks

something wrong to convert vgg16 pretrained model #1

Open kaishijeng opened 8 years ago

kaishijeng commented 8 years ago

I am able to convert CaffeNet, but I get an error when I try it with VGG16.

F0312 08:11:17.590416 30365 insert_splits.cpp:35] Unknown blob input data to layer 0
*** Check failure stack trace: ***
Aborted (core dumped)

The command I use is:

python2 lowrank_approx.py --model models_vgg/vgg_deploy.prototxt --config models_vgg/config.json --save_model models_vgg/vgg_lowrank_deploy.prototxt --weights VGG_ILSVRC_16_layers.caffemodel --save_weights vgg_lowrank.caffemodel

Do you know why?

Cysu commented 8 years ago

Thanks very much for pointing out this issue. There is a typo inside the vgg_deploy.prototxt. We have fixed it in the last commit. Please check it out.

kaishijeng commented 8 years ago

With the latest code, I don't see errors anymore. Just curious, why didn't the model size shrink much after the low-rank transform? Below are my results; the model sizes look almost the same before and after:

VGG_ILSVRC_16_layers.caffemodel:    553432081 bytes
vgg_lowrank.caffemodel:             516036227 bytes
bvlc_reference_caffenet.caffemodel: 243862418 bytes
caffenet_lowrank.caffemodel:        236399387 bytes

Cysu commented 8 years ago

That is because the model size is dominated by the fully connected layers. We only apply the low-rank approximation to the convolution layers, which aims at speeding up computation rather than reducing the size of the whole model.
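
For intuition, here is a quick parameter count showing why the fully connected layers dominate. This is a back-of-envelope sketch in Python; the layer shapes are the standard VGG16 ones, assumed here rather than read from the repo:

# Rough parameter count for VGG16: conv layers vs. fully connected layers.
conv = [(3, 64), (64, 64), (64, 128), (128, 128), (128, 256), (256, 256),
        (256, 256), (256, 512), (512, 512), (512, 512), (512, 512),
        (512, 512), (512, 512)]  # (in_channels, out_channels), all 3x3 kernels
conv_params = sum(ci * co * 3 * 3 + co for ci, co in conv)
fc_params = (25088 * 4096 + 4096) + (4096 * 4096 + 4096) + (4096 * 1000 + 1000)
print(conv_params)  # ~14.7M
print(fc_params)    # ~123.6M, i.e. roughly 89% of all weights

So even removing the conv layers entirely would shrink the .caffemodel by only about 10%.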

kaishijeng commented 8 years ago

Thanks for the explanation.

What speedup should I expect for VGG16 and CaffeNet?

Thanks

Cysu commented 8 years ago

It depends on the hardware and possibly the CUDA / cuDNN version.

I have timed the models on my machine (Titan Black, CUDA 7.5, cuDNN v4) with the batch size set to 256 for CaffeNet and 32 for VGG16. The average forward-backward time per minibatch is listed below:

CaffeNet:         668.406 ms
CaffeNet-lowrank: 307.033 ms
VGG16:            1570.38 ms
VGG16-lowrank:    759.401 ms
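
For reference, timings like these come from Caffe's built-in benchmarking tool; a typical invocation (the model path here is illustrative) is

./build/tools/caffe time --model=models_vgg/vgg_lowrank_deploy.prototxt --gpu=0 --iterations=50

with the batch size taken from the input_dim fields of the deploy prototxt.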

kaishijeng commented 8 years ago

I have a Titan X GPU with CUDA 7.5 and don't use cuDNN. Is there a parameter in the code that can be adjusted to balance accuracy and speed?

Cysu commented 8 years ago

Yes. The models_*/config.json contains the K value for each layer (please refer to our paper http://arxiv.org/pdf/1511.06067v3.pdf for more details). You may tweak these values to control the trade-off.
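
As a rough guide for choosing K, one can compare operation counts of the original d x d convolution against the d x 1 followed by 1 x d pair. The sketch below is my back-of-envelope estimate, not code from the repo, and it ignores im2col, memory traffic, and kernel-launch overhead:

def layer_speedup(c_in, c_out, d, K):
    # Original conv: c_in * c_out * d * d multiply-accumulates per output pixel.
    # Decomposed pair: K vertical (d x 1) filters over c_in channels, then
    # c_out horizontal (1 x d) filters over K channels: K * d * (c_in + c_out).
    return (c_in * c_out * d * d) / float(K * d * (c_in + c_out))

# e.g. VGG16 conv3_1 (128 -> 256 channels, 3x3) with the default K = 64:
print(layer_speedup(128, 256, 3, 64))  # ~4.0x in theory

Smaller K buys more speed at the cost of a worse approximation of the original filters.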

kaishijeng commented 8 years ago

I am able to use your software to speed up VGG16 inference with the default K values from your GitHub repo (below). Just curious what K values I should use if I want more speedup at the expense of accuracy.

{ "conv1_1": 5, "conv1_2": 24, "conv2_1": 48, "conv2_2": 48, "conv3_1": 64, "conv3_2": 128, "conv3_3": 160, "conv4_1": 192, "conv4_2": 192, "conv4_3": 256, "conv5_1": 320, "conv5_2": 320, "conv5_3": 320 }

Cysu commented 8 years ago

An upper bound of the GPU speedup in practice would be 4.6x, obtained by setting all the K values to 1. I will try to fine-tune this extreme model to see the accuracy lower bound. I think only through experiments can we find a good trade-off config.

kaishijeng commented 8 years ago

Thanks for the info. I will give it a try. If you have performance numbers for K=1, please share them with me.

Xuezhi-Liang commented 8 years ago

First, thank you for your code. When I ran the test, I got some problems.

python lowrank_approx.py --model models_vgg/vgg_deploy.prototxt --config models_vgg/config.json --save_model models_vgg/vgg_lowrank_deploy.prototxt --weights VGG_ILSVRC_16_layers.caffemodel --save_weights vgg_lowrank.caffemodel

Traceback (most recent call last):
  File "lowrank_approx.py", line 133, in <module>
    main(args)
  File "lowrank_approx.py", line 110, in main
    make_lowrank_model(args.model, conf, args.save_model)
  File "lowrank_approx.py", line 63, in make_lowrank_model
    v, h = vh_decompose(layer, conf[layer.name])
  File "lowrank_approx.py", line 40, in vh_decompose
    v_param.kernel_h, v_param.kernel_w = conv_param.kernel_size, 1
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/internal/python_message.py", line 669, in field_setter
    new_value = type_checker.CheckValue(new_value)
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/internal/type_checkers.py", line 132, in CheckValue
    raise TypeError(message)
TypeError: [3] has type <class 'google.protobuf.internal.containers.RepeatedScalarFieldContainer'>, but expected one of: (<type 'int'>, <type 'long'>)

Do you know why?

Cysu commented 8 years ago

It seems there is a mismatch between our caffe's optional uint32 kernel_size and the official caffe's repeated uint32 kernel_size in the convolution layer parameters of the proto. A temporary solution is to clone this repository together with our caffe submodule:

git clone --recursive https://github.com/chengtaipu/lowrankcnn.git

Compile imagenet/caffe and then run the Python script again.
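
Alternatively, the script could be made tolerant of both proto definitions. A minimal sketch; the helper name is hypothetical, not something from the repo:

def scalar_kernel_size(conv_param):
    # Official caffe declares kernel_size as 'repeated uint32', so the field is
    # a container; this repo's caffe declares it 'optional uint32', a scalar.
    ks = conv_param.kernel_size
    try:
        return int(ks[0])  # repeated field
    except TypeError:
        return int(ks)     # scalar field

Line 40 of lowrank_approx.py could then read v_param.kernel_h, v_param.kernel_w = scalar_kernel_size(conv_param), 1.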

Xuezhi-Liang commented 8 years ago

Thank you. It worked.

Xuezhi-Liang commented 8 years ago

Hello, I encountered another problem when using your code on another model.

I0913 22:40:25.933809 6594 net.cpp:417] Input 0 -> data
I0913 22:40:25.933858 6594 layer_factory.hpp:74] Creating layer conv1_v
I0913 22:40:25.933887 6594 net.cpp:96] Creating Layer conv1_v
I0913 22:40:25.933905 6594 net.cpp:459] conv1_v <- data
I0913 22:40:25.933929 6594 net.cpp:415] conv1_v -> conv1_v
I0913 22:40:25.933956 6594 net.cpp:160] Setting up conv1_v
I0913 22:40:25.934005 6594 net.cpp:167] Top shape: 1 5 45 47 (10575)
I0913 22:40:25.934032 6594 layer_factory.hpp:74] Creating layer conv1_h
I0913 22:40:25.934057 6594 net.cpp:96] Creating Layer conv1_h
I0913 22:40:25.934074 6594 net.cpp:459] conv1_h <- conv1_v
I0913 22:40:25.934097 6594 net.cpp:415] conv1_h -> conv1
I0913 22:40:25.934119 6594 net.cpp:160] Setting up conv1_h
I0913 22:40:25.934185 6594 net.cpp:167] Top shape: 1 32 45 45 (64800)
I0913 22:40:25.934206 6594 net.cpp:508] Sharing parameters 'conv1_w' owned by layer 'conv1_v', param index 0
F0913 22:40:25.934222 6594 net.cpp:522] Check failed: this_blob->shape() == owner_blob->shape()
*** Check failure stack trace: ***
Aborted (core dumped)

Do you know why? Thank you.

Cysu commented 8 years ago

It seems that there are shared parameters in your prototxt. Could you please specify the prototxt and caffe you used? Neither our caffe nor the official one has a check statement at net.cpp:522.

Xuezhi-Liang commented 8 years ago

prototxt:

name: "DeepID_face" input: "data_1" input_dim: 1 input_dim: 3 input_dim: 64 input_dim: 64 layer { name: "conv1_1" type: "Convolution" bottom: "data_1" top: "conv1_1" param { name: "conv1_w" lr_mult: 1 decay_mult: 1 } param { name: "conv1_b" lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 20 kernel_size: 4 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "relu1_1" type: "ReLU" bottom: "conv1_1" top: "conv1_1" } layer { name: "norm1_1" type: "LRN" bottom: "conv1_1" top: "norm1_1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "pool1_1" type: "Pooling" bottom: "norm1_1" top: "pool1_1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "conv2_1" type: "Convolution" bottom: "pool1_1" top: "conv2_1" param { name: "conv2_w" lr_mult: 1 decay_mult: 1 } param { name: "conv2_b" lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 40 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } }

} layer { name: "relu2_1" type: "ReLU" bottom: "conv2_1" top: "conv2_1" } layer { name: "norm2_1" type: "LRN" bottom: "conv2_1" top: "norm2_1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "pool2_1" type: "Pooling" bottom: "norm2_1" top: "pool2_1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "conv3_1" type: "Convolution" bottom: "pool2_1" top: "conv3_1" param { name: "conv3_w" lr_mult: 1 decay_mult: 1 } param { name: "conv3_b" lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 60 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }

} layer { name: "pool3_1" type: "Pooling" bottom: "conv3_1" top: "pool3_1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "conv4_1" type: "Convolution" bottom: "pool3_1" top: "conv4_1" param { name: "conv4_w" lr_mult: 1 decay_mult: 1 } param { name: "conv4_b" lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 80 kernel_size: 2 stride: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } }

} layer{ name:"flatten_pool3_1" type:"Flatten" bottom:"pool3_1" top:"flatten_pool3_1" } layer{ name:"flatten_conv4_1" type:"Flatten" bottom:"conv4_1" top:"flatten_conv4_1" } layer{ name:"contact_conv" type:"Concat" bottom:"flatten_conv4_1" bottom:"flatten_pool3_1" top:"contact_conv" } layer { name: "deepid_1" type: "InnerProduct" bottom: "contact_conv" top: "deepid_1" param { name: "fc6_w" lr_mult: 1 decay_mult: 1 } param { name: "fc6_b" lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 160 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } }

}

I use the imagenet/caffe.

Cysu commented 8 years ago

OK. I see. Because we will decompose a conv layer into two consecutive ones, sharing parameters is not supported currently.

A temporary workaround might be to keep only one branch of the siamese net, remove the name: "fc6_w"-style strings from the prototxt, and then run the script. After that, use the generated prototxt to build a new siamese net with shared parameters.
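
For context, the decomposition itself is essentially a rank-K SVD of the flattened 4-D kernel. A minimal numpy sketch of the idea (shapes and the function name are my assumptions, not copied from lowrank_approx.py):

import numpy as np

def vh_decompose_weights(W, K):
    # W: (c_out, c_in, d, d) square conv kernel. Returns K vertical (d x 1)
    # filters and c_out horizontal (1 x d) filters whose composition
    # approximates W.
    c_out, c_in, d, _ = W.shape
    # Rows index (input channel, vertical offset); columns index
    # (horizontal offset, output channel).
    M = W.transpose(1, 2, 3, 0).reshape(c_in * d, d * c_out)
    U, S, Vt = np.linalg.svd(M, full_matrices=False)
    s = np.sqrt(S[:K])
    v = (U[:, :K] * s).reshape(c_in, d, K)
    h = (Vt[:K].T * s).reshape(d, c_out, K)
    v_weights = v.transpose(2, 0, 1)[:, :, :, None]  # (K, c_in, d, 1)
    h_weights = h.transpose(1, 2, 0)[:, :, None, :]  # (c_out, K, 1, d)
    return v_weights, h_weights

Note the reshape assumes one square, ungrouped kernel per layer, which is presumably why grouped convolutions and certain kernel shapes trip the script up later in this thread.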

Xuezhi-Liang commented 8 years ago

Thank you for your patience, I understand.

Xuezhi-Liang commented 8 years ago

Another problem emerged when I followed your advice.

Traceback (most recent call last):
  File "lowrank_approx.py", line 134, in <module>
    main(args)
  File "lowrank_approx.py", line 118, in main
    args.save_weights)
  File "lowrank_approx.py", line 97, in approx_lowrank_weights
    v = v[:, :K].reshape((C, D, 1, K)).transpose(3, 0, 1, 2)
ValueError: total size of new array must be unchanged

Do you know why?

Cysu commented 8 years ago

What's the output of print C, D, K, v.shape before this line?

wenwei202 commented 8 years ago

Great job! I get a ~2x speedup for AlexNet with "Titan Black + cuDNN v4 + CUDA 7.5", but when I profile with "1080 + cuDNN v5 + CUDA 8.0", the forward speed is almost the same.

Original model:

I1023 00:00:26.726905 16461 caffe.cpp:404]      conv1   forward: 12.8815 ms.
I1023 00:00:26.726968 16461 caffe.cpp:404]      conv2   forward: 16.8281 ms.
I1023 00:00:26.726994 16461 caffe.cpp:404]      conv3   forward: 8.8992 ms.
I1023 00:00:26.727006 16461 caffe.cpp:404]      conv4   forward: 6.71963 ms.
I1023 00:00:26.727020 16461 caffe.cpp:404]      conv5   forward: 4.33419 ms.
I1023 00:01:04.304991 16481 caffe.cpp:412] Average Forward pass: 79.5562 ms.
I1023 00:01:04.304998 16481 caffe.cpp:414] Average Backward pass: 153.39 ms.
I1023 00:01:04.305006 16481 caffe.cpp:416] Average Forward-Backward: 233.04 ms.

Low-rank model:

I1022 23:54:49.625118 16360 caffe.cpp:404]    conv1_v   forward: 3.11468 ms.
I1022 23:54:49.625124 16360 caffe.cpp:404]    conv1_h   forward: 5.94275 ms.
I1022 23:54:49.625149 16360 caffe.cpp:404]    conv2_v   forward: 1.45381 ms.
I1022 23:54:49.625155 16360 caffe.cpp:404]    conv2_h   forward: 3.73543 ms.
I1022 23:54:49.625180 16360 caffe.cpp:404]    conv3_v   forward: 3.53724 ms.
I1022 23:54:49.625185 16360 caffe.cpp:404]    conv3_h   forward: 6.59154 ms.
I1022 23:54:49.625197 16360 caffe.cpp:404]    conv4_v   forward: 5.24775 ms.
I1022 23:54:49.625203 16360 caffe.cpp:404]    conv4_h   forward: 6.29479 ms.
I1022 23:54:49.625216 16360 caffe.cpp:404]    conv5_v   forward: 5.9966 ms.
I1022 23:54:49.625221 16360 caffe.cpp:404]    conv5_h   forward: 4.72984 ms.
I1023 00:02:01.703972 16504 caffe.cpp:412] Average Forward pass: 76.0703 ms.
I1023 00:02:01.703979 16504 caffe.cpp:414] Average Backward pass: 124.839 ms.
I1023 00:02:01.703987 16504 caffe.cpp:416] Average Forward-Backward: 201.013 ms.

Seems the speedup is very hardware-specific?

Cysu commented 8 years ago

I think it is because cuDNN v5 uses the Winograd algorithm to accelerate 3x3 convolutions specifically, which makes them even faster than computing the 1x3 and 3x1 convolutions with naive algorithms.

Xuezhi-Liang commented 8 years ago

Hello, when the number of filters in the first conv layer is below 10, it works; otherwise it crashes. The message looks like this:

I1102 22:04:28.570008 3576 net.cpp:294] Network initialization done.
I1102 22:04:28.570020 3576 net.cpp:295] Memory required for data: 3843580
Traceback (most recent call last):
  File "lowrank_approx.py", line 133, in <module>
    main(args)
  File "lowrank_approx.py", line 117, in main
    args.save_weights)
  File "lowrank_approx.py", line 96, in approx_lowrank_weights
    v = v[:, :K].reshape((C, D, 1, K)).transpose(3, 0, 1, 2)
ValueError: total size of new array must be unchanged

Do you know why? Thank you!

KeyKy commented 6 years ago

According to @wenwei202, does that mean the low-rank method is only suitable for speeding up conv1 and conv2? In my understanding, the low-rank method factorizes a convolution layer into two smaller convolutions, so it needs to call im2col twice.

Johnson-yue commented 6 years ago

@Cysu Hi, thank you for sharing. I have a problem now: if the group of a conv layer is bigger than 1, what should I do? It does not work.