kaishijeng opened this issue 8 years ago
Thanks very much for pointing out this issue. There is a typo inside the vgg_deploy.prototxt. We have fixed it in the last commit. Please check it out.
With the latest code, I don't see errors anymore. Just curious: why didn't the model size shrink much after the low-rank transform? Below are my results, and the model sizes are almost the same before and after (bytes):
VGG_ILSVRC_16_layers.caffemodel: 553432081
vgg_lowrank.caffemodel: 516036227
bvlc_reference_caffenet.caffemodel: 243862418
caffenet_lowrank.caffemodel: 236399387
Because the model size is dominated by the fully connected layers. We only do the lowrank approximation on the convolution layers, which aims at speeding up the computation rather than reducing the size of the whole model.
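To put numbers on that, here is a rough back-of-the-envelope count (a sketch in plain Python; standard VGG16 layer shapes are assumed and biases are ignored):

```python
# VGG16's fully connected layers dominate the parameter count:
fc6 = 512 * 7 * 7 * 4096     # ~102.8M parameters
fc7 = 4096 * 4096            # ~16.8M
fc8 = 4096 * 1000            # ~4.1M
fc_total = fc6 + fc7 + fc8   # ~123.6M

# All 13 conv layers together hold only ~14.7M parameters:
conv_total = sum(cin * cout * 3 * 3 for cin, cout in [
    (3, 64), (64, 64), (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512)])

print(fc_total, conv_total)  # 123633664 vs 14710464
```

So even shrinking the conv weights substantially barely moves the file size.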
Thanks for the explanation.
What speedup should I expect for VGG16 and CaffeNet?
Thanks
It depends on the hardware and maybe the cuda / cudnn version.
I have timed the models on my machine (Titan Black, cuda 7.5, cudnn v4) with batch size set to 256 for caffenet and 32 for VGG16. The Average Forward-Backward time per minibatch is listed below
CaffeNet: 668.406 ms
CaffeNet-lowrank: 307.033 ms
VGG16: 1570.38 ms
VGG16-lowrank: 759.401 ms
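If you want to reproduce this kind of measurement, a minimal pycaffe timing sketch looks like the following (the paths are placeholders for your own deploy prototxt and weights; caffe's built-in `caffe time` tool reports a similar per-layer breakdown):

```python
import time
import caffe

caffe.set_mode_gpu()
# Placeholder paths -- substitute your own files.
net = caffe.Net('models_vgg/vgg_lowrank_deploy.prototxt',
                'vgg_lowrank.caffemodel', caffe.TEST)

net.forward()  # warm-up pass so kernel selection and caching are done

iters = 50
start = time.time()
for _ in range(iters):
    net.forward()
elapsed_ms = (time.time() - start) / iters * 1000
print('Average forward pass: %.3f ms' % elapsed_ms)
```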
I have a TitanX GPU with Cuda 7.5 and don't use cudnn. Is there a parameter in the code which can be adjusted to balance accuracy and speed?
Yes. The models_*/config.json contains the K-value for each layer (please refer to our paper for more details). You may tweak these values to control the trade-off.
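As a rough guide when tweaking: the theoretical FLOP reduction of a single layer follows directly from the decomposition. A C-to-D conv with d x d kernels costs C*D*d^2 multiply-adds per output pixel, while the decomposed d x 1 and 1 x d pair with K intermediate channels costs K*d*(C+D). A small sketch (the shapes below are VGG16's conv3_1; this is a standard complexity count, not a wall-clock prediction):

```python
def flop_ratio(C, D, d, K):
    """Theoretical speedup of replacing one C->D dxd conv with
    a C->K dx1 conv followed by a K->D 1xd conv."""
    original = C * D * d * d   # multiply-adds per output pixel
    lowrank = K * d * (C + D)
    return float(original) / lowrank

# conv3_1 of VGG16: C=128, D=256, 3x3 kernels, K=64 as in config.json
print(flop_ratio(128, 256, 3, 64))  # 4.0x theoretical
print(flop_ratio(128, 256, 3, 32))  # 8.0x if you halve K
```

Actual GPU speedups are smaller than these ratios, as the timings above show.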
I am able to use your software to speed up VGG16 prediction with the default K values from your GitHub repo, listed below. Just curious: what K values should I use if I want to speed up more at the expense of accuracy?
{ "conv1_1": 5, "conv1_2": 24, "conv2_1": 48, "conv2_2": 48, "conv3_1": 64, "conv3_2": 128, "conv3_3": 160, "conv4_1": 192, "conv4_2": 192, "conv4_3": 256, "conv5_1": 320, "conv5_2": 320, "conv5_3": 320 }
An upper bound of the GPU speedup in practice would be 4.6x by setting all the K to 1. I will try to finetune this extreme model to see the accuracy lower bound. I think only through experiments can we find a good config for trade-off.
Thanks for the info. I will give it a try. If you get results for K=1, please share them with me.
First, thank you for your code. When I run the test, I get some problems.
python lowrank_approx.py --model models_vgg/vgg_deploy.prototxt --config models_vgg/config.json --save_model models_vgg/vgg_lowrank_deploy.prototxt --weights VGG_ILSVRC_16_layers.caffemodel --save_weights vgg_lowrank.caffemodel
Traceback (most recent call last):
File "lowrank_approx.py", line 133, in
Do you know why?
It seems that there is a mismatch between our caffe's `optional uint32 kernel_size` and the official's `repeated uint32 kernel_size` in the convolution layer parameters proto. A temporary solution might be cloning this repository with our caffe submodule:

git clone --recursive https://github.com/chengtaipu/lowrankcnn.git

Then compile the imagenet/caffe and run the python script again.
Thank you. It worked.
Hello, I encounter another problem when using your code in another model.
I0913 22:40:25.933809 6594 net.cpp:417] Input 0 -> data
I0913 22:40:25.933858 6594 layer_factory.hpp:74] Creating layer conv1_v
I0913 22:40:25.933887 6594 net.cpp:96] Creating Layer conv1_v
I0913 22:40:25.933905 6594 net.cpp:459] conv1_v <- data
I0913 22:40:25.933929 6594 net.cpp:415] conv1_v -> conv1_v
I0913 22:40:25.933956 6594 net.cpp:160] Setting up conv1_v
I0913 22:40:25.934005 6594 net.cpp:167] Top shape: 1 5 45 47 (10575)
I0913 22:40:25.934032 6594 layer_factory.hpp:74] Creating layer conv1_h
I0913 22:40:25.934057 6594 net.cpp:96] Creating Layer conv1_h
I0913 22:40:25.934074 6594 net.cpp:459] conv1_h <- conv1_v
I0913 22:40:25.934097 6594 net.cpp:415] conv1_h -> conv1
I0913 22:40:25.934119 6594 net.cpp:160] Setting up conv1_h
I0913 22:40:25.934185 6594 net.cpp:167] Top shape: 1 32 45 45 (64800)
I0913 22:40:25.934206 6594 net.cpp:508] Sharing parameters 'conv1_w' owned by layer 'conv1_v', param index 0
F0913 22:40:25.934222 6594 net.cpp:522] Check failed: this_blob->shape() == owner_blob->shape()
*** Check failure stack trace: ***
Aborted (core dumped)
Do you know why? Thank you.
It seems that there are shared parameters in your prototxt. Could you please specify the prototxt and caffe you used? Neither our caffe nor the official one has a check statement at net.cpp:522.
prototxt:
name: "DeepID_face" input: "data_1" input_dim: 1 input_dim: 3 input_dim: 64 input_dim: 64 layer { name: "conv1_1" type: "Convolution" bottom: "data_1" top: "conv1_1" param { name: "conv1_w" lr_mult: 1 decay_mult: 1 } param { name: "conv1_b" lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 20 kernel_size: 4 stride: 1 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } } } layer { name: "relu1_1" type: "ReLU" bottom: "conv1_1" top: "conv1_1" } layer { name: "norm1_1" type: "LRN" bottom: "conv1_1" top: "norm1_1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "pool1_1" type: "Pooling" bottom: "norm1_1" top: "pool1_1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "conv2_1" type: "Convolution" bottom: "pool1_1" top: "conv2_1" param { name: "conv2_w" lr_mult: 1 decay_mult: 1 } param { name: "conv2_b" lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 40 kernel_size: 3 group: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } }
} layer { name: "relu2_1" type: "ReLU" bottom: "conv2_1" top: "conv2_1" } layer { name: "norm2_1" type: "LRN" bottom: "conv2_1" top: "norm2_1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } } layer { name: "pool2_1" type: "Pooling" bottom: "norm2_1" top: "pool2_1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "conv3_1" type: "Convolution" bottom: "pool2_1" top: "conv3_1" param { name: "conv3_w" lr_mult: 1 decay_mult: 1 } param { name: "conv3_b" lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 60 kernel_size: 3 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0 } }
} layer { name: "pool3_1" type: "Pooling" bottom: "conv3_1" top: "pool3_1" pooling_param { pool: MAX kernel_size: 2 stride: 2 } } layer { name: "conv4_1" type: "Convolution" bottom: "pool3_1" top: "conv4_1" param { name: "conv4_w" lr_mult: 1 decay_mult: 1 } param { name: "conv4_b" lr_mult: 2 decay_mult: 0 } convolution_param { num_output: 80 kernel_size: 2 stride: 2 weight_filler { type: "gaussian" std: 0.01 } bias_filler { type: "constant" value: 0.1 } }
} layer{ name:"flatten_pool3_1" type:"Flatten" bottom:"pool3_1" top:"flatten_pool3_1" } layer{ name:"flatten_conv4_1" type:"Flatten" bottom:"conv4_1" top:"flatten_conv4_1" } layer{ name:"contact_conv" type:"Concat" bottom:"flatten_conv4_1" bottom:"flatten_pool3_1" top:"contact_conv" } layer { name: "deepid_1" type: "InnerProduct" bottom: "contact_conv" top: "deepid_1" param { name: "fc6_w" lr_mult: 1 decay_mult: 1 } param { name: "fc6_b" lr_mult: 2 decay_mult: 0 } inner_product_param { num_output: 160 weight_filler { type: "gaussian" std: 0.005 } bias_filler { type: "constant" value: 0.1 } }
}
I use the imagenet/caffe.
OK, I see. Because we decompose a conv layer into two consecutive ones, sharing parameters is not supported currently.
A temporary solution might be to keep only one branch of the siamese net, remove the `name: "fc6_w"`-like strings from the prototxt, and then run the script (see the sketch below). After that, use the generated prototxt to build a new siamese net with shared parameters.
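If it helps, here is a hedged sketch of the name-stripping step, using the protobuf text format that prototxts are written in (the file names are placeholders, not files from this repo):

```python
from google.protobuf import text_format
from caffe.proto import caffe_pb2

# Load the single-branch prototxt (placeholder file name).
net = caffe_pb2.NetParameter()
with open('deepid_deploy.prototxt') as f:
    text_format.Merge(f.read(), net)

# Remove the shared-parameter names (e.g. name: "fc6_w") so that
# lowrank_approx.py treats every layer's weights as independent.
for layer in net.layer:
    for p in layer.param:
        p.ClearField('name')

with open('deepid_noshare.prototxt', 'w') as f:
    f.write(text_format.MessageToString(net))
```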
Thank you for your patience, I understand.
Another problem emerged when I followed your advice.
Traceback (most recent call last):
File "lowrank_approx.py", line 134, in
Do you know why?
What's the output of `print C, D, K, v.shape` before this line?
Great job! I get ~2x speedup for AlexNet with "Titan Black + cudnn4 + cuda7.5", but when I profile with "1080 + cudnn5 + cuda8.0", the forward speed is almost the same.
Original model:
I1023 00:00:26.726905 16461 caffe.cpp:404] conv1 forward: 12.8815 ms.
I1023 00:00:26.726968 16461 caffe.cpp:404] conv2 forward: 16.8281 ms.
I1023 00:00:26.726994 16461 caffe.cpp:404] conv3 forward: 8.8992 ms.
I1023 00:00:26.727006 16461 caffe.cpp:404] conv4 forward: 6.71963 ms.
I1023 00:00:26.727020 16461 caffe.cpp:404] conv5 forward: 4.33419 ms.
I1023 00:01:04.304991 16481 caffe.cpp:412] Average Forward pass: 79.5562 ms.
I1023 00:01:04.304998 16481 caffe.cpp:414] Average Backward pass: 153.39 ms.
I1023 00:01:04.305006 16481 caffe.cpp:416] Average Forward-Backward: 233.04 ms.
Low-rank model:
I1022 23:54:49.625118 16360 caffe.cpp:404] conv1_v forward: 3.11468 ms.
I1022 23:54:49.625124 16360 caffe.cpp:404] conv1_h forward: 5.94275 ms.
I1022 23:54:49.625149 16360 caffe.cpp:404] conv2_v forward: 1.45381 ms.
I1022 23:54:49.625155 16360 caffe.cpp:404] conv2_h forward: 3.73543 ms.
I1022 23:54:49.625180 16360 caffe.cpp:404] conv3_v forward: 3.53724 ms.
I1022 23:54:49.625185 16360 caffe.cpp:404] conv3_h forward: 6.59154 ms.
I1022 23:54:49.625197 16360 caffe.cpp:404] conv4_v forward: 5.24775 ms.
I1022 23:54:49.625203 16360 caffe.cpp:404] conv4_h forward: 6.29479 ms.
I1022 23:54:49.625216 16360 caffe.cpp:404] conv5_v forward: 5.9966 ms.
I1022 23:54:49.625221 16360 caffe.cpp:404] conv5_h forward: 4.72984 ms.
I1023 00:02:01.703972 16504 caffe.cpp:412] Average Forward pass: 76.0703 ms.
I1023 00:02:01.703979 16504 caffe.cpp:414] Average Backward pass: 124.839 ms.
I1023 00:02:01.703987 16504 caffe.cpp:416] Average Forward-Backward: 201.013 ms.
Seems the speedup is very hardware-specific?
I think it is because cudnn5 uses Winograd to accelerate the 3x3 convolutions specifically, which makes it even faster than using naive algorithms to compute 1x3 and 3x1 convs.
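A quick multiplication count makes this plausible (a sketch; Winograd F(2x2, 3x3) needs 16 multiplications per 2x2 output tile instead of 36, a 2.25x reduction):

```python
# Multiplications per output pixel for a C->D layer (transforms ignored):
def direct_3x3(C, D):
    return C * D * 9

def winograd_3x3(C, D):
    return C * D * 9 / 2.25  # F(2x2,3x3): 4 mults per output instead of 9

def lowrank_pair(C, D, K):
    return K * 3 * (C + D)   # a 3x1 plus a 1x3 conv with K channels

C, D, K = 128, 256, 64       # e.g. VGG16 conv3_1 with its config.json K
print(direct_3x3(C, D))      # 294912
print(winograd_3x3(C, D))    # 131072.0
print(lowrank_pair(C, D, K)) # 73728
```

On paper the low-rank pair still has fewer multiplications, but the 1x3/3x1 convs get no Winograd help and pay for two kernel launches and two rounds of im2col and memory traffic, which can erase the advantage on newer hardware.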
Hello, when the number of filters in the first conv layer is below 10, it works. Otherwise it crashes, with a message like this:
I1102 22:04:28.570008 3576 net.cpp:294] Network initialization done.
I1102 22:04:28.570020 3576 net.cpp:295] Memory required for data: 3843580
Traceback (most recent call last):
File "lowrank_approx.py", line 133, in
Do you know why? Thank you!
According to @wenwei202, the low-rank method is only suitable for speeding up conv1 and conv2? In my opinion, the low-rank method factorizes a convolution layer into two small convolutions, so it needs to call im2col twice.
@Cysu Hi, thank you for your sharing. I have a problem now: if the group of a conv layer is bigger than 1, what should I do? It does not work.
I am able to convert caffenet, but got an error when I try it with vgg16.
F0312 08:11:17.590416 30365 insert_splits.cpp:35] Unknown blob input data to layer 0
*** Check failure stack trace: ***
Aborted (core dumped)
The command I use is:
python2 lowrank_approx.py --model models_vgg/vgg_deploy.prototxt --config models_vgg/config.json --save_model models_vgg/vgg_lowrank_deploy.prototxt --weights VGG_ILSVRC_16_layers.caffemodel --save_weights vgg_lowrank.caffemodel
Do you know why?