ethanhe42 / channel-pruning

Channel Pruning for Accelerating Very Deep Neural Networks (ICCV'17)
https://arxiv.org/abs/1707.06168
MIT License

stepend() error: Unable to write the modified weights (net.WPQ) into the .caffemodel file #54

Closed slothkong closed 6 years ago

slothkong commented 6 years ago

I'm truly sorry to ask, but I'm not sure why I'm unable to overwrite the weights of vgg.caffemodel with the final WPQ. I simplified R3() to use your channel pruning algorithm only:

def stepR1(net, conv, convnext, d_c):
    if conv in net.selection:
        net.WPQ[(conv,0)] =  net.param_data(conv)[:,net.selection[conv],:,:]
        net.WPQ[(conv,1)] =  net.param_b_data(conv)
    else:
        net.WPQ[(conv,0)] =  net.param_data(conv)
        net.WPQ[(conv,1)] =  net.param_b_data(conv)
    if conv in pooldic:
        X_name = net.bottom_names[convnext][0]
    else: 
        X_name = conv

    idxs, W2, B2 = net.dictionary_kernel(X_name, None, d_c, convnext, None, DEBUG=True)
    # W2
    net.selection[convnext] = idxs
    net.param_data(convnext)[:, ~idxs, ...] = 0
    net.param_data(convnext)[:, idxs, ...] = W2.copy()
    net.set_param_b(convnext,B2)
    # W1
    net.WPQ[(conv,0)] = net.WPQ[(conv,0)][idxs]
    net.WPQ[(conv,1)] = net.WPQ[(conv,1)][idxs]
    net.set_conv(conv, num_output=sum(idxs))
    return net  

The code executes properly (all weights in WPQ have the same shapes as in your VGG-16_5x prototxt release). But when running stepend(), the instance of Net that takes the new prototxt 3C4x_mem_bn_vgg.prototxt and the original model vgg.caffemodel fails (the command is net = Net(new_pt, model=model)):

net.cpp:757] Cannot copy param 0 weights from layer 'conv1_2'; shape mismatch. Source param shape is 64 64 3 3 (36864); target param shape is 24 64 3 3 (13824). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer. Check failure stack trace:

Have you ever experienced this type of issue? I'm sure I'm missing a line in my stepR1() function, but I don't know what. Maybe I need to use net.insert? Any help would be appreciated, and once again thank you!
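For context, the channel selection in stepR1() above boils down to indexing with a boolean mask: W1 is truly reshaped (inside WPQ), while W2's pruned input channels are only zeroed in place. A minimal numpy sketch, with illustrative shapes matching the error message (conv1_2 pruned from 64 to 24 channels):

```python
import numpy as np

# Illustrative shapes: conv1_2 originally has 64 output channels.
W1 = np.random.randn(64, 64, 3, 3)   # weights of `conv` (pruned on axis 0)
b1 = np.random.randn(64)
W2 = np.random.randn(128, 64, 3, 3)  # weights of `convnext` (pruned on axis 1)

# Boolean mask keeping 24 of 64 channels (stand-in for dictionary_kernel's idxs)
idxs = np.zeros(64, dtype=bool)
idxs[:24] = True

# W1: drop the pruned output channels, as in net.WPQ[(conv,0)][idxs]
W1_pruned, b1_pruned = W1[idxs], b1[idxs]
# W2: zero the removed input channels in place (shape in the caffemodel unchanged)
W2[:, ~idxs, ...] = 0

assert W1_pruned.shape == (24, 64, 3, 3)   # target shape in the error message
assert W2.shape == (128, 64, 3, 3)         # original shape kept; only zeroed
```

Note the asymmetry: only the arrays held in WPQ shrink; nothing in this step shrinks the weights inside the .caffemodel itself.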

ethanhe42 commented 6 years ago

Source param shape is 64 64 3 3 (36864) indicates that the caffemodel's channels were not reduced.


slothkong commented 6 years ago

I'm confused. I followed your code and reshaped WPQ while replacing the caffemodel weights with zeros instead of the pruned filter channels of W2. I also used set_conv() to change the number of outputs of W1. set_conv() changes the caffemodel channels, right?

slothkong commented 6 years ago

No idea what to do next, but I will keep working on it. Wishing you good health and all the best! ^^

ghzhangnj commented 6 years ago

@slothkong Don't waste too much time on it...
Since it has been open-sourced, why are the answers to other people's questions so unclear, especially the questions about how to apply this to ResNet?

slothkong commented 6 years ago

@ghzhangnj At this point, my graduation depends on me learning how to use this code ㅜ.ㅜ

slothkong commented 6 years ago

still nothing. help?

FatherOfHam commented 6 years ago

I haven't understood the whole code yet, but I think you can run stepend() without passing the model parameter. I am not sure whether this will work, but you could give it a try.

slothkong commented 6 years ago

@FatherOfHam Unfortunately that does not work. stepend() makes an instance of the Net class, which requires a .caffemodel; if None is passed, it raises an error. @yihui-he points out that the channels of the .caffemodel have not been modified, but my understanding was that we can't modify them during pruning, and that is why we use WPQ to hold the pruned weights. I ran this function on the net instance right before calling stepend(), and none of the channels were changed:

def printweights(net):
    print("weights\t\tbias")
    for conv in net.convs:
        print(net.param_shape(conv), net.param_b_shape(conv))

Output:

    weights          bias
    (64, 3, 3, 3)    (64,)
    (64, 64, 3, 3)   (64,)
    (128, 64, 3, 3)  (128,)
    (128, 128, 3, 3) (128,)
    (256, 128, 3, 3) (256,)
    (256, 256, 3, 3) (256,)
    (256, 256, 3, 3) (256,)
    (512, 256, 3, 3) (512,)
    (512, 512, 3, 3) (512,)
    (512, 512, 3, 3) (512,)
    (512, 512, 3, 3) (512,)
    (512, 512, 3, 3) (512,)
    (512, 512, 3, 3) (512,)

The weights stored in WPQ are the only ones that were reshaped. Perhaps the problem is in the prototxt. I will check the prototxt (new_pt) that I'm passing to stepend() again.

FatherOfHam commented 6 years ago

I don't think it will give you an error. From the code in the Net class constructor, you can see that you are allowed to set model to None:

    if model is not None:
        self.net.copy_from(model)
        self.caffemodel_dir = model
    else:
        self.caffemodel_dir = 'temp/model.caffemodel'

The weights are stored in memory before being written into the caffemodel. I hope to discuss more details with you once I do some more experiments.

slothkong commented 6 years ago

@FatherOfHam I tried that already. Since 'temp/model.caffemodel' does not exist, the execution ends. But your comment made me think that it should be possible to bypass stepend(). The original code generates a prototxt in which the num_output key of each convolutional layer is set to the new channel count, and we also have all the pruned weights stored in WPQ. So I should be able to create my own .caffemodel (no vgg.caffemodel needed). I will look into the net surgery example and tell you if I succeed.
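The net-surgery idea can be sketched without caffe installed: copy each WPQ entry into a params structure that mimics pycaffe's net.params (layer name -> list of blobs, index 0 for weights, 1 for bias, following the WPQ key layout used above). With real pycaffe you would load the pruned prototxt with no caffemodel, fill net.params the same way, and call net.save('pruned.caffemodel'). A hedged, untested sketch:

```python
import numpy as np

def fill_params(params, WPQ):
    """Copy pruned weights from WPQ into a params dict.

    `params` maps layer name -> [weight_array, bias_array], mimicking
    pycaffe's net.params. The pruned prototxt must already declare the
    reduced num_output, so every shape should match exactly.
    """
    for (name, idx), W in WPQ.items():
        # Fail loudly if the prototxt and WPQ disagree on a shape.
        assert params[name][idx].shape == W.shape, (name, idx)
        params[name][idx][...] = W
    return params

# Toy example with the shapes from the error message: conv1_2 pruned 64 -> 24.
params = {'conv1_2': [np.zeros((24, 64, 3, 3)), np.zeros((24,))]}
WPQ = {('conv1_2', 0): np.ones((24, 64, 3, 3)),
       ('conv1_2', 1): np.ones((24,))}
fill_params(params, WPQ)
```

The layer names and shapes here are illustrative; the point is that the new caffemodel is built entirely from the pruned prototxt plus WPQ, so the original vgg.caffemodel is never loaded and no shape mismatch can occur.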

slothkong commented 6 years ago

@FatherOfHam I found out why @yihui-he's code works for 3C but stops working if you remove VH and ITQ: the original code renames the layers! When doing pruning only, the generated prototxt is identical to the original vgg.prototxt except for the num_output property. So Caffe tries to copy the weights from vgg.caffemodel into a resized specification in the new prototxt, hence the error. OMG, the struggle.
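The renaming workaround can be reproduced directly on the prototxt text: if a pruned layer gets a new name, Caffe's copy_from() simply skips it instead of failing on the shape mismatch (as the error message itself suggests). A minimal, regex-based sketch; the layer name and suffix are illustrative:

```python
import re

def rename_layers(prototxt_text, layers, suffix='_P'):
    """Append a suffix to each given layer name, including its top blob
    and any downstream bottom references, so Caffe's copy_from() skips
    these layers instead of failing on a shape mismatch."""
    for name in layers:
        # Replace every quoted occurrence: name:, top:, and bottom: fields.
        prototxt_text = re.sub(r'"%s"' % re.escape(name),
                               '"%s%s"' % (name, suffix), prototxt_text)
    return prototxt_text

pt = '''layer {
  name: "conv1_2"
  type: "Convolution"
  bottom: "conv1_1"
  top: "conv1_2"
  convolution_param { num_output: 24 }
}'''
out = rename_layers(pt, ['conv1_2'])
```

Note that renaming only avoids the copy error; the renamed layers then hold the pruned shapes declared in the prototxt, and their weights still have to be filled in from WPQ (or retrained) before the model is usable.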

bbjy commented 6 years ago

@slothkong Hi, have you solved the problem ("When doing pruning only, the generated prototxt is identical to the original vgg.prototxt except for the num_output property, so Caffe tries to copy the weights from vgg.caffemodel into a resized specification in the new prototxt, hence the error")? How did you fix it? Thank you!