BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

How to update layer parameters from python? #1855

Closed timothy-shields closed 9 years ago

timothy-shields commented 9 years ago

I have a Caffe network consisting of a sequence of layers with no loss layer. In my Python code, I want to repeatedly take the following steps:

  1. Do a forward pass through the network.
  2. Compute my own losses and gradients based on the output data blob of the final layer of the network.
  3. Set the diff blob of the final layer of the network.
  4. Do a backward pass through the network, updating layer parameters.
  5. (Occasionally) Save the network to persist the learned parameters.

The problem I've encountered is that the backward pass (step 4) is not actually updating layer parameters. How can I modify my approach to get Caffe to update the layer parameters when I perform the backward pass?

(I'm aware of the SGDSolver but am avoiding it because the final layers of my network (step 2), which are written only in Python, are very nontrivial, such that fitting them into that framework seems difficult.)
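For concreteness, the loop I have in mind looks roughly like this (the prototxt path, the output blob name 'out', compute_loss_and_grad, and the iteration counts are placeholders):

import numpy as np
import caffe

net = caffe.Net('net.prototxt', caffe.TRAIN)  # placeholder model definition

for it in range(10000):
    out = net.forward()                              # step 1: forward pass
    loss, grad = compute_loss_and_grad(out['out'])   # step 2: my own loss and gradient
    net.blobs['out'].diff[...] = grad                # step 3: set the final layer's diff
    net.backward()                                   # step 4: backward pass
    if it % 1000 == 0:                               # step 5: occasionally persist the net
        net.save('snapshot_%d.caffemodel' % it)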

timothy-shields commented 9 years ago

After further inspection, it appears that the Update method on the Net is what needs to be called in my step 4. The problem is that this is not available on the Python Net object and is only callable via the SGDSolver. Would it be possible to expose the Update method on the Net directly, so that a custom solver can be written in Python?

shelhamer commented 9 years ago

Do a backward pass through the network, updating layer parameters.

The backward pass computes the gradients, not the updates, which are left as the responsibility of the solver. However, the parameters are exposed in Python caffe.Nets as the net.params dictionary. These are mutable; you can do net surgery or write a whole solver in Python by updating these param arrays. Assign to them by net.params['layer_name'][0].data[...] += weight_update or the like.
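For instance, a bare-bones gradient step over every learnable parameter might look like the following sketch (the learning rate here is just an illustrative value):

lr = 0.01  # illustrative learning rate
for blobs in net.params.values():
    for blob in blobs:                    # blobs[0] holds the weights, blobs[1] the biases (if present)
        blob.data[...] -= lr * blob.diff  # apply the gradients filled in by net.backward()
        blob.diff[...] = 0                # zero the diffs so they don't accumulate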

I'm aware of the SGDSolver but am avoiding it because the final layers of my network (step 2), which are written only in Python, are very nontrivial

I have a feeling you'll like https://github.com/BVLC/caffe/pull/1703: it improves the Python interface and lets nets incorporate layers developed in Python. See the Python layer example in https://github.com/BVLC/caffe/pull/1020#issue-41610379 (replaced by #1703) too. If you hack your loss as a Python layer, you can do the solving as usual.

Defining your loss as a Python layer is the simplest option, in my opinion. See, for example, the Euclidean loss as a Python layer.
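As a rough sketch of that example (see the pycaffe examples in the repo for the canonical file), a Euclidean loss written as a Python layer looks like:

import caffe
import numpy as np

class EuclideanLossLayer(caffe.Layer):
    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two inputs to compute distance.")

    def reshape(self, bottom, top):
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        top[0].reshape(1)  # the loss output is a scalar

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num

The layer is then pulled into the net as a layer of type "Python" whose python_param names the module and class, after which the standard solvers treat it like any built-in loss.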

We're working on merging and documenting the new pycaffe shortly!

timothy-shields commented 9 years ago

Thank you for the quick support! The following code appears to work for me, assuming I've already properly scaled the diff at the network output layer and called net.backward(...).

for layer in net.layers:
    for blob in layer.blobs:
        blob.data[...] -= blob.diff  # plain gradient step; my diffs already include the scaling

mczmy commented 9 years ago

@timothy-shields I had problems with step 3; could you please give me a brief explanation of how you achieved it? Thanks.

timothy-shields commented 9 years ago

@mczmy If your output layer name is, for example, fc9, then you can do a backward pass (as in step 3) as follows, where diff is a 4D numpy array of the appropriate shape.

net.backward(fc9=diff)

This will perform backpropagation, filling in the diffs throughout the network.
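For example, to get the shape right you can build the diff from the output blob itself (my_gradient here stands in for whatever computes your gradient):

diff = np.zeros_like(net.blobs['fc9'].data)   # same shape as fc9's output
diff[...] = my_gradient(net.blobs['fc9'].data)
net.backward(fc9=diff)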

arunmallya commented 9 years ago

@timothy-shields That doesn't quite allow you to use momentum or weight decay, does it? Those are calculated in the solver's step() function. I assume you have to implement the ComputeUpdateValue() function in Python. Is that the case?

timothy-shields commented 9 years ago

@arunmallya Yes, you need to implement your own solver if you do this.

peerajak commented 8 years ago

Hi @timothy-shields,

I would like to do the same thing. Can you please show us how to do it?

mahdaneh commented 8 years ago

I want to compute the gradient of a new loss function with respect to the last fully connected layer. If I set the .diff of the last layer to my computed gradient, will the backward pass propagate my gradient through the network? In that case, how many iterations should the optimization (forward and backward) run for? Should we control the number of iterations simply with a for loop in Python? Any help is appreciated.

Thank you guys!

timothy-shields commented 8 years ago

@peerajak @mahdaneh If you take this route, you need to implement the stochastic gradient descent solver yourself. Section 5 of ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al.) tells you how to do this. You will learn a lot by doing it yourself.
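For reference, the update rule from that section, applied per parameter blob, is roughly the following (a sketch; the default values are the ones used in the paper):

import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.01, momentum=0.9, weight_decay=0.0005):
    # update rule from section 5 of Krizhevsky et al.:
    #   v <- momentum * v - lr * weight_decay * w - lr * grad
    #   w <- w + v
    v[...] = momentum * v - lr * weight_decay * w - lr * grad
    w[...] = w + v

With a Caffe net, w would be a blob array such as net.params['fc9'][0].data, grad the matching .diff, and v a persistent numpy buffer of the same shape, one per parameter blob.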

mahdaneh commented 8 years ago

Thank you for your answer, but I don't want to reimplement SGD myself. @shelhamer said in the comment above: "you can do net surgery or write a whole solver in Python by updating these param arrays. Assign to them by net.params['layer_name'][0].data[...] += weight_update or the like". What I would like to do is set the diff of the parameters of the last inner product layer to the computed gradient of the loss function with respect to those parameters, and then have backpropagation run from there by calling net.backward(). Can I not do that?

timothy-shields commented 8 years ago

@mahdaneh That is of course what you would do. net.backward(...) will set the diff blobs throughout the network. But then you need to actually apply those diff blobs to update the parameters. I don't believe you'll be able to use Caffe as the SGD solver, so you'll have to implement it yourself or find a general purpose library that supplies the SGD algorithm.

mahdaneh commented 8 years ago

Since my question is exactly the first question that you asked: did you implement your own solver on top of Caffe? Did you benefit from using the provided functions (forward, backward) together with the (SGD) solver to optimize your deep network?

In fact, when I follow all of the steps you indicated above, I see that my parameters don't update. I have the same problem you had. I would appreciate it if you could help me.

mczmy commented 8 years ago

Hey mahdaneh,

Do you have an email address that I can reach you at?


shelhamer commented 8 years ago

@mahdaneh @timothy-shields Define your loss as a Python layer to make use of the rest of the Caffe machinery, like solvers, without giving up the convenience of Python. See, for example, the Euclidean loss as a Python layer.

mahdaneh commented 8 years ago

mabbulv@yahoo.com

mahdaneh commented 8 years ago

Thank you for your help, @timothy-shields and @shelhamer. Since I am only using Caffe's layers and not its solver, I should implement my own solver rather than use SGDSolver.

peerajak commented 8 years ago

@timothy-shields Thanks for your reply. Where do you store your weights? In the Python environment?

Hi @mczmy, @mahdaneh. Could you please send me the solver code? peerajak@gmail.com

Hi @shelhamer, I am trying to use a Python layer but I have a big problem: Python layers do not support weights. I mean, Python layers do not have an entry in net.params['layer_name']. Therefore, I have to store the weights in the Python environment or save them to disk myself. If I want to use the Caffe machinery, I need to be able to save the weights to disk, but then again SGD would not update these weights, so I would have to update them myself during the backward pass. Can I do it this way? Is there another way around this?

kchhhk commented 8 years ago

Hi guys. @mczmy, @mahdaneh, I've been facing a lot of problems implementing my own solver; I can't figure out how to update my parameters after a net.backward() pass. Could you please send me your solver code too? Thanks, kabir.chhabra12@gmail.com

zeakey commented 7 years ago

Hi guys, I feed image data to my network and get the output by net.forward(input_data).

Then I compute the gradient manually and use net.backward(gradient) to backpropagate it.

But I find that this never updates the net parameters.

And I cannot find an update method in caffe.Net or caffe.Solver; I just want a standard SGD update.

I'm using the latest Caffe version.

Can someone help me?

I have tried adding force_backward to my network, and it doesn't work.

@shelhamer @arunmallya @mczmy

Soumali13 commented 7 years ago

Can anyone tell me how to calculate the gradient manually for the last layer in Caffe? I have the same problem as @zeakey; force_backward does not work.

peerajak commented 7 years ago

I did it successfully. I wrote my own classifier layer in Python on Caffe. I will write a how-to when I have time.


Soumali13 commented 7 years ago

Thanks a lot,

I am loading a net from the prototxt file in a C++ program. Then I am calling forward() and backward() on the inputs. On every input and every iteration the forward() works correctly, but the backward() does not work. Can you tell me why? The gradients returned by cpu_diff() are always 0.


peerajak commented 7 years ago

A Python layer can set up its own weight parameter blobs and can backpropagate the gradient to those blobs as well as to the input layer. This Caffe update happened around a year ago. There is no need to write your own C++ solver. Here is how:

    def setup(self, bottom, top):
        # this is how to initialize your weight blob
        self.blobs.add_blob(some_int_len_value)
        # this is how to initialize it with an array
        self.blobs[0].data[...] = some_np_array_of_correct_size

    def reshape(self, bottom, top):
        self.diff = np.zeros_like(bottom[0].data)  # set up the diff
        top[0].reshape(1)                          # in my case it's a loss layer

    def forward(self, bottom, top):
        w = self.blobs[0].data  # this is how to read the value of the param blob

    def backward(self, top, propagate_down, bottom):
        # this is how to backpropagate w.r.t. the input
        bottom[0].diff[...] = some_nparray_of_size_bottom
        # this is how to backpropagate w.r.t. this layer's weights
        self.blobs[0].diff[...] = some_nparray_of_size_w

zuowang commented 7 years ago

@peerajak I can't quite follow your code above. Could you give more explanation? Currently I do the manual update as follows; could you tell me how to make it work your way? Thanks a lot!

# Manual SGD with momentum, weight decay, and a "step" learning-rate policy
import numpy as np
import caffe

solver = None
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')
base_lr = 0.01
momentum = 0.9
weight_decay = 0.0005
lr_w_mult = 1   # learning-rate multiplier for weights
lr_b_mult = 2   # learning-rate multiplier for biases
gamma = 0.1
stepsize = 5000
niter = 10000   # total number of iterations

# one momentum history buffer per (weights, biases) pair
momentum_hist = {}
for layer in solver.net.params:
    m_w = np.zeros_like(solver.net.params[layer][0].data)
    m_b = np.zeros_like(solver.net.params[layer][1].data)
    momentum_hist[layer] = [m_w, m_b]

for it in range(1, niter + 1):
    solver.net.forward()   # fprop
    solver.net.backward()  # bprop
    # "step" policy: decay the rate from the fixed base_lr
    lr = base_lr * np.power(gamma, np.floor(it / stepsize))
    # *manually update*
    for layer in solver.net.params:
        momentum_hist[layer][0] = momentum_hist[layer][0] * momentum + (
            solver.net.params[layer][0].diff + weight_decay * solver.net.params[layer][0].data) * lr * lr_w_mult
        momentum_hist[layer][1] = momentum_hist[layer][1] * momentum + (
            solver.net.params[layer][1].diff + weight_decay * solver.net.params[layer][1].data) * lr * lr_b_mult
        solver.net.params[layer][0].data[...] -= momentum_hist[layer][0]
        solver.net.params[layer][1].data[...] -= momentum_hist[layer][1]
        # zero the diffs so gradients don't accumulate across iterations
        solver.net.params[layer][0].diff[...] *= 0
        solver.net.params[layer][1].diff[...] *= 0

peerajak commented 7 years ago

zuowang,

Please see 1) http://chrischoy.github.io/research/caffe-python-layer/ and 2) https://github.com/BVLC/caffe/pull/2944, and you will recognize my code above.

VasLem commented 7 years ago

@peerajak will (1) work with minibatch learning? Shouldn't all computation of the diff be inside the backward() method for it to work with every kind of training? If I read bottom[0].data from inside backward() during minibatch training, what data will I get: the mean over the batch or the last blob of the batch? If it's the latter, then the training will not be correct. Am I missing something?

peerajak commented 7 years ago

@VasLem

  1. Yes, it works for minibatches.
  2. Normally, yes: you calculate the diff during backward(). I found, however, that I can save some matrices calculated during forward() to instance variables and later calculate the diff from those saved matrices (a stripped-down sketch is below). In my case the layer is the loss layer; I am not sure whether this trick can be applied to non-loss layers.
  3. I never read bottom[0].data during backward(), so I am not sure.
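What I mean in point 2, stripped down (the cached arrays and the loss here are only illustrative):

    def forward(self, bottom, top):
        # cache intermediate results as instance attributes for backward() to reuse
        self.pred = bottom[0].data.copy()
        self.label = bottom[1].data.copy()
        top[0].data[...] = np.sum((self.pred - self.label) ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        if propagate_down[0]:
            # compute the diff from the arrays cached during forward()
            bottom[0].diff[...] = (self.pred - self.label) / bottom[0].num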