After further inspection, it appears that the `Update` method on the `Net` is what needs to be called in my step 4. The problem is that this is not available on the Python `Net` object and is only callable via the `SGDSolver`. Would it be possible to expose the `Update` method on the `Net` directly, so that a custom solver in Python is made possible?
> Do a backward pass through the network, updating layer parameters.
The backward pass computes the gradients, not the updates, which are left as the responsibility of the solver. However, the parameters are exposed in Python `caffe.Net` objects as the `net.params` dictionary. These are mutable; you can do net surgery or write a whole solver in Python by updating these param arrays. Assign to them by `net.params['layer_name'].data[...] += weight_update` or the like.
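For instance, a minimal sketch of such a manual update (assuming `net` is an already-loaded `caffe.Net` on which `forward()` and `backward()` have been run, and with an illustrative learning rate) might look like:

```python
# Plain gradient-descent step over all learnable parameters of `net`.
lr = 0.01  # illustrative learning rate
for name in net.params:
    for blob in net.params[name]:         # typically weights, then biases
        blob.data[...] -= lr * blob.diff  # apply the gradient
        blob.diff[...] = 0                # clear it before the next backward pass
```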
I'm aware of the SGDSolver but am avoiding using it because the final layers of my network (step 2), which are written only in Python, are very nontrivial, such that fitting them into that framework seems difficult.
I have a feeling you'll like https://github.com/BVLC/caffe/pull/1703: it improves the Python interface and lets nets incorporate layers developed in Python. See the Python layer example in https://github.com/BVLC/caffe/pull/1020#issue-41610379 (replaced by #1703) too. If you hack your loss as a Python layer, you can do the solving as usual.
Defining your loss as a Python layer is the simplest approach in my opinion. See for example the Euclidean loss as a Python layer.
We're working on merging and documenting the new pycaffe shortly!
Thank you for the quick support! The following code appears to be working for me, where the assumption is that I've already properly scaled my diff at the network output layer and done `net.backward(...)`.
```python
for layer in net.layers:
    for blob in layer.blobs:
        blob.data[...] -= blob.diff
```
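One caveat: Caffe accumulates parameter gradients into the diff blobs (the built-in solvers clear them after every update), so if you run this snippet in a loop you likely also want to zero `blob.diff` after applying it, as the manual-SGD code later in this thread does.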
@timothy-shields I had problems with step 3, could you please give me a brief introduction on how you achieved this? Thanks.
@mczmy If your output layer name is, for example, `fc9`, then you can do a backward pass (as in step 3) as follows, where `diff` is a 4D numpy array of the appropriate shape:

```python
net.backward(fc9=diff)
```

This will perform backpropagation, setting the diff blobs throughout the network.
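Putting the pieces together, a rough sketch of one iteration (assuming the input blob is named `data`, the output blob is `fc9` as above, and `compute_loss_gradient` stands in for your custom Python loss) could be:

```python
# One manual training iteration with a custom loss computed in Python.
out = net.forward(data=input_batch)               # forward pass; input_batch is your preprocessed array
grad = compute_loss_gradient(out['fc9'])          # hypothetical custom loss gradient w.r.t. the output
assert grad.shape == net.blobs['fc9'].diff.shape  # must match the output blob's shape
net.backward(fc9=grad)                            # backward pass: fills the diff blobs through the net
# finally, apply the parameter diffs yourself (see the update snippets above)
```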
@timothy-shields That doesn't quite allow you to use momentum or the weight decay, does it? Those are calculated in the solver step() function. I assume you have to implement the ComputeUpdateValue() function in Python. Is that the case?
@arunmallya Yes, you need to implement your own solver if you do this.
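For reference, the update Caffe's SGD solver applies per parameter blob is roughly

v = momentum * v + lr * (dW + weight_decay * W)
W = W - v

where dW is the accumulated gradient, so a Python reimplementation needs to keep the history v for every blob; that bookkeeping is what ComputeUpdateValue() handles in C++. The manual-update code further down in this thread does exactly this.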
Hi @timothy-shields,
I would like to do the same thing. Can you please show us how to do it?
I want to compute the gradient of the new loss function with respect to the last fully connected layer. If I change the .diff of the last layer to my computed gradient, will the backward pass propagate my computed gradient through the network? In that case, how many iterations should the optimization (forward and backward) run? Should we control the number of iterations simply with a for loop in Python? Any help is appreciated.
Thank you guys!
@peerajak @mahdaneh If you take this route, you need to implement the stochastic gradient descent solver yourself. Section 5 of ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al.) tells you how to do this. You will learn a lot by doing it yourself.
Thank you for your answer. But I don't want to reimplement SGD myself. @shelhamer said in the comment above: "you can do net surgery or write a whole solver in Python by updating these param arrays. Assign to them by net.params['layer_name'].data[...] += weight_update or the like". What I would like to do is change the diff of the parameters of the last inner product layer to the computed gradient of the loss function with respect to those parameters, and then have backprop run from that by calling net.backward(). Can I not do that?
@mahdaneh That is of course what you would do. `net.backward(...)` will set the diff blobs throughout the network. But then you need to actually apply those diff blobs to update the parameters. I don't believe you'll be able to use Caffe as the SGD solver, so you'll have to implement it yourself or find a general-purpose library that supplies the SGD algorithm.
Since my question is exactly the first question that you asked, I would like to know whether you implemented your solver in Caffe. Did you benefit from using the provided functions (forward, backward) in the (SGD) solver to optimize your deep network?
In fact, as I follow all 4 steps that you indicated above, I see my parameters don't update through the network. You know, I have the same problem as you had. I would appreciate it if you could help me.
Hey mahdaneh,
Do you have an email that I can reach you at?
@mahdaneh @timothy-shields Define your loss as a Python layer to make use of the rest of the Caffe machinery, like solvers, without giving up the convenience of Python. See for example the Euclidean loss as a Python layer.
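For anyone landing here, a rough sketch along the lines of that Euclidean loss example could look like this (the version shipped with the Caffe repo is the authoritative one):

```python
import caffe
import numpy as np

class EuclideanLossLayer(caffe.Layer):
    """Euclidean (L2) loss as a Python layer; a sketch following the example above."""

    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need two bottoms: prediction and target.")

    def reshape(self, bottom, top):
        if bottom[0].count != bottom[1].count:
            raise Exception("Inputs must have the same dimension.")
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        top[0].reshape(1)  # scalar loss output

    def forward(self, bottom, top):
        self.diff[...] = bottom[0].data - bottom[1].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num
```

Such a layer is wired into the net prototxt as a layer of type "Python" whose python_param gives the module and class name, and after that the normal Caffe solvers handle the parameter updates.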
mabbulv@yahoo.com
Thank you for your help @timothy-shields and @shelhamer. Since I am just using Caffe's layers and not its solver, I should implement my own solver and not use SGDSolver.
@timothy-shields Thanks for your reply. Where do you store your weights? In the Python environment?
Hi @mczmy, @mahdaneh . Could you please send me the solver code? peerajak@gmail.com
Hi @shelhamer, I am trying to use a Python layer but I have a big problem: the Python layer does not support weights. I mean, Python layers do not have net.params['layer_name']. Therefore, I have to store the weights in the Python environment, or save them to disk. If I want to use the Caffe mechanism, I need to be able to save weights to disk. But then again, SGD would not update these weights, so I would have to update them myself during the backward pass. Can I do it this way? Is there another way around this?
Hi guys. @mczmy, @mahdaneh, I've been facing a lot of problems implementing my own solver; I can't figure out how to update my parameters after a net.backward() pass. Could you please send me your solver code too? Thanks, kabir.chhabra12@gmail.com
Hi guys,
I feed image data to my network and get the output with `net.forward(input_data)`. Then I compute the gradient manually and use `net.backward(gradient)` to pass the gradient back.
But I find that this never updates the net parameters.
And I cannot find an update method in `caffe.Net` or `caffe.Solver`; I just want a standard SGD update.
I'm using the latest Caffe version at the time.
Can someone help me?
I have tried adding 'force_backward' to my network, and it doesn't work.
@shelhamer @arunmallya @mczmy
Can anyone tell me how to calculate the gradient manually for the last layer of Caffe? I have the same problem as @zeakey; force_backward does not work.
I did it successfully. I wrote my own classifier layer in Python on Caffe. I will write a how-to when I have time.
Thanks a lot.
I am loading a net from the prototxt file in a C++ program. Then I am calling forward() and backward() on the inputs. On every input and every iteration the forward() works correctly, but the backward() does not work. Can you tell me why? The gradients in cpu_diff() are always 0.
A Python layer can set up its own weight parameter blobs and can back-propagate the gradient to those blobs as well as to its inputs. This Caffe update happened around a year ago, so there is no need to write your own C++ solver. Here is how:

```python
def setup(self, bottom, top):
    # This is how to initialize your weight blob (some_int_len_value is its length).
    self.blobs.add_blob(some_int_len_value)
    # This is how to initialize it from an array.
    self.blobs[0].data[...] = some_np_array_of_correct_size

def reshape(self, bottom, top):
    self.diff = np.zeros_like(bottom[0].data)  # set up diff
    top[0].reshape(1)  # in my case it is a loss layer

def forward(self, bottom, top):
    w = self.blobs[0].data  # this is how to get the values from the param blob

def backward(self, top, propagate_down, bottom):
    # This is how to back-propagate w.r.t. the input.
    bottom[0].diff[...] = some_nparray_of_size_bottom
    # This is how to back-propagate w.r.t. this layer's weights.
    self.blobs[0].diff[...] = some_nparray_of_size_w
```
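Once a Python layer declares its blobs this way, they should also show up in net.params under the layer's name, so the standard solvers (or the manual update loops earlier in this thread) can treat them like any other learnable parameters.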
@peerajak I can't follow your code above. Could you give more explanation? Currently I do a manual update like the following; could you tell me how to make it work your way? Thanks a lot!
```python
import numpy as np
import caffe

# Manual SGD
solver = caffe.SGDSolver('examples/mnist/lenet_solver.prototxt')
niter = 10000            # total number of training iterations
base_lr = 0.01
momentum = 0.9
weight_decay = 0.0005
lr_w_mult = 1            # learning-rate multiplier for weights
lr_b_mult = 2            # learning-rate multiplier for biases
gamma = 0.1
stepsize = 5000

# momentum history, one (weights, biases) pair per layer
momentum_hist = {}
for layer in solver.net.params:
    m_w = np.zeros_like(solver.net.params[layer][0].data)
    m_b = np.zeros_like(solver.net.params[layer][1].data)
    momentum_hist[layer] = [m_w, m_b]

for it in range(1, niter + 1):
    solver.net.forward()   # fprop
    solver.net.backward()  # bprop
    # "step" learning-rate policy: lr = base_lr * gamma^floor(it / stepsize)
    lr = base_lr * np.power(gamma, np.floor(it / stepsize))
    # manually update
    for layer in solver.net.params:
        momentum_hist[layer][0] = momentum_hist[layer][0] * momentum + \
            (solver.net.params[layer][0].diff +
             weight_decay * solver.net.params[layer][0].data) * lr * lr_w_mult
        momentum_hist[layer][1] = momentum_hist[layer][1] * momentum + \
            (solver.net.params[layer][1].diff +
             weight_decay * solver.net.params[layer][1].data) * lr * lr_b_mult
        solver.net.params[layer][0].data[...] -= momentum_hist[layer][0]
        solver.net.params[layer][1].data[...] -= momentum_hist[layer][1]
        solver.net.params[layer][0].diff[...] *= 0
        solver.net.params[layer][1].diff[...] *= 0
```
zuowang,
Please see 1) http://chrischoy.github.io/research/caffe-python-layer/ and 2) https://github.com/BVLC/caffe/pull/2944, and you will recognize my code above.
@peerajak will (1) work with minibatch learning? Shouldn't all computations of diff be inside the backward() method so that it actually works for every kind of training? If I try to get bottom[0].data, say from inside backward(), what data will I get during mini-batch training? The mean of all the data inside the batch or the last blob of the batch? If the latter, then the training will not be correct. Am I missing something?
@VasLem
I have a sequence of Caffe layers with no loss layer in a Caffe network. In my Python code, I want to repeatedly take the following steps:
The problem I've encountered is that the backward pass (step 4) is not actually updating layer parameters. How can I modify my approach to get Caffe to update the layer parameters when I perform the backward pass?
(I'm aware of the SGDSolver but am avoiding using it because the final layers of my network (step 2), which are written only in Python, are very nontrivial, such that fitting them into that framework seems difficult.)