jiawei357 opened this issue 8 years ago
Since we can't write custom GPU layers in Caffe using Python, the only way to compute losses and gradients at certain layers is to grab the activations and compute them using numpy. If you'd like faster backprop, you can try the gram-layer branch, which does the full forward & backward pass on the GPU, but requires an extra Caffe layer written in C++.
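As a rough illustration of that numpy approach, here is a sketch using the standard style/content loss formulas from Gatys et al.; it is not the script's exact code, and the layer name in the usage comment is only an example:

```python
import numpy as np

def gram(F):
    """Gram matrix of features flattened to (channels, positions)."""
    return F.dot(F.T)

def style_grad(F, A):
    """Style loss and gradient for one layer.
    F: current features (c x n), A: precomputed target Gram matrix (c x c)."""
    c, n = F.shape
    G = gram(F)
    loss = ((G - A) ** 2).sum() / (4.0 * c**2 * n**2)
    grad = (G - A).dot(F) / (c**2 * n**2)
    return loss, grad

def content_grad(F, P):
    """Content loss and gradient; P is the target activation."""
    loss = 0.5 * ((F - P) ** 2).sum()
    grad = F - P
    return loss, grad

# usage: pull the activations out of the net and flatten to (channels, H*W),
# e.g. for a VGG-style blob name (an assumption, adjust to your model):
#   F = net.blobs["conv4_2"].data[0]
#   F = F.reshape(F.shape[0], -1)
```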
I have a question that may just be due to my lack of programming knowledge.
In lines 191 and 198, the grad variable is updated with the computed gradient. However, grad seems to have no influence on the net.backward() call, as it is never passed to it or used to update the network. Finally, in line 205, grad is reset to the diff of the next layer, which discards the previous grad computation.
I am confused about this part of the code. Could you help me understand it? I am not very fluent with Python, and that might be why I am lost here.
Thanks,
What he does is use grad as a kind of pointer: it is a view into the blob's diff, so updating it in place changes the gradient that gets backpropagated to the next layer down.
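Roughly, the loop under discussion follows this pattern (a paraphrased sketch rather than the exact code from style.py; the layer list, the compute_local_grad() helper, and the assumption that blob and layer names coincide are stand-ins):

```python
# layer names and compute_local_grad() are stand-ins, not the script's identifiers;
# net is an already-loaded caffe.Net
layers = ["conv1_1", "conv2_1", "conv3_1", "conv4_1", "conv4_2"]

net.forward()
grad = net.blobs[layers[-1]].diff[0]        # a view into the topmost blob's diff

for i, layer in enumerate(reversed(layers)):
    # adding in place writes straight into net.blobs[layer].diff,
    # so Caffe sees the extra loss gradient when backward starts here
    grad += compute_local_grad(net, layer)

    next_layer = None if i == len(layers) - 1 else layers[-2 - i]
    net.backward(start=layer, end=next_layer)

    # rebind grad to the next (lower) blob's diff, which now holds the
    # backpropagated gradient; the previous view has served its purpose
    grad = (net.blobs["data"].diff[0] if next_layer is None
            else net.blobs[next_layer].diff[0])
```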
@jiawei357
Hi, thanks for the response. I thought about this explanation, which indicates that grad is a pointer pointing to net.blobs[layer].diff[0]. However, I found two things that I do not understand:
- I used id() to check the memory address of grad after assigning grad = net.blobs[layer].diff[0] and compared it with id(net.blobs[layer].diff[0]); the two ids are not the same. (Is this a problem related to CPU/GPU addresses?)
- Is it OK to insert an additional gradient at a layer by adding the gradient computed from the local loss to the backpropagated loss?
Thanks for the answer. I think the results are fine, but I am just curious about the code.
Well, for the first thing I'm not sure why that happened. What I did in this project is make a new custom loss layer (Euclidean for the content loss and the Gram matrix thing for the style loss). In the custom layer I add the additional gradient during the backpropagation step, and Caffe takes care of the rest of the BP (from the conv layers down to the data layer). So yeah, I think you can do it either way (BP layer by layer, or using a custom Python layer).
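(As an aside on the first point: the id() mismatch is expected numpy behaviour rather than a CPU/GPU issue. Every evaluation of net.blobs[layer].diff[0] builds a new view object over the same underlying buffer, so the ids differ even though writes through either one land in the same memory. A minimal numpy-only sketch:)

```python
import numpy as np

diff = np.zeros((2, 3, 4), dtype=np.float32)   # stand-in for net.blobs[layer].diff

grad = diff[0]                          # a view into diff's buffer
print(id(grad) == id(diff[0]))          # False: diff[0] creates a fresh view object each time
print(np.shares_memory(grad, diff))     # True: both refer to the same memory

grad += 1.0                             # in-place update is visible through diff
print(diff[0].max())                    # 1.0
```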
Not sure if I stated that clearly enough.
@jiawei357
Thanks for the clarification. So in your implementation, you have multiple loss computations at different conv layers, which serve as the content/style layers?
Could I take a look at your network prototxt? I think that should answer the question xD
Currently I'm on spring break, so I can't send you my prototxt. What I did is have multiple input layers: one for the white-noise image, others for the precomputed style/content Gram matrices or activations, and a custom layer that takes the outputs of all those input layers and conv layers.
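(A rough NetSpec sketch of that wiring, just to make the shape of the network concrete. Every name, shape, and the Python module/class are placeholders for illustration, not the author's actual prototxt:)

```python
import caffe
from caffe import layers as L

n = caffe.NetSpec()

# one input for the white-noise image being optimized
n.data = L.Input(shape=dict(dim=[1, 3, 224, 224]))
# another input holding a precomputed target (Gram matrix or activation)
n.target = L.Input(shape=dict(dim=[1, 64, 64]))

# ...the pretrained conv layers go here, fed by n.data...
n.conv1_1 = L.Convolution(n.data, num_output=64, kernel_size=3, pad=1)

# custom Python loss layer that sees both the conv output and the target
n.style_loss = L.Python(n.conv1_1, n.target,
                        module='gram_loss',          # hypothetical module
                        layer='GramStyleLossLayer',  # hypothetical class
                        loss_weight=1)

with open('style_net.prototxt', 'w') as f:
    f.write(str(n.to_proto()))
```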
@jiawei357 and your loss layer will do backpropagation from the end of the network to the input?
In a custom loss layer you only have to define the gradient for the bottom layers.
what is the bottom layer? input?
I mean the input layers that connect to your loss layer. So if you have one style layer and one content layer connected to the custom loss, your loss will backprop to each of them, respectively.
Hey there - didn't get a chance to read through the whole thread, but this might be of interest to you: https://github.com/fzliu/style-transfer/tree/gram-layer.
The bottom layers also include the conv layers of your network; that's where we want to set the gradient. The gradient for the extra input layers can be set to zero.
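(To make that concrete, here is a minimal sketch of what such a custom Python loss layer could look like. The class name, the normalization constants, and the assumption that the batch size is 1 are all illustrative choices, not the author's actual implementation:)

```python
import caffe

class GramStyleLossLayer(caffe.Layer):
    """Sketch of a style loss layer: bottom[0] is a conv feature blob,
    bottom[1] holds a precomputed target Gram matrix (batch size 1 assumed)."""

    def setup(self, bottom, top):
        if len(bottom) != 2:
            raise Exception("Need a feature blob and a target Gram matrix.")

    def reshape(self, bottom, top):
        top[0].reshape(1)                        # scalar loss output

    def forward(self, bottom, top):
        c = bottom[0].channels
        n = bottom[0].height * bottom[0].width
        F = bottom[0].data[0].reshape(c, n)      # features, one row per channel
        G = F.dot(F.T)                           # Gram matrix of the current image
        A = bottom[1].data[0]                    # precomputed target Gram matrix
        self.F, self.diff = F, G - A
        top[0].data[...] = (self.diff ** 2).sum() / (4.0 * c**2 * n**2)

    def backward(self, top, propagate_down, bottom):
        c, n = self.F.shape
        # the gradient only needs to flow into the conv blob (bottom[0]);
        # the precomputed-input blob just gets a zero gradient
        if propagate_down[0]:
            grad = self.diff.dot(self.F) / (c**2 * n**2)
            bottom[0].diff[0][...] = grad.reshape(bottom[0].data[0].shape)
        if propagate_down[1]:
            bottom[1].diff[...] = 0.0
```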
@fzliu I just had a question about how the "grad" in the master branch style_optfn gets used. From the code, I do not see any reference to the computed "grad"
@jiawei357
hmm, I am still confused here = =!
So, a gradient that is computed at your custom loss layer will travel through all the conv layers and finally reach the input image?
Hey, all:
I think I got the idea from reading the code in the gram layer branch. Thanks @jiawei357 @fzliu
@fzliu one last thing, where is the protobuf that has the gramianParameter defined? I couldn't locate it, and my caffe gives me error for not having it
You'll need a custom version of Caffe which contains the necessary layer definition: https://github.com/dpaiton/caffe/tree/gramian
got it thanks!
@fzliu I ran the code on both branches, and the results are not the same. The master branch produces reasonable results, while the gram-layer branch produces really strange results.
@fzliu after some digging, I think the problem is that the network is not using the style loss at all. I can reproduce the error above when I turn off the grad update from the style layers. Any ideas?
@jiawei357 are you using the same gramian layer implementation from https://github.com/dpaiton/caffe/tree/gramian?
@fzliu More observations:
I validated the output of the gramian layer in the modified Caffe.
I think the output of the gramian layer is not correct. If I compare the output of the gramian layer with the result of simply computing the matrix product of the convolution layer's output, the numbers do not match.
e.g., in the network there is a connection conv1_1 -> conv1_1/gramian, so the output of conv1_1/gramian should be the inner products of conv1_1's output channels. However, the result does not match a manual computation from conv1_1 using scipy.sgemm.
Am I the only one having problem with the Gramian layer?
Thanks,
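(For reference, the manual check described above can be written roughly like this, assuming the net is already loaded; whether the numbers match exactly may also depend on how the gramian layer normalizes its output:)

```python
import numpy as np

net.forward()

F = net.blobs["conv1_1"].data[0]          # (channels, H, W)
F = F.reshape(F.shape[0], -1)             # (channels, H*W)
manual = F.dot(F.T)                       # Gram matrix computed by hand

layer_out = net.blobs["conv1_1/gramian"].data[0]
print(np.allclose(manual, layer_out, rtol=1e-4))
```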
Try this one instead: https://github.com/fzliu/caffe/tree/gram-layer. I don't quite remember how it's different, but I remember making some minor changes to the original gram layer implementation. I'll look into merging it into dpaiton's branch soon.
@fzliu it works this time, thanks. I took a look at the layer implementation but could not find an obvious difference. I think the main issue might be how the pointers or data dimensions are handled.
I do have another question that I want to ask,
How does Python decide when to copy by reference versus by value? I found several places in the code where you use .copy() and other places where you use plain assignment. When we copy a Caffe blob, do we need plain assignment or .copy()?
I found several places with these operations; in (1) and (2), the assignment obviously behaves differently: in (1) it is a reference, while in (2) it is a copy.
I thought I understood Python assignment and copying well, but I find it hard to differentiate these situations... = =! Please teach me.
Thanks,
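(This one has a general answer: plain assignment in Python never copies anything, it just binds another name to the same object; indexing or slicing a numpy array returns a view that shares memory; and .copy() produces an independent array. A small sketch, with the blob lines as comments since they assume a loaded net:)

```python
import numpy as np

a = np.arange(6.0).reshape(2, 3)

b = a            # plain assignment: another name for the same array
c = a[0]         # indexing/slicing: a view that shares a's memory
d = a.copy()     # .copy(): an independent array

b[0, 0] = 100.0
c[1] = 200.0
d[0, 2] = 300.0
print(a)         # [[100. 200.   2.] [  3.   4.   5.]] -- b and c changed it, d did not

# the same rules apply to the numpy arrays exposed by Caffe blobs:
#   x = net.blobs["data"].data          # reference/view, no copy made
#   x = net.blobs["data"].data.copy()   # independent snapshot
#   net.blobs["data"].data[...] = img   # in-place write into the existing buffer
```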
Hi, everybody! Could somebody share the prototxt mentioned above? Thanks!
@jiawei357 Could you please tell me your e-mail address? I've also defined a custom layer using PyCaffe, but I ran into some trouble when overriding the 'backward()' function. I hope to get some advice from you. Thanks!
Hi, is there any specific reason that you did the back propagation one layer at a time?