@jeffdonahue has an improved backward interface in the works. Jeff, how about adding an optional repeated field for the back propagation flags? Does that fit neatly into your new init logic that determines the vector of propagation flags?
On Monday, May 5, 2014, kloudkl notifications@github.com wrote:
The Google video classification CNN explored four transfer learning methods: training from scratch, fine-tuning the top layer (classifier), fine-tuning the top 3 layers, and fine-tuning all layers [1]. Fine-tuning specific layers keeps the generic features of the other layers untouched during training. They found that fine-tuning the top 3 layers performed best.
It is not very straightforward to reason about whether the backward propagation of a layer is disabled in Caffe, as shown in #100 (https://github.com/BVLC/caffe/issues/100) and #103 (https://github.com/BVLC/caffe/pull/103). So it would be nice to be able to explicitly disable it.
[1] Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei. Large-Scale Video Classification with Convolutional Neural Networks. CVPR 2014.
Evan Shelhamer
I believe that you can already do this in Caffe by setting blobs_lr: 0.0 in all layers you won't fine-tune (you need two of that line if the layer has biases), and then their backward passes won't be computed, unless you have layers under them with non-zero blobs_lr. I could add another bool parameter to LayerParameter called something like force_no_backward as well, but I'm not sure how to handle the case of a force_no_backward layer having weights (with blobs_lr > 0) below it.
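For concreteness, here is a minimal sketch of what that looks like in a model prototxt, assuming the layer definition format of that era; the layer names and hyperparameters are illustrative, not taken from any particular model:

```
# Frozen layer: zero learning rate for both the weight and the bias blob,
# so its parameters are never updated and, if nothing below it needs
# gradients, its backward pass is skipped entirely.
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 0   # weights
  blobs_lr: 0   # biases
  convolution_param { num_output: 96 kernel_size: 11 stride: 4 }
}

# Layer being fine-tuned: non-zero learning rates, so its parameter
# gradients are computed and the layers above it backpropagate down to it.
layers {
  name: "fc8_new"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8_new"
  blobs_lr: 10  # weights
  blobs_lr: 20  # biases
  inner_product_param { num_output: 20 }
}
```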
Right. What I'm suggesting is a field not for weight blobs but for bottoms to act as a vector of flags, one per bottom, to dictate whether backpropagation should continue to that bottom.
If it overcomplicates the logic we can leave it as an issue for now.
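To make that suggestion concrete, here is a hedged sketch of how per-bottom flags might look in a layer definition; the field name propagate_down and its placement are purely illustrative, since no name was settled in this thread:

```
# Hypothetical per-bottom flags, one per bottom in order, saying whether
# gradients should continue to flow to that bottom.
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "fc8"
  bottom: "label"
  top: "loss"
  propagate_down: true    # do backpropagate to fc8
  propagate_down: false   # never backpropagate to the label input
}
```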
Closing since this is already supported by blobs_lr.
If blobs_lr is set to 0, does that actually prevent the partial derivatives from being computed? If the GPU is computing them but then updating the weights by 0, it seems like a very hacky and expensive way to go about it...
It does prevent all the unnecessary computation. It's not a hack at all; this is just how we signify in our model definitions that further backpropagation is unnecessary. If you inspect the output during model construction, you will see Caffe decide where to backpropagate and where not to.
See Net::Init() for the details:
https://github.com/BVLC/caffe/blob/master/src/caffe/net.cpp#L32-L171
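To illustrate that decision with a sketch (made-up layer names, not a shipped model): a frozen layer still gets a backward pass when a layer below it has a non-zero learning rate, because gradients must pass through it to reach the trainable parameters underneath.

```
# fc6 is trainable, so Caffe still runs fc7's backward pass to carry
# gradients down to fc6, even though fc7's own parameters are frozen.
layers {
  name: "fc6"
  type: INNER_PRODUCT
  bottom: "pool5"
  top: "fc6"
  blobs_lr: 1   # weights: trainable
  blobs_lr: 2   # biases: trainable
  inner_product_param { num_output: 4096 }
}
layers {
  name: "fc7"
  type: INNER_PRODUCT
  bottom: "fc6"
  top: "fc7"
  blobs_lr: 0   # weights: frozen
  blobs_lr: 0   # biases: frozen
  inner_product_param { num_output: 4096 }
}
```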
ah ok, sorry guys. nice job on keeping the UI simple then!