faustomilletari / VNet

GNU General Public License v3.0

Weight decay #40

Closed ChristianEschen closed 7 years ago

ChristianEschen commented 7 years ago

Hello

I am a little confused about the use of regularization in the network. In the solver you define weight_decay = 0.0005:

    # write out the Caffe solver definition
    with open("solver.prototxt", 'w') as f:
        f.write("net: \"" + self.params['ModelParams']['prototxtTrain'] + "\" \n")
        f.write("base_lr: " + str(self.params['ModelParams']['baseLR']) + " \n")
        f.write("momentum: 0.99 \n")
        f.write("weight_decay: 0.0005 \n")
        f.write("lr_policy: \"step\" \n")
        f.write("stepsize: 20000 \n")
        f.write("gamma: 0.1 \n")
        f.write("display: 1 \n")
        f.write("snapshot: 500 \n")
        f.write("snapshot_prefix: \"" + self.params['ModelParams']['dirSnapshots'] + "\" \n")

But in the individual network you use:

    layer {
      name: "conv_in128_chan16"
      type: "Convolution"
      bottom: "data"
      top: "conv_in128_chan16"
      param { lr_mult: 1.0 decay_mult: 1.0 }
      param { lr_mult: 2.0 decay_mult: 0.0 }

So the weight decay for the individual layers is 1. I don't understand the difference between the weight decay passed to the optimizer and the weight decay specified in the individual layer. I hope you can clarify it for me.

Furthermore, you specify a learning rate and decay rate for the biases in the latter part of the above layer definition. As far as I understand, the bias is not important for conv-nets, since it just adds a constant to the output. If this is correct, why do you specify it?

faustomilletari commented 7 years ago

This is Caffe-specific.

weight_decay in the solver configuration is the global weight of the weight-decay term during optimization.

decay_mult in the Caffe prototxt controls how strongly that global weight decay acts on each parameter of that specific layer: the effective decay for a parameter is weight_decay * decay_mult.
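In numbers, this is all the arithmetic amounts to (a minimal Python sketch, not Caffe's actual source; the base_lr value is made up for illustration):

    # Caffe combines the solver-level settings with each param block's
    # multipliers: effective decay = weight_decay * decay_mult,
    # effective learning rate = base_lr * lr_mult.
    weight_decay = 0.0005  # from solver.prototxt
    base_lr = 0.0001       # hypothetical baseLR

    # First param block of the layer above: the convolution weights.
    print(weight_decay * 1.0, base_lr * 1.0)  # 0.0005 0.0001 -> weights are decayed

    # Second param block: the bias (lr_mult: 2.0, decay_mult: 0.0).
    print(weight_decay * 0.0, base_lr * 2.0)  # 0.0 0.0002 -> bias learns faster, never decayed

Setting decay_mult to 0 for the bias is the usual Caffe convention: the bias only shifts the output, so there is little point in pulling it toward zero, while the doubled lr_mult lets it adapt faster.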

Fausto Milletarì Sent from my iPhone


ChristianEschen commented 7 years ago

Thank you!