why not use lr_mult, decay_mult like {1, 1, 2, 0}?

forresti / SqueezeNet

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters

BSD 2-Clause "Simplified" License


why not use lr_mult, decay_mult like {1, 1, 2, 0}? #60

Open ujsyehao opened 5 years ago

ujsyehao commented 5 years ago

AlexNet sets lr_mult/decay_mult to {1, 1, 2, 0}: lr_mult 1 and decay_mult 1 for the weights, lr_mult 2 and decay_mult 0 for the biases. SqueezeNet doesn't set param at all, so Caffe falls back to its defaults. Both lr_mult and decay_mult default to 1, so the effective setting is {1, 1, 1, 1}. As we all know, weight decay should not be applied to biases. So why do you use the default lr_mult and decay_mult?
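To make the difference between the two settings concrete, here is a minimal Python sketch of how Caffe's SGD solver scales the global learning rate and weight decay by each blob's multipliers. The `ParamSpec` tuple and `effective_rates` helper are hypothetical names for illustration, not Caffe API, and the `BASE_LR` and `WEIGHT_DECAY` values are assumed solver settings, not numbers taken from this thread.

```python
from typing import NamedTuple, Tuple


class ParamSpec(NamedTuple):
    """Hypothetical stand-in for one Caffe `param { ... }` entry."""
    lr_mult: float = 1.0     # Caffe's default when `param` is omitted
    decay_mult: float = 1.0  # Caffe's default when `param` is omitted


def effective_rates(base_lr: float, weight_decay: float,
                    spec: ParamSpec) -> Tuple[float, float]:
    """Return (local learning rate, local weight decay) for one blob.

    The solver multiplies the global values by the per-blob multipliers.
    """
    return base_lr * spec.lr_mult, weight_decay * spec.decay_mult


# Assumed solver settings, for illustration only.
BASE_LR = 0.04
WEIGHT_DECAY = 0.0002

settings = {
    # No `param` blocks: weights and biases both use the defaults.
    "default {1, 1, 1, 1}": (ParamSpec(), ParamSpec()),
    # AlexNet-style: biases get twice the learning rate and no decay.
    "AlexNet {1, 1, 2, 0}": (ParamSpec(1.0, 1.0), ParamSpec(2.0, 0.0)),
}

for name, (weight_spec, bias_spec) in settings.items():
    w_lr, w_decay = effective_rates(BASE_LR, WEIGHT_DECAY, weight_spec)
    b_lr, b_decay = effective_rates(BASE_LR, WEIGHT_DECAY, bias_spec)
    print(f"{name}: weights lr={w_lr} decay={w_decay}, "
          f"bias lr={b_lr} decay={b_decay}")
```

With the defaults, biases are regularized exactly like weights. With {1, 1, 2, 0}, the bias decay term is zero and the bias learning rate doubles.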

ujsyehao commented 5 years ago

I ran ablation experiments, and the results confirm that setting param to {1, 1, 2, 0} gives higher accuracy.

forresti commented 5 years ago

@ujsyehao That's interesting. How much did the accuracy improve with your new setting of lr_mult?

ujsyehao commented 5 years ago

About 0.2% to 0.5% higher.