Roll920 / ThiNet

Caffe models of the ICCV'17 paper "ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression" (https://arxiv.org/abs/1707.06342)

About the weight decay #8

Open hungsing92 opened 6 years ago

hungsing92 commented 6 years ago

Hi,

Could you share the weight_decay value you used during training? When I fine-tune with weight_decay=0.0005 on a detection task, the network has trouble converging.

Best!

Roll920 commented 6 years ago

@hungsing92 We set the batch size to 8 during training and fine-tune the model with a 10^-3 learning rate for the first 120k iterations, then 10^-4 and 10^-5 for the next two 30k-iteration stages, respectively. We use SGD to optimize training, with weight decay set to 0.0005, momentum 0.9, and gamma 0.1.
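For reference, these settings map onto a Caffe `solver.prototxt` roughly as follows. This is a minimal sketch, not the repo's actual solver file: the `net` path is an assumption, and `max_iter` is derived from the 120k + 30k + 30k schedule above (the two `stepvalue` entries mark where the learning rate is multiplied by `gamma`). Batch size 8 would be set in the train net prototxt, not here.

```prototxt
# Hypothetical solver sketch matching the reported hyperparameters
net: "train_val.prototxt"   # assumed filename
base_lr: 0.001              # 10^-3 for the first 120k iterations
lr_policy: "multistep"
stepvalue: 120000           # drop to 10^-4
stepvalue: 150000           # drop to 10^-5
gamma: 0.1
momentum: 0.9
weight_decay: 0.0005
max_iter: 180000            # 120k + 30k + 30k
```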