amoudgl / pygoturn

PyTorch implementation of GOTURN object tracker: Learning to Track at 100 FPS with Deep Regression Networks (ECCV 2016)
MIT License

the choice of loss function and learning rate #15

Closed LiangXu123 closed 6 years ago

LiangXu123 commented 6 years ago

Hi guys, have you ever tried to find out whether the original Caffe L1 loss function really works in PyTorch?

Part 1: In my experiment, the following two loss functions give absolutely different results:

1. loss_fn = torch.nn.L1Loss(size_average=False): bad result with lr=1e-5 (see part 2 below)
2. loss_fn = torch.nn.SmoothL1Loss(size_average=True): relatively good result with lr=5e-3, since the loss scale is roughly 1:100 compared with the L1 loss above
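The gap between the two numbers is mostly a reduction artifact, which can be checked directly. A minimal sketch with random tensors (note that size_average is deprecated in current PyTorch in favor of reduction="sum" / "mean"):

```python
import torch

torch.manual_seed(0)
pred = torch.rand(50, 4)    # 50 predicted boxes, matching the paper's batch size
target = torch.rand(50, 4)  # 50 ground-truth boxes

# size_average=False corresponds to reduction="sum": total loss over all 200 elements.
l1_sum = torch.nn.L1Loss(reduction="sum")(pred, target)
# size_average=True corresponds to reduction="mean": per-element average.
smooth_mean = torch.nn.SmoothL1Loss(reduction="mean")(pred, target)

# Ignoring the quadratic (Huber) region near zero, the summed L1 is about
# batch_size * n_coords = 200x the per-element mean, i.e. roughly the
# 1:100 scale gap observed above.
print(l1_sum.item(), smooth_mean.item())
```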

Part 2: One more thing: I think your learning rate is not right. In the original GOTURN, base_lr: 0.000001, i.e. 1e-6, which is fine, but in the corresponding tracker.prototxt file the learned fc layer has parameters like:

```
name: "fc6-new"
type: "InnerProduct"
bottom: "pool5_concat"
top: "fc6"
param { lr_mult: 10 decay_mult: 1 }
```

which means the effective lr for the fc layer equals base_lr * lr_mult = 1e-5. So the lr for the fc layers should be set to 1e-5 in our PyTorch code.

So, as far as I can see, two problems remain: 1) should we use a better loss function like SmoothL1Loss? 2) have you reproduced the original GOTURN result using this code? If so, how, and what's the best learning rate schedule?

amoudgl commented 6 years ago

@cc786537662 thanks for your efforts.

> In my experiment the following two loss functions give absolutely different results: 1) loss_fn = torch.nn.L1Loss(size_average=False): bad result with lr=1e-5 (see part 2); 2) loss_fn = torch.nn.SmoothL1Loss(size_average=True): relatively good result with lr=5e-3, since the loss scale is roughly 1:100 compared with the L1 loss above

I believe the results are almost the same with both loss functions. torch.nn.SmoothL1Loss(size_average=True) reports the per-sample loss (on a batch of 50 samples), whereas torch.nn.L1Loss(size_average=False) reports the total loss over a batch of 50 samples. In the first few iterations I get a per-sample Smooth L1 loss of around 1.1, which is equivalent to an L1 loss of ~180 over a batch; the loss is just defined in a different way, which is why we get different numbers. I haven't tested in long-term training whether this Smooth L1 loss would work or not, but my priority is to replicate the original GOTURN results using their exact formulation, if possible.

> One more thing: I think your learning rate is not right, since in the original GOTURN base_lr: 0.000001 (1e-6), but in the corresponding tracker.prototxt file the learned fc layer has parameters like ...

Similar to the original GOTURN project, the learning rate is set differently for weights and biases here in train.py.
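For reference, the Caffe convention that GOTURN follows typically gives biases twice the weight learning rate and zero weight decay. A sketch of that split into parameter groups (the multipliers here are the common Caffe defaults, not necessarily train.py's exact values):

```python
import torch
import torch.nn as nn

# Sketch: split parameters into weight/bias groups, Caffe-style.
# The 2x bias lr and zero bias decay are the usual Caffe defaults;
# the exact multipliers used in train.py may differ.
def weight_bias_groups(model, base_lr=1e-6, weight_decay=0.0005):
    weights, biases = [], []
    for name, p in model.named_parameters():
        (biases if name.endswith("bias") else weights).append(p)
    return [
        {"params": weights, "lr": base_lr, "weight_decay": weight_decay},
        {"params": biases, "lr": 2 * base_lr, "weight_decay": 0.0},
    ]

model = nn.Linear(256, 4)  # toy stand-in for the box-regression head
optimizer = torch.optim.SGD(weight_bias_groups(model), lr=1e-6, momentum=0.9)
```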

> Have you reproduced the original GOTURN result using this code? How? And what's the best learning rate schedule?

I trained the PyTorch model for 280k iterations; the loss reaches around ~90-100 and saturates there. The original GOTURN project gets a loss of around ~50 in the same number of iterations, although we follow the exact same batch formation procedure, learning rates, model, etc. I am still trying to replicate the GOTURN results in PyTorch. As of now, I found that exp_lr_scheduler in train.py needs to be modified to handle different learning rates for weights and biases. Currently, after one step (i.e. 1e5 iterations), it sets the learning rate to gamma*1e-6 for all weights and biases, which I believe is incorrect. But still, the loss at 1e5 iterations (~100) is higher than the original GOTURN loss at 1e5 iterations (~50).
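One way to fix the scheduler along these lines is to decay each parameter group relative to its own initial lr, so the weight/bias ratio survives a step. A sketch of the hypothesized fix (not the repo's actual code; the group lrs are illustrative):

```python
import torch
import torch.nn as nn

# Hypothesized fix: scale every group's *own* initial lr by gamma**step,
# instead of overwriting all groups with gamma * 1e-6.
def exp_lr_scheduler(optimizer, iteration, init_lrs, gamma=0.1, step_size=100000):
    scale = gamma ** (iteration // step_size)
    for group, init_lr in zip(optimizer.param_groups, init_lrs):
        group["lr"] = init_lr * scale
    return optimizer

model = nn.Linear(4, 4)
init_lrs = [1e-6, 2e-6]  # e.g. weights at base lr, biases at 2x (illustrative)
optimizer = torch.optim.SGD(
    [
        {"params": [model.weight], "lr": init_lrs[0]},
        {"params": [model.bias], "lr": init_lrs[1]},
    ],
    lr=1e-6,
)

# After one step (1e5 iterations) both groups shrink by gamma, ratio preserved.
exp_lr_scheduler(optimizer, iteration=100000, init_lrs=init_lrs)
```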

amoudgl commented 6 years ago

pygoturn training plot: [image]

Original GOTURN training plot (credits: @sydney0zq): [image]

LiangXu123 commented 6 years ago

Great job, and thanks for the quick reply. My code is not the latest version, so I didn't know you had added different learning rates for weights and biases. For now, I am still working on reproducing the paper's result, trying to keep everything the same as the paper describes. After about 12 epochs of training, evaluation with the VOT toolkit looks like this:

LiangXu123 commented 6 years ago

[images: rankingplot_baseline_mean, tracker_legend]

LiangXu123 commented 6 years ago

As you can see, there is a large margin between the paper's GOTURN result and my experimental result, marked GOTURN_My. I am now a little confused about training iterations: the paper uses 500,000 iterations with a batch size of 50, while the training list contains about 280k image pairs, which means one epoch is about 280000/50 = 5600 iterations, so the total is 500000/5600 ≈ 90 epochs. That really requires a lot of time to train the network: even with my TITAN Xp, one epoch needs about 2 hours, so 90 epochs need about 7.5 days. Is that correct? The training set is the same as the paper used: ALOV300 + ImageNet DET.
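The training-budget arithmetic above checks out:

```python
# Quick check of the iteration/epoch arithmetic from the comment above.
pairs = 280_000          # ~280k training image pairs (ALOV300 + ImageNet DET)
batch_size = 50
total_iters = 500_000    # from the paper
hours_per_epoch = 2      # observed on a TITAN Xp

iters_per_epoch = pairs // batch_size        # 5600
epochs = total_iters / iters_per_epoch       # ~89.3
days = epochs * hours_per_epoch / 24         # ~7.4
print(iters_per_epoch, round(epochs, 1), round(days, 1))  # 5600 89.3 7.4
```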

amoudgl commented 6 years ago

That sounds correct; it took 4 days for ~280,000 iterations on a GeForce GTX 1080 Ti.