MIV-XJTU / ARTrack

Apache License 2.0

Reproducing paper results - second training stage loss reaches a noisy plateau very early #24

Closed by OmerRe 1 year ago

OmerRe commented 1 year ago

Hi, we are trying to reproduce your results. During the second training stage, we observe unusual behavior in both loss components: they become noisy and flat after a few epochs. We are also unable to reach the results published in the paper. For example, using exactly the same training settings, our ARTrack256 model on LaSOT scored 1.3% lower in AUC and 1.4% lower in precision than the results you reported.

Did you encounter a similar loss behavior during your training?

I attached the graphs showing the losses we observed in the second training stage.

AlexDotHam commented 1 year ago

I think this is abnormal. If you send me an email, I can provide the log files generated during my training. In stage two, my GIoU loss eventually converged to around 0.14 on average, and the loss you posted differs greatly from mine. Either your stage-one training was not correct, or your configuration is quite different from mine. This often happens when you use more GPUs for training. Please share more of your training configuration so I can help resolve the issue.
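For readers comparing their own curves against the 0.14 figure, here is a minimal sketch of a GIoU loss for a single pair of boxes. It assumes the common definition (loss = 1 - GIoU) and boxes in `(x1, y1, x2, y2)` format with positive area. The function name `giou_loss` is illustrative and is not taken from the ARTrack code, which may batch, normalize, or weight the term differently.

```python
def giou_loss(box_a, box_b):
    """Return 1 - GIoU for two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive width and height.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    iou = inter / union
    giou = iou - (enclose - union) / enclose
    return 1.0 - giou
```

Under this definition the loss is 0 for identical boxes and approaches 2 for boxes that are far apart, so an average near 0.14 corresponds to predictions that overlap the ground truth closely.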

AlexDotHam commented 1 year ago

You could try extending stage-one training, for example training for 300 epochs and dropping the learning rate at epoch 240, which can effectively improve training stability. Then, in stage two, train for only 25-30 or 60 epochs (you may need some trial and error).
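The suggested stage-one schedule can be sketched as a simple step schedule. This assumes a single learning-rate drop at epoch 240 of 300. The decay factor `gamma=0.1` and the name `lr_multiplier` are assumptions for illustration and are not taken from the ARTrack configuration files.

```python
def lr_multiplier(epoch, drop_epoch=240, gamma=0.1):
    """Multiplier applied to the base learning rate at a given epoch.

    drop_epoch: first epoch trained at the reduced rate (240 per the suggestion).
    gamma: decay factor at the drop; 0.1 is an assumed value, not from the repo.
    """
    return gamma if epoch >= drop_epoch else 1.0


# Relative learning rate for each of the 300 stage-one epochs.
schedule = [lr_multiplier(epoch) for epoch in range(300)]
```

Multiplying each entry by the base learning rate from your own configuration gives the per-epoch rate, which can be compared against the values in your training log.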

Baogerock commented 1 year ago

In our training, the accuracy came within 0.2% of the paper (above or below) only on our third training run. I guess this is because training is very unstable, so it may take several retries to match the reported results. From my experience, it helps to test several more epochs. I have found that accuracy fluctuates greatly during second-stage training, so it takes extensive experimentation to get good results.
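One way to act on this advice is to evaluate the checkpoints from several stage-two epochs and compare them, as in the sketch below. It assumes you have already collected one score (for example LaSOT AUC) per evaluated epoch. The helper `summarize_checkpoints` is hypothetical and not part of the ARTrack code.

```python
def summarize_checkpoints(scores):
    """Summarize evaluation scores keyed by epoch number.

    scores: dict mapping epoch -> metric value (higher is better).
    Returns (best_epoch, best_score, spread), where spread is the gap
    between the best and worst evaluated epochs.
    """
    if not scores:
        raise ValueError("scores must contain at least one epoch")
    best_epoch = max(scores, key=scores.get)
    best_score = scores[best_epoch]
    spread = best_score - min(scores.values())
    return best_epoch, best_score, spread
```

A large spread across neighboring epochs indicates the kind of fluctuation described above, in which case testing only the final checkpoint may understate what the run can achieve.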

OmerRe commented 1 year ago

Thanks for your response. I've emailed you my training configuration, and I would appreciate it if you could share yours.