Hi @taosean , one thing I noticed is that your training schedule is effectively about 2x longer than ours. According to the linear scaling rule, since you're using 2x batch size and 2x learning rate, the corresponding training schedule should be 0.5x, namely `STEP_SIZES: [50000, 10000, 10000]` and `MAX_ITER: 70000`. It may be worth trying this to see if it leads to a different result.
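(For concreteness, here is the arithmetic behind that suggestion as a small Python sketch; the numbers are the ones quoted in this thread, and nothing else is repo-specific.)

```python
# Linear scaling rule: scale the LR with the batch size, and shrink the
# iteration schedule by the same factor so the number of samples seen
# (epochs) stays the same.
provided_batch, provided_lr = 16, 0.04
provided_steps = [110000, 20000, 10000]          # sum = MAX_ITER = 140000

new_batch = 32
scale = new_batch / provided_batch               # 2.0

new_lr = provided_lr * scale                     # 0.08
new_max_iter = int(sum(provided_steps) / scale)  # 70000, i.e. a 0.5x schedule

# The suggested STEP_SIZES [50000, 10000, 10000] sum to this MAX_ITER;
# exactly how the 70000 iterations are split between decay steps is a
# judgment call.
assert sum([50000, 10000, 10000]) == new_max_iter
print(new_lr, new_max_iter)
```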
Looking at the training curves you provided, the blue curve at iteration 4000 roughly reaches the same loss as the orange curve at iteration 8000, which seems reasonable. (I suggest scaling the x-axis of the blue curve by 2 for a clearer comparison, since that run effectively trains 2x faster due to the increased batch size and learning rate.)
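(A minimal matplotlib sketch of that comparison; `ours_iters`/`ours_loss` and `ref_iters`/`ref_loss` are placeholders for values parsed from your own logs.)

```python
# Plot the 2x-LR / 2x-BS run with its x-axis multiplied by 2, so that
# both curves cover the same "effective" amount of training.
import matplotlib.pyplot as plt

def plot_scaled(ours_iters, ours_loss, ref_iters, ref_loss, scale=2):
    plt.plot([it * scale for it in ours_iters], ours_loss,
             label="2x BS / 2x LR run (x-axis scaled by 2)")
    plt.plot(ref_iters, ref_loss, label="reference run")
    plt.xlabel("equivalent iterations at the reference batch size")
    plt.ylabel("training loss")
    plt.legend()
    plt.show()
```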
Thank you @chaoyuaw , I tried it, and it does help! It looks like it was due to overfitting. Thank you very much.
Hi, I'm confused about an experiment result I got.
The thing is, I trained a new detector based on the detection model you provided, only with a larger batch size. We compared our detector with the model this repo provides and got higher mAP and mAR on the validation set. For keyframes in the validation set that have no labeled boxes (referred to as BG images), we detect fewer boxes than the provided model does. So, judging by the detector's metrics, I think we got a better detector.
Using that detector trained by ourselves, we followed the steps you described in the paper: we ran it over the training keyframes and kept the predicted boxes with score threshold `0.6` (in the paper it is `0.9`, but with `0.9` we got far fewer records than the provided `ava_train_predicted_boxes.csv`, and we found `0.6` gave a similar number of records), and then trained the baseline model (`ava_r101_baseline`).
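(For reference, a rough sketch of the thresholding step described above. It assumes rows shaped like the AVA box CSVs, with the detector score in the last column; verify against the actual `ava_train_predicted_boxes.csv` before relying on it.)

```python
# Keep only predicted boxes whose detector score clears the threshold.
# Assumes the score is the last field of each CSV row -- check this
# against the provided ava_train_predicted_boxes.csv.
import csv

def filter_boxes(in_csv, out_csv, score_thresh=0.9):
    kept = 0
    with open(in_csv) as fin, open(out_csv, "w", newline="") as fout:
        reader, writer = csv.reader(fin), csv.writer(fout)
        for row in reader:
            if float(row[-1]) >= score_thresh:
                writer.writerow(row)
                kept += 1
    return kept
```

Comparing the returned count against the line count of the provided file is a quick way to sanity-check which threshold reproduces it.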
The training is configured with `batch size = 32` (`16` for the provided model), `initial learning rate = 0.08` (`0.04` for the provided model), and `STEP_SIZES: [110000, 20000, 10000]`; other parameters are kept as they are. We compared our training loss with the one in the provided `102760714.log`:

[figure: training loss curves, our run vs. the provided model]

As you can see, our training loss is lower than the provided model's.
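(In case it helps anyone reproduce this comparison, a rough sketch for pulling `(iteration, loss)` pairs out of a training log. The regex is only a guess at a typical log line such as `... iter: 1000 ... loss: 0.4321`; adjust it to whatever `102760714.log` actually prints.)

```python
# Extract (iteration, loss) pairs from a training log. The pattern is
# an assumption about the log format, not the repo's actual output.
import re

LINE_RE = re.compile(r"iter[:\s]+(\d+).*?loss[:\s]+([0-9.]+)")

def parse_log(path):
    iters, losses = [], []
    with open(path) as f:
        for line in f:
            m = LINE_RE.search(line)
            if m:
                iters.append(int(m.group(1)))
                losses.append(float(m.group(2)))
    return iters, losses
```

The parsed pairs can be fed straight into the plotting sketch earlier in the thread.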
However, when we evaluated our model on the validation set, the mAP we got was not as good as the provided model's.
I cannot explain this result well. Is it due to overfitting, or to a distribution difference between the training and validation sets? Do you have any tips on training the detector, the baseline model, and the LFB model? Could you share your insights if you have any?
Best regards.