Open LukeAI opened 4 years ago
I think the seed is not set, so it's just a random change that is not significant.
Which repository gives the best accuracy, the old or the new? What are the dates of both repos? There was a bug in the data augmentation (blurring) from 26 Oct to 4 Dec. There was also an improvement in the yolo-layer for ignore_thresh.
The older repo gave the best accuracy. Unfortunately, I don't know exactly which commit, probably from around a week ago. The newer one is from a couple of days ago. I'm currently retraining with the longer training schedule to see if I can reproduce it.
Trained again on the repo from two days ago and it's even worse! 3 mAP less than last week. I set steps here to 16000,18000.
Is this just random fluctuation? Or overfitting? Or a regression?
@LukeAI
Try to train 3 times this model on the current repository, and 3 times this model on the old repository (on what date?) and compare the mAP.
Share your cfg-file.
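The repeated-runs comparison suggested above can be summarized with a few lines of Python. This is only a rough sketch: the function names and the "gap larger than k times the run-to-run spread" rule are assumptions made for illustration, not something darknet provides, and three runs per repository give only a coarse estimate of the noise.

```python
from statistics import mean, stdev


def summarize_runs(map_scores):
    """Return (mean, sample standard deviation) of final mAP over repeated runs."""
    return mean(map_scores), stdev(map_scores)


def gap_exceeds_noise(old_runs, new_runs, k=2.0):
    """Rough heuristic (an assumption, not a formal test): is the gap between
    the mean mAPs larger than k times the larger run-to-run spread?"""
    old_mean, old_sd = summarize_runs(old_runs)
    new_mean, new_sd = summarize_runs(new_runs)
    return abs(old_mean - new_mean) > k * max(old_sd, new_sd)
```

Pass in the final mAP of each of the 3 runs per repository. If gap_exceeds_noise returns False, the difference is within what random fluctuation alone could plausibly produce.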
Also
Check that you are using absolutely identical cfg files, without a single change. (Also without mosaic, ciou, iou_thresh... )
Check that you are using absolutely identical obj.data, train.txt, valid.txt files, without a single change.
Check that you are using the same pre-trained weights file.
Try to train again and see whether the mAP fluctuates.
Try to train with [net] blur=5 in the cfg-file.
If it doesn't help, then use if (best_iou > l.ignore_thresh) {
instead of https://github.com/AlexeyAB/darknet/blob/dbe34d78658746fcfc9548ebab759895ea05a70c/src/yolo_layer.c#L353
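The "absolutely identical files" checks in the list above can be automated by hashing each file in the two training setups. A minimal sketch, assuming each run keeps its files in its own directory; the directory layout and the names passed in are hypothetical and should be replaced with the real cfg, obj.data, train.txt, valid.txt and pre-trained weights files.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def differing_files(old_dir, new_dir, names):
    """Return the names whose contents differ, or that are missing on either
    side, between two run directories (hypothetical layout)."""
    changed = []
    for name in names:
        old_path, new_path = Path(old_dir) / name, Path(new_dir) / name
        if not (old_path.is_file() and new_path.is_file()):
            changed.append(name)
        elif sha256_of(old_path) != sha256_of(new_path):
            changed.append(name)
    return changed
```

An empty result means the listed files are byte-for-byte identical, so any remaining mAP difference has to come from the code or from randomness.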
I trained two versions of the same csresnext-spp-pan-sclae-iou model. The first was trained with
max_batches=20000, steps=14000,18000
and the second was trained with max_batches=15000, steps=7000,10000
They were also trained on slightly different versions of the repo. As can be seen below, the first achieved +1 mAP relative to the other, but this doesn't appear to be because it trained for longer: it reaches 73.3 mAP by around 5000 batches, whereas the other is only at 70 mAP at 5000.
Am I correct in thinking that the learning rate schedule should have been identical between the two up to this point? Or does it decay deterministically until the next step? The difference may be due to some subtle regression in the more recent repo, but I'd like to reproduce the earlier, superior result, ideally with fewer minibatches.
My training set is 13800 images and I have 5 classes.
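On the learning-rate question, here is a small Python sketch of how darknet's policy=steps schedule behaves as far as I understand it: a polynomial warm-up during burn_in, then a constant rate that is multiplied by the matching scale each time a step is reached. The values of learning_rate, burn_in and scales below are assumptions (the cfg was not shared), and the sketch assumes both cfgs use the same values for them.

```python
def darknet_steps_lr(batch, learning_rate, burn_in, steps, scales, power=4.0):
    """Sketch of the policy=steps learning rate at a given batch number."""
    if batch < burn_in:
        # Warm-up: the rate ramps up polynomially towards learning_rate.
        return learning_rate * (batch / burn_in) ** power
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if step > batch:
            return rate  # constant between steps, no gradual decay
        rate *= scale
    return rate


# Assumed values, not taken from the actual cfg files.
COMMON = dict(learning_rate=0.001, burn_in=1000, scales=(0.1, 0.1))

lr_long = darknet_steps_lr(5000, steps=(14000, 18000), **COMMON)
lr_short = darknet_steps_lr(5000, steps=(7000, 10000), **COMMON)
```

Under these assumptions the two schedules agree at every batch before 7000, including batch 5000, so the gap seen at 5000 batches would not come from the step settings themselves.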