AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Why different mAPs? #4473

Open LukeAI opened 4 years ago

LukeAI commented 4 years ago

I trained two versions of the same model, csresnext-spp-pan-scale-iou. The first was trained with max_batches=20000 and steps=14000,18000; the second with max_batches=15000 and steps=7000,10000. They were also trained on slightly different versions of the repo.

As can be seen below, the first finished about 1 mAP higher than the other, but this doesn't appear to be because it trained for longer: it reaches 73.3 mAP by around 5000 batches, whereas the other is only at 70 mAP at 5000.

Am I correct in thinking that the learning rate schedule should have been identical between the two up to this point? Or does the rate decay gradually until the next step? The difference may be due to some subtle regression in a more recent version of the repo. I'd like to reproduce the earlier, superior result, ideally with fewer minibatches.
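To make the schedule question concrete, here is a minimal Python sketch of a step-based schedule with warm-up, modelled on how a policy=steps schedule is commonly understood to behave: the rate stays constant between steps and is multiplied by a scale only when a step boundary is crossed. Only the steps values come from the post above. The learning_rate, burn_in, power, and scales values are assumptions, since the cfg-file was not posted, and `current_rate` is a hypothetical helper, not Darknet's own function.

```python
def current_rate(batch, steps, scales, learning_rate=0.001, burn_in=1000, power=4.0):
    """Piecewise-constant step schedule with polynomial warm-up.

    learning_rate, burn_in and power are assumed values, not taken from the thread.
    """
    if batch < burn_in:
        # Warm-up: ramp the rate up from 0 to learning_rate.
        return learning_rate * (batch / burn_in) ** power
    rate = learning_rate
    for step, scale in zip(steps, scales):
        if step > batch:
            break  # this step has not been reached yet
        rate *= scale
    return rate


# steps are from the post; scales of 0.1 are an assumption.
run_a = dict(steps=[14000, 18000], scales=[0.1, 0.1])
run_b = dict(steps=[7000, 10000], scales=[0.1, 0.1])

# First batch at which the two schedules disagree.
same_until = next(
    (b for b in range(20000) if current_rate(b, **run_a) != current_rate(b, **run_b)),
    None,
)
print(same_until)  # 7000
```

Under these assumptions the two runs see the same learning rate for every batch before 7000, so a gap that is already visible at 5000 batches would have to come from something other than the steps setting.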

My training set has 13800 images and I have 5 classes.

csresnextleaky-iou

chart

HagegeR commented 4 years ago

I think the seed is not set, so it's just a random change that is not significant.

AlexeyAB commented 4 years ago

Which repository gives the best accuracy, the old or the new? What are the dates of both repos? There was a bug in the blur data augmentation from 26 Oct to 4 Dec. There was also an improvement in the yolo-layer for ignore_thresh.

LukeAI commented 4 years ago

The older repo gave the best accuracy. Unfortunately, I don't know exactly which commit it was; probably one from around a week ago. The newer one is from a couple of days ago. I'm currently retraining with the longer training schedule to see if I can reproduce the result.

LukeAI commented 4 years ago

Trained again on the repo from two days ago and it's even worse! 3 mAP less than last week. I set steps here to 16000,18000.

Is this just random fluctuation? Or overfitting? Or a regression?

chart

AlexeyAB commented 4 years ago

@LukeAI

Try to train this model 3 times on the current repository and 3 times on the old repository (from what date?), and compare the mAP.
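One rough way to read the results of such repeated runs is to compare the gap between the two mean mAPs with the run-to-run spread. The sketch below is only a heuristic, not a proper statistical test; the function names and the threshold `k` are hypothetical, and the numbers in the usage note are placeholders rather than measurements from this thread.

```python
from statistics import mean, stdev


def summarize(maps):
    """Return (mean, sample standard deviation) of a list of final mAP values."""
    return mean(maps), stdev(maps)


def gap_exceeds_noise(old_maps, new_maps, k=2.0):
    """Heuristic: is the gap between the means larger than k times the
    larger of the two run-to-run standard deviations?"""
    old_mean, old_std = summarize(old_maps)
    new_mean, new_std = summarize(new_maps)
    noise = max(old_std, new_std)
    return abs(old_mean - new_mean) > k * noise
```

For example, with placeholder values, `gap_exceeds_noise([73.1, 73.4, 72.9], [70.2, 70.9, 70.4])` would suggest a real difference between the repositories, while widely scattered runs with close means would point to ordinary fluctuation.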

Share your cfg-file.


Also

  1. Check that you are using absolutely identical cfg files, without a single change. (Also without mosaic, ciou, iou_thresh... )

  2. Check that you are using absolutely identical obj.data, train.txt, valid.txt files, without a single change.

  3. Check that you are using the same pre-trained weights file.

  4. Try to train again and see whether the mAP still fluctuates.

  5. Try to train with [net] blur=5 in cfg-file

  6. If it doesn't help, then use if (best_iou > l.ignore_thresh) { instead of the line at https://github.com/AlexeyAB/darknet/blob/dbe34d78658746fcfc9548ebab759895ea05a70c/src/yolo_layer.c#L353
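Checks 1 to 3 above can be automated by hashing the files used by each run. This is a minimal sketch that assumes each run keeps its inputs in its own directory; `sha256_of` and `differing_files` are hypothetical helper names, and the directory layout is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large weights files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_files(dir_a, dir_b, names):
    """Return the names whose contents differ, or that are missing,
    between the two run directories."""
    diffs = []
    for name in names:
        a, b = Path(dir_a) / name, Path(dir_b) / name
        if not (a.is_file() and b.is_file()) or sha256_of(a) != sha256_of(b):
            diffs.append(name)
    return diffs
```

Calling `differing_files("run_old", "run_new", ["obj.data", "train.txt", "valid.txt"])` (with hypothetical directory names, plus the cfg and weights file names for your setup) should return an empty list if the inputs really are identical.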