AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

For YoloV4, should I early stop my training on custom dataset? #5638

Closed. natures66 closed this issue 4 years ago.

natures66 commented 4 years ago

My system is Win10 64-bit, and my dataset has 5 classes; the train set has about 1000 images at 1920x1080 pixels, so I set max_batches=10000. But when I start to train (command: `darknet.exe detector train cfg/obj.data cfg/yolov4-obj.cfg backup/yolov4-obj_1000.weights -dont_show -mjpeg_port 8090`), the total loss is decreasing, but I think the mAP is decreasing too, because the best weights are not the last weights. [screenshot of the training chart]

that's my yolov4-obj.cfg:

```
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=32
subdivisions=16
width=608
height=608
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 10000  # classes*2000
policy=steps
steps=8000,9000  # 80% and 90% of max_batches
scales=.1,.1

#cutmix=1
mosaic=1

# 23:104x104 54:52x52 85:26x26 104:13x13 for 416

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=mish

# Downsample

[convolutional] batch_normalize=1 filters=64 size=3 stride=2 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[route] layers = -1,-7

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

# Downsample

[convolutional] batch_normalize=1 filters=128 size=3 stride=2 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

[route] layers = -1,-10

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

# Downsample

[convolutional] batch_normalize=1 filters=256 size=3 stride=2 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

[route] layers = -1,-28

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

# Downsample

[convolutional] batch_normalize=1 filters=512 size=3 stride=2 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

[route] layers = -1,-28

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

# Downsample

[convolutional] batch_normalize=1 filters=1024 size=3 stride=2 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

[route] layers = -1,-16

[convolutional] batch_normalize=1 filters=1024 size=1 stride=1 pad=1 activation=mish stopbackward=800

##########################

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

### SPP ###

[maxpool] stride=1 size=5

[route] layers=-2

[maxpool] stride=1 size=9

[route] layers=-4

[maxpool] stride=1 size=13

[route] layers=-1,-3,-5,-6

### End SPP ###

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[upsample] stride=2

[route] layers = 85

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[route] layers = -1, -3

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[upsample] stride=2

[route] layers = 54

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[route] layers = -1, -3

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

##########################

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=30 activation=linear

[yolo] mask = 0,1,2 anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 classes=5 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 scale_x_y = 1.2 iou_thresh=0.213 cls_normalizer=1.0 iou_normalizer=0.07 iou_loss=ciou nms_kind=greedynms beta_nms=0.6 max_delta=5

[route] layers = -4

[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=256 activation=leaky

[route] layers = -1, -16

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=30 activation=linear

[yolo] mask = 3,4,5 anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 classes=5 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 scale_x_y = 1.1 iou_thresh=0.213 cls_normalizer=1.0 iou_normalizer=0.07 iou_loss=ciou nms_kind=greedynms beta_nms=0.6 max_delta=5

[route] layers = -4

[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=512 activation=leaky

[route] layers = -1, -37

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=30 activation=linear

[yolo] mask = 6,7,8 anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 classes=5 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1 scale_x_y = 1.05 iou_thresh=0.213 cls_normalizer=1.0 iou_normalizer=0.07 iou_loss=ciou nms_kind=greedynms beta_nms=0.6 max_delta=5

```

Should I stop training now? Please help me...

aicukltd commented 4 years ago

If you run training with the -map flag, what is your mAP? If you trained for another 10,000 iterations, what is the delta between the mAP at 10,000 and at 20,000 iterations?

My interpretation of the mAP concept within Darknet and YOLO is to stop training when you have the lowest avg loss and the highest mAP.

If you train with the -map flag, you will get a weights file named {datasetName}_best.weights; this file has the highest mAP and is probably your starting place for testing!
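For reference, resuming training with mAP calculation enabled would look something like this (paths taken from your command; `yolov4-obj_last.weights` assumes the default checkpoint naming of this repo):

```
darknet.exe detector train cfg/obj.data cfg/yolov4-obj.cfg backup/yolov4-obj_last.weights -dont_show -mjpeg_port 8090 -map
```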

natures66 commented 4 years ago

> If you run training with the -map flag, what is your mAP? If you trained for another 10,000 iterations, what is the delta between the mAP at 10,000 and at 20,000 iterations?
>
> My interpretation of the mAP concept within Darknet and YOLO is to stop training when you have the lowest avg loss and the highest mAP.
>
> If you train with the -map flag, you will get a weights file named {datasetName}_best.weights; this file has the highest mAP and is probably your starting place for testing!

Thanks for your reply. I trained with the -map flag at first, but the computer rebooted... I continued training with `darknet.exe detector train cfg/obj.data cfg/yolov4-obj.cfg backup/yolov4-obj_1000.weights -dont_show -mjpeg_port 8090`, i.e. without -map. So the {datasetName}_best.weights file will not be updated without the -map flag, right? And I use `darknet.exe detector map cfg/obj.data cfg/yolov4-obj.cfg backup/yolov4-obj_6000.weights` to get the mAP for a given weights file. If the avg loss is not at its lowest, should I just choose the weights with the highest mAP? I'm new at this, thank you again for your patience :)
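To check every checkpoint I run the map command once per weights file; in a cmd shell that can be a one-liner like the following (a sketch assuming the default `backup/yolov4-obj_N.weights` naming; use `%%i` instead of `%i` inside a .bat file):

```
for %i in (1000 2000 3000 4000 5000 6000) do darknet.exe detector map cfg/obj.data cfg/yolov4-obj.cfg backup/yolov4-obj_%i.weights
```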

aicukltd commented 4 years ago

Okay, so I think your '_best.weights' will only reflect the best mAP up until the restart.

I would continue your training from the latest weights with the -map flag and then 'weight' ;) until your mAP hits its peak and your avg loss is at its lowest!

My interpretation of the -map flag is that it lets you decide when to stop training, and theoretically the '_best.weights' file will contain your most accurate weights.

Anyone with more knowledge please correct me if I am wrong!

natures66 commented 4 years ago

Actually my mAP is not increasing all the time. I stopped the training at 6000 iterations and got six mAP values. For iterations 1k, 2k, 3k, 4k, 5k, and 6k I got 27.08%, 36.34%, 35.08%, 32.41%, 36.00%, and 33.11%. That's what bothers me... If I continue my training and the mAP keeps decreasing, I should choose the '_best.weights' (the weights with the highest mAP) to test, right?

aicukltd commented 4 years ago

I believe that is correct, yes.

Have you checked your image bounding boxes with the -show_imgs flag? How many images do you have in your dataset?

natures66 commented 4 years ago

No, I have not used the -show_imgs flag; is it used with the darknet.exe detector train command? I have 1100 images in my train set, but only about 700 of them contain objects, and my val set has more than 300 images. The images are all 1920x1080; do you think they are too large? Do I need to crop them?

aicukltd commented 4 years ago

The -show_imgs flag will select a random set of images from your dataset and draw the labels on the objects using the bounding boxes from the .txt files.

If the bounding boxes do not cover your objects, that indicates an incorrectly labelled dataset.

If you append -show_imgs to the command you are using to train your model, you will see the above behaviour.
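For example, with your command it would be something like:

```
darknet.exe detector train cfg/obj.data cfg/yolov4-obj.cfg backup/yolov4-obj_1000.weights -dont_show -mjpeg_port 8090 -show_imgs
```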

Apparently training images of your resolution shouldn't be a problem. If you resized your images, you would need to resize your labels as well.

natures66 commented 4 years ago

Okay, thanks! I will try :)

AlexeyAB commented 4 years ago

> Apparently training images of your resolution shouldn't be a problem. If you resized your images, you would need to resize your labels as well.

If you resize an image, you should not resize the labels, because they are relative.

You should change the labels only if you crop the image.
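To see why: a YOLO label stores `x_center / image_width` (and likewise for y, width, and height). Resizing the image by a factor s multiplies both the absolute coordinate and the image size by s, so

```
x_rel' = (s * x_abs) / (s * W) = x_abs / W = x_rel
```

The relative label is unchanged. Cropping breaks this, because it shifts the origin and changes W and H without scaling the coordinates by the same factor.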

aicukltd commented 4 years ago

Apologies, what I meant was: if you are calculating the values using the image width and height, as explained here: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

If we used the width and height of the unscaled (un-resized) image, would that not change the relativity of the label?

So, with absolute pixel coordinates (`img = dimensions class x y w h`):

```
img_a.jpg        = 2048x2048 bird 100 100 50 50
img_a_resize.jpg = 1024x1024 bird  50  50 25 25
```

Is that logic incorrect?

AlexeyAB commented 4 years ago

x,y,w,h - are always from 0.0 to 1.0

https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

Where:

```
<object-class> <x_center> <y_center> <width> <height>
```

- `<object-class>` - integer object number from 0 to (classes-1)
- `<x_center> <y_center> <width> <height>` - float values relative to the width and height of the image, in the range (0.0, 1.0]
- for example: `<x> = <absolute_x> / <image_width>` or `<height> = <absolute_height> / <image_height>`
- attention: `<x_center> <y_center>` are the center of the rectangle (not the top-left corner)

For example, for img1.jpg you would create img1.txt containing:

```
1 0.716797 0.395833 0.216406 0.147222
0 0.687109 0.379167 0.255469 0.158333
1 0.420312 0.395833 0.140625 0.166667
```

aicukltd commented 4 years ago

Yes, I was just writing it that way for simplicity. I understand that the values need to be in the 0.0-1.0 range, but is that range not itself calculated from the width / height of the image in the first place?

> for example: `<x> = <absolute_x> / <image_width>` or `<height> = <absolute_height> / <image_height>`

Surely the values of a YOLO-format label like `0 0.034 0.011 0.676 0.123` would change to reflect the new width / height?

AlexeyAB commented 4 years ago

> Surely the values of a YOLO-format label like `0 0.034 0.011 0.676 0.123` would change to reflect the new width / height?

Give an example, with numbers in 0.0 - 1.0, of how they would change.

aicukltd commented 4 years ago

> Give an example, with numbers in 0.0 - 1.0, of how they would change.

So I was working on the assumption that people may be resizing, or generating labels in a different format, before converting them to the YOLO format. For example, my dataset was created in an online tool, exported in COCO format, and run through an in-house converter that generated a YOLO dataset. I think my assumption came from the fact that I resize my images automatically while the labels are still in absolute (class x y w h) format, before converting to the YOLO format.

It makes more sense now: as long as you only scale the image (rather than cropping it), the YOLO label will always remain correct.
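To convince myself, here is the arithmetic from my earlier example as a small Python sketch (the `to_yolo` helper is made up for illustration and reads x, y as the box's top-left corner):

```python
def to_yolo(x_min, y_min, w, h, img_w, img_h):
    # YOLO label fields: (x_center, y_center, width, height),
    # each relative to the *current* image dimensions.
    return ((x_min + w / 2) / img_w,
            (y_min + h / 2) / img_h,
            w / img_w,
            h / img_h)

# The same bird box before and after a 2x downscale:
print(to_yolo(100, 100, 50, 50, 2048, 2048))  # (0.06103..., 0.06103..., 0.02441..., 0.02441...)
print(to_yolo(50, 50, 25, 25, 1024, 1024))    # identical values -> the YOLO label does not change
```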

AlexeyAB commented 4 years ago

Yes.