AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Training Custom Dataset Procedure #5180

Open yucedagonurcan opened 4 years ago

yucedagonurcan commented 4 years ago

Hello everyone, I am trying to train on a custom dataset using the fine-tuning method, but I would like to clarify the procedure. I already read #2147, #3719, #2139, #4585.

./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81
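
For reference, here is a minimal sketch of how this truncated-weights file is then used for fine-tuning (paths shortened; the 6-class filters value matches the log below):

# custom.cfg: set classes= in each of the three [yolo] layers, and in the
# [convolutional] layer directly before each [yolo] set filters = (classes + 5) * 3
# (e.g. 6 classes -> filters = 33).
# Then train, starting from the truncated weights:
./darknet detector train data/custom.data cfg/custom.cfg yolov3.conv.81 -map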

Note: I should mention that, as a proof of concept, I am using 2084 images for both training and validation. I know this can lead to overly optimistic results, but I want to be sure about the procedure. Have a great day.

AlexeyAB commented 4 years ago

Your procedure is correct. What mAP do you get?

yolov3.weights was trained on 120 000 images. The more training images you have, the higher accuracy you can achieve.

yucedagonurcan commented 4 years ago

Here is my output for: ./darknet detector map build/darknet/x64/data/custom.data cfg/custom.cfg build/darknet/x64/backup/Mini_6_Classes/custom_best.weights

CUDA-version: 10000 (10010), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
CUDNN_HALF=1 OpenCV version: 4.2.0 compute_capability = 750, cudnn_half = 1 net.optimized_memory = 0 mini_batch = 1, batch = 16, time_steps = 1, train = 0 layer filters size/strd(dil) input output 0 conv 32 3 x 3/ 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF 1 conv 64 3 x 3/ 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF 2 conv 32 1 x 1/ 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF 3 conv 64 3 x 3/ 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF 4 Shortcut Layer: 1, wt = 0, wn = 0, outputs: 208 x 208 x 64 0.003 BF 5 conv 128 3 x 3/ 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF 6 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF 7 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF 8 Shortcut Layer: 5, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF 9 conv 64 1 x 1/ 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF 10 conv 128 3 x 3/ 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF 11 Shortcut Layer: 8, wt = 0, wn = 0, outputs: 104 x 104 x 128 0.001 BF 12 conv 256 3 x 3/ 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BF 13 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 14 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 15 Shortcut Layer: 12, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 16 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 17 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 18 Shortcut Layer: 15, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 19 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 20 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 21 Shortcut Layer: 18, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 22 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 23 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 24 Shortcut Layer: 21, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 25 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 26 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 27 Shortcut Layer: 24, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 28 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 29 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 30 Shortcut Layer: 27, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 31 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 32 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 33 Shortcut Layer: 30, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 34 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 35 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 36 Shortcut Layer: 33, wt = 0, wn = 0, outputs: 52 x 52 x 256 0.001 BF 37 conv 512 3 x 3/ 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BF 38 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 39 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 40 Shortcut Layer: 37, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 41 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 42 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 43 Shortcut Layer: 40, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 44 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 45 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 46 Shortcut Layer: 43, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 47 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 48 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 49 Shortcut Layer: 46, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 50 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 51 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 52 Shortcut Layer: 49, wt 
= 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 53 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 54 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 55 Shortcut Layer: 52, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 56 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 57 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 58 Shortcut Layer: 55, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 59 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 60 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 61 Shortcut Layer: 58, wt = 0, wn = 0, outputs: 26 x 26 x 512 0.000 BF 62 conv 1024 3 x 3/ 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BF 63 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 64 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 65 Shortcut Layer: 62, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 66 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 67 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 68 Shortcut Layer: 65, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 69 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 70 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 71 Shortcut Layer: 68, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 72 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 73 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 74 Shortcut Layer: 71, wt = 0, wn = 0, outputs: 13 x 13 x1024 0.000 BF 75 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 76 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 77 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 78 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 79 conv 512 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF 80 conv 1024 3 x 3/ 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF 81 conv 33 1 x 1/ 1 13 x 13 x1024 -> 13 x 13 x 33 0.011 BF 82 yolo classes_multipliers: 1.7, 35.3, 18.1, 1.0, 17.8, 9.9, [yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00 Wrong iou_thresh_kind = iou 83 route 79 -> 13 x 13 x 512 84 conv 256 1 x 1/ 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF 85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256 86 route 85 61 -> 26 x 26 x 768 87 conv 256 1 x 1/ 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BF 88 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 89 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 90 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 91 conv 256 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF 92 conv 512 3 x 3/ 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF 93 conv 33 1 x 1/ 1 26 x 26 x 512 -> 26 x 26 x 33 0.023 BF 94 yolo classes_multipliers: 1.7, 35.3, 18.1, 1.0, 17.8, 9.9, [yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00 Wrong iou_thresh_kind = iou 95 route 91 -> 26 x 26 x 256 96 conv 128 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF 97 upsample 2x 26 x 26 x 128 -> 52 x 52 x 128 98 route 97 36 -> 52 x 52 x 384 99 conv 128 1 x 1/ 1 52 x 52 x 384 -> 52 x 52 x 128 0.266 BF 100 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 101 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 102 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 103 conv 128 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF 104 conv 256 3 x 3/ 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF 105 conv 33 1 x 1/ 1 52 x 52 x 256 -> 52 x 52 x 33 0.046 BF 106 yolo classes_multipliers: 1.7, 35.3, 18.1, 1.0, 17.8, 9.9, [yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, 
scale_x_y: 1.00 Wrong iou_thresh_kind = iou Total BFLOPS 65.341 avg_outputs = 517718 Allocate additional workspace_size = 52.43 MB Loading weights from build/darknet/x64/backup/Mini_6_Classes/custom_best.weights... seen 64, trained: 455 K-images (7 Kilo-batches_64) Done! Loaded 107 layers from weights-file

calculation mAP (mean average precision)... 2084 detections_count = 53214, unique_truth_count = 17712
class_id = 0, name = person, ap = 65.84% (TP = 3894, FP = 1816)
class_id = 1, name = bicycle, ap = 89.47% (TP = 243, FP = 84)
class_id = 2, name = bus, ap = 98.36% (TP = 524, FP = 98)
class_id = 3, name = car, ap = 81.98% (TP = 6910, FP = 786)
class_id = 4, name = motorcycle, ap = 87.25% (TP = 470, FP = 84)
class_id = 5, name = truck, ap = 96.21% (TP = 937, FP = 246)

for conf_thresh = 0.25, precision = 0.81, recall = 0.73, F1-score = 0.77 for conf_thresh = 0.25, TP = 12978, FP = 3114, FN = 4734, average IoU = 61.64 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall mean average precision (mAP@0.50) = 0.865188, or 86.52 % Total Detection Time: 236 Seconds

Set -points flag:
-points 101 for MS COCO
-points 11 for PascalVOC 2007 (uncomment difficult in voc.data)
-points 0 (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
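
For a custom dataset, that hint just means appending the flag to the same map command, e.g. (sketch):

./darknet detector map build/darknet/x64/data/custom.data cfg/custom.cfg build/darknet/x64/backup/Mini_6_Classes/custom_best.weights -points 0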

AlexeyAB commented 4 years ago

86.52 % is a good result.

Because I am experiencing very bad results (at 7k iterations) compared to the YOLOv3 weights on webcam and images.

Maybe this is because your training dataset doesn't correspond to your test images: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

For each object that you want to detect, there must be at least 1 similar object in the training dataset with about the same shape, side of object, relative size, angle of rotation, tilt, and illumination. It is desirable that your training dataset include images with objects at different scales, rotations, lightings, from different sides, and on different backgrounds. You should preferably have 2000 different images (or more) for each class, and you should train for 2000*classes iterations or more (e.g. at least 12 000 iterations for 6 classes).

yucedagonurcan commented 4 years ago

@AlexeyAB I trained another model with 20k training images and 2k validation images, with 4 classes. Here are the results at the end of the 12 000th iteration. What do you think: do I need to train with more data, or is there something wrong that you can point out? (training chart attached)

12000: 133.837112, 82.316750 avg loss, 0.000040 rate, 13.630626 seconds, 3072000 images, 0.878401 hours left Resizing to initial size: 608 x 608 try to allocate additional workspace_size = 52.43 MB CUDA allocate done! try to allocate additional workspace_size = 52.43 MB CUDA allocate done! try to allocate additional workspace_size = 52.43 MB CUDA allocate done! try to allocate additional workspace_size = 52.43 MB CUDA allocate done!

calculation mAP (mean average precision)... 2000 detections_count = 169310, unique_truth_count = 13061
class_id = 0, name = person, ap = 43.98% (TP = 1408, FP = 1866)
class_id = 1, name = bicycle, ap = 46.49% (TP = 117, FP = 877)
class_id = 2, name = car, ap = 63.32% (TP = 5033, FP = 1148)
class_id = 3, name = motorbike, ap = 34.83% (TP = 94, FP = 809)

for conf_thresh = 0.25, precision = 0.59, recall = 0.51, F1-score = 0.54 for conf_thresh = 0.25, TP = 6652, FP = 4700, FN = 6409, average IoU = 43.97 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall mean average precision (mAP@0.50) = 0.471557, or 47.16 % Total Detection Time: 64 Seconds

Set -points flag:
-points 101 for MS COCO
-points 11 for PascalVOC 2007 (uncomment difficult in voc.data)
-points 0 (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset

mean_average_precision (mAP@0.5) = 0.471557 New best mAP! Saving weights to build/darknet/x64/backup/custom_best.weights MJPG_sender: new client 78 known client: 79, sent = -1, must be sent outlen = 183309 MJPG_sender: kill client 79 Close socket: out = -1, in = -1 MJPEG-stream sent. Saving weights to build/darknet/x64/backup/custom_12000.weights Saving weights to build/darknet/x64/backup/custom_last.weights Saving weights to build/darknet/x64/backup/custom_final.weights OpenCV exception: destroy_all_windows_cv MJPG_sender: new client 79 Close socket: out = 0, in = 0 MJPG_sender: close clinet: 0 Close socket: out = 0, in = 0 MJPG_sender: close acceptor: 0

AlexeyAB commented 4 years ago

I would recommend you train for 20 000 iterations. Or use yolov3-spp.cfg instead of yolov3-tiny.cfg

yucedagonurcan commented 4 years ago

Okay, I did another 12k iterations, but it is still very bad on webcam. It labels a person as a car 20% of the time. Is it a false expectation for a model to be as robust on webcam as the stock YOLO?

(next mAP calculation at 24518 iterations) Last accuracy mAP@0.5 = 55.70 %, best = 55.70 % 24000: 82.994461, 65.949615 avg loss, 0.000040 rate, 12.988230 seconds, 6144000 images, 0.899318 hours left Resizing to initial size: 608 x 608 try to allocate additional workspace_size = 52.43 MB CUDA allocate done! try to allocate additional workspace_size = 52.43 MB CUDA allocate done! try to allocate additional workspace_size = 52.43 MB CUDA allocate done! try to allocate additional workspace_size = 52.43 MB CUDA allocate done!

calculation mAP (mean average precision)... 2000 detections_count = 124391, unique_truth_count = 13061
class_id = 0, name = person, ap = 48.13% (TP = 1487, FP = 1528)
class_id = 1, name = bicycle, ap = 56.49% (TP = 120, FP = 496)
class_id = 2, name = car, ap = 66.41% (TP = 5324, FP = 1109)
class_id = 3, name = motorbike, ap = 51.13% (TP = 100, FP = 463)

for conf_thresh = 0.25, precision = 0.66, recall = 0.54, F1-score = 0.59 for conf_thresh = 0.25, TP = 7031, FP = 3596, FN = 6030, average IoU = 49.99 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall mean average precision (mAP@0.50) = 0.555401, or 55.54 % Total Detection Time: 63 Seconds

Set -points flag:
-points 101 for MS COCO
-points 11 for PascalVOC 2007 (uncomment difficult in voc.data)
-points 0 (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset

mean_average_precision (mAP@0.5) = 0.555400 MJPEG-stream sent. Saving weights to build/darknet/x64/backup/custom_24000.weights Saving weights to build/darknet/x64/backup/custom_last.weights Saving weights to build/darknet/x64/backup/custom_final.weights OpenCV exception: destroy_all_windows_cv

yucedagonurcan commented 4 years ago

Or use yolov3-spp.cfg instead of yolov3-tiny.cfg

I am using yolov3.cfg, not tiny, but I will try spp as well.

duongdqq commented 4 years ago

Here is my output for: ./darknet detector map build/darknet/x64/data/custom.data cfg/custom.cfg build/darknet/x64/backup/Mini_6_Classes/custom_best.weights
[...] mean average precision (mAP@0.50) = 0.865188, or 86.52 % (full output quoted from the comment above)

Hi @yucedagonurcan, can I ask you a couple of questions? I am also working on a project similar to yours. Here is my situation: my target task is to detect person, airplane, and ship (different from boat in MS COCO). I set up the .names file in this order: 0 ship, 1 person, 2 airplane. I shrank yolov3.weights to yolov3.conv.81 as my starting point. My training dataset includes images of ship and airplane, and only about 5% person images. Can I ask you:

  1. You keep the person class in the .names file, so do you train your model on a dataset that includes person images? And are those person images from MS COCO or from your own person dataset?
  2. Does the order of classes in the .names file affect performance?
AlexeyAB commented 4 years ago

You keep the person class in the .names file, so do you train your model on a dataset that includes person images?

Yes.

And are those person images from MS COCO or from your own person dataset?

You can use any images.

Does the order of classes in the .names file affect performance?

No.

yucedagonurcan commented 4 years ago

@duongdqq , @AlexeyAB already answered the questions but I want to add to my case:

And are those person image from MS COCO or your own person dataset?

No, it was my personal dataset with road-cam images.

Does the order of classes in the .names file affect performance?

I actually tried this, and it does not affect performance, since you are cutting off the last layer, which maps detections to the classes. What helps a lot is increasing the

If you really want to achieve good performance, I can suggest retraining from yolov3.weights with -clear (which resets the iteration counter), using a low learning rate with your classes and ignoring the other classes with dont_show (in the .names file). This seems to work, and I am working on it now.
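
As a rough sketch of that setup (the dont_show entry below reflects my understanding of the convention, so check the repo README for the exact syntax; paths are placeholders):

# obj.names - keep all entries, mark the classes you don't want in the output:
person
dont_show
car
...

# retrain from the full weights with the iteration counter reset:
./darknet detector train data/obj.data cfg/yolov3.cfg yolov3.weights -clear -dont_show -map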

duongdqq commented 4 years ago

@yucedagonurcan thank you for your suggestion, I really appreciate it. In your recommendation there are some things I don't understand.

No, it was my personal dataset with road-cam images.

So in my case, I want to use the pre-trained model to detect person without training the person class again. Is it possible to do that, and what do I have to do to get it?

If you really want to achieve good performance, I can suggest retraining from yolov3.weights with -clear (which resets the iteration counter), using a low learning rate with your classes and ignoring the other classes with dont_show (in the .names file). This seems to work, and I am working on it now.

When I add dont_show in the .names file, how does it work? And why should I NOT just delete the unnecessary class names and use only the class names I need, instead of adding dont_show?

yucedagonurcan commented 4 years ago

Hello @duongdqq,

So in my case, I want to use the pre-trained model to detect person without training the person class again. Is it possible to do that, and what do I have to do to get it?

  • So for this case (if you don't want to train), you can simply use yolov3.weights with coco.names. Maybe I don't understand; can you elaborate on that?

When I add dont_show in the .names file, how does it work?

  • When you add dont_show to a class name, darknet will ignore that prediction in the output, so when you don't want a class to appear (e.g. aeroplane) you can add it.

And why should I NOT just delete the unnecessary class names and use only the class names I need, instead of adding dont_show?

  • Okay, if you do that you will need to change the output layer. When you change the output layer, you need to retrain the network. That's why.
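
To illustrate why (darknet cfg convention; the numbers are just examples): the [convolutional] layer directly before each [yolo] layer has filters = (classes + 5) * <anchors per layer>, so its size depends on the class count.

# 80 COCO classes, 3 masks per [yolo] layer:   filters = (80 + 5) * 3 = 255
# 3 custom classes, 3 masks per [yolo] layer:  filters = (3 + 5) * 3 = 24
# changing classes changes this layer, which is why the network must be retrained
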
duongdqq commented 4 years ago

Hello @yucedagonurcan ,

So for this case (if you don't want to train), you can simply use yolov3.weights with coco.names. Maybe I don't understand; can you elaborate on that?

I mean I do want to train, because I have some more classes. Actually, I want to detect 4 objects (person, ship, airplane, and helicopter). I intend to set it up like this:

When you add dont_show to a class name, darknet will ignore that prediction in the output, so when you don't want a class to appear (e.g. aeroplane) you can add it.

  • If I keep the person class, add dont_show to the unnecessary classes, and ADD 3 MORE of my own classes, then in total I have 83 classes. That means I have to configure the model again to fit 83 classes, right?
  • I tried taking yolov3.conv.81 as my starting point and building a dataset without person images, and I got a very bad AP for person (~30%) (1)
  • Do you think that if I keep the person class (in the .names file), add 3 more classes to the dataset (no person images), add dont_show to the unnecessary classes, and train again with classes = 83, I will get a better score for person than (1)? To be honest @yucedagonurcan
yucedagonurcan commented 4 years ago

If I keep the person class, add dont_show to the unnecessary classes, and ADD 3 MORE of my own classes, then in total I have 83 classes. That means I have to configure the model again to fit 83 classes, right?

I tried taking yolov3.conv.81 as my starting point and building a dataset without person images, and I got a very bad AP for person (~30%) (1)

Do you think that if I keep the person class (in the .names file), add 3 more classes to the dataset (no person images), add dont_show to the unnecessary classes, and train again with classes = 83, I will get a better score for person than (1)?

How many images do you have? How many iterations did you do? Can you train from yolov3.conv.81 (with your original dataset, e.g. person, ship, ...) for at least 8k iterations and check the results?

duongdqq commented 4 years ago

How many images do you have? How many iterations did you do? Can you train from yolov3.conv.81 (with your original dataset, e.g. person, ship, ...) for at least 8k iterations and check the results?

  • I just took the person images from the COCO dataset, so I have 60k person, 4k ship, 2k airplane and 1k helicopter images (I). I use yolov3.conv.81 as the starting point and set up the model with 4 classes. One issue says I should train for about 4 epochs (1 epoch = total images / batch), which works out to roughly 4000 iterations in total, but to be safe I trained for 30 000 iterations.
  • I want to train from scratch, but I cut the network down to the first 81 layers (yolov3.conv.81) and add person images that were already used to train yolov3.conv.81. Do you think I am wrong?

But if you also keep the classification layer's weights (I guess it was the 105th layer, before the last output layer), I believe it will work better.

Can I take yolov3.conv.105 as my starting point for my dataset (I) above, instead of yolov3.conv.81, to get better results? Massive thanks @yucedagonurcan
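
If so, I assume that weights file would be produced with the same partial command pattern as at the top of this thread (a sketch, assuming the stock yolov3.cfg):

./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.105 105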

yucedagonurcan commented 4 years ago

Okay, I will share my insights:

  1. Definitely add empty labels (images with no objects of interest); more is better. Select them with different angles and themes (for example, the COCO dataset is very different from KITTI; that is what I mean by different angles and themes).
  2. If your dataset has a specific use case and a limited range of object sizes (in my case I wanted to use the model on road data, so I have a certain range of bounding-box sizes), recalculate the anchors. I used 10 as the number of clusters in KMeans (pay attention to the changes you then need to make in the cfg file; see the sketch after this list).
  3. I used yolov3.conv.81 with 50K training images and 24K iterations (I have 4 classes).
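
A sketch of point 2 (paths are placeholders): recalculate the anchors with 10 clusters and then reflect that in the cfg.

./darknet detector calc_anchors data/obj.data -num_of_clusters 10 -width 608 -height 608

# then, in the cfg: paste the printed anchors into every [yolo] layer, set num=10,
# split the 10 anchor indices across the mask= entries of the [yolo] layers,
# and set filters = (classes + 5) * <masks in that layer> in the conv layer before each [yolo]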

I want to train from scratch, but I cut the network down to the first 81 layers (yolov3.conv.81) and add person images that were already used to train yolov3.conv.81. Do you think I am wrong?

Can I take yolov3.conv.105 as my starting point for my dataset (I) above, instead of yolov3.conv.81, to get better results?

Note: Also, can you share your results so we can point out the problems?

hbiserinska commented 4 years ago

Hello @AlexeyAB and everyone. Please help me figure out what might be improved.

Problem: Many False Negatives for small object custom detection

(training chart attached)

Last accuracy mAP@0.5 = 22.18 %, best = 22.18 % 6000: 0.077829, 0.103566 avg loss, 0.001000 rate, 20.334336 seconds, 384000 images, 25.264007 hours left

calculation mAP (mean average precision)...

detections_count = 309, unique_truth_count = 177
rank = 0 of ranks = 309
rank = 100 of ranks = 309
rank = 200 of ranks = 309
rank = 300 of ranks = 309
class_id = 0, name = nonnodule, ap = 0.00% (TP = 0, FP = 0)
class_id = 1, name = nodule, ap = 43.14% (TP = 82, FP = 52)

for conf_thresh = 0.25, precision = 0.61, recall = 0.46, F1-score = 0.53 for conf_thresh = 0.25, TP = 82, FP = 52, FN = 95, average IoU = 41.58 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall mean average precision (mAP@0.50) = 0.215685, or 21.57 %

General Info:
repository: AlexeyAB/darknet
weights: darknet53.conv.74
cfg: yolov3.cfg

saturation = 1.0 # because I have 3 grayscale images stacked together to represent RGB

exposure = 1.5

hue=0

learning_rate=0.001
burn_in=1000
max_batches = 10000
policy=steps
steps=8000,9000
scales=.1,.1

[yolo]
mask = 10
anchors = 5,5, 5,7, 7,7, 7,11, 10,9, 12,12, 15,17, 23,22, 39,41, 30,30, 60,60
classes=2
num=11
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[yolo]
mask = 9
anchors = 5,5, 5,7, 7,7, 7,11, 10,9, 12,12, 15,17, 23,22, 39,41, 30,30, 60,60
classes=2
num=11
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -1, 11

[yolo]
mask = 0,1,2,3,4,5,6,7,8
anchors = 5,5, 5,7, 7,7, 7,11, 10,9, 12,12, 15,17, 23,22, 39,41, 30,30, 60,60
classes=2
num=11
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=0 # in order to decrease the running time

The last two anchors (30,30 and 60,60) were not suggested by calc_anchors, but by you in one of the issues.

  2. Should I remove the last two anchors? Is the current anchor setup correct, or should I move 39,41 to the second yolo layer?

  3. Should I invest more time in image pre-processing? How important is that for YOLO?

Please suggest what can be changed in order to improve the results.

AlexeyAB commented 4 years ago

@hbiserinska

class_id = 0, name = nonnodule, ap = 0.00% (TP = 0, FP = 0)
class_id = 1, name = nodule, ap = 43.14% (TP = 82, FP = 52)

  1. Actually AP=43.14% rather than ~20%, because for some reason you added a class nonnodule, but did not mark it in the training and validation dataset.

  2. What is your average image size?

  3. How does size and aspect ratio vary in your training images?

  4. What command do you use for training?

  5. Try to train yolov4 (set your anchors, classes, steps and max_batches) using this pre-trained weight https://drive.google.com/open?id=1JKF-bdIklxOOVy-2Cr5qdvjgGpmGfcbp

    and do these changes: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

For training for small objects (smaller than 16x16 after the image is resized to 416x416):
  • set layers = 23 instead of https://github.com/AlexeyAB/darknet/blob/6f718c257815a984253346bba8fb7aa756c55090/cfg/yolov4.cfg#L895
  • set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6f718c257815a984253346bba8fb7aa756c55090/cfg/yolov4.cfg#L892
  • and set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6f718c257815a984253346bba8fb7aa756c55090/cfg/yolov4.cfg#L989
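
For reference, the cfg syntax behind those two kinds of edits looks roughly like this (only an illustration of the parameter names; the exact sections to change are the ones linked above):

[upsample]
stride=4      # instead of the original stride value

[route]
layers = 23   # instead of the original layers value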

hbiserinska commented 4 years ago

@AlexeyAB , YOLOv4 is an outstanding work. I have been struggling with precision around 40% for the last 2 months. Now I trained with the positive class only (nodule) on YOLOv3 and YOLOv4, following your instructions and the weights from the comment above. Both v3 and v4 were trained on the same dataset, with the same anchors. All images are the same size (512, 512, 3).

!./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 512 -height 512 -show
anchors = 1, 1, 4, 4, 5, 7, 6, 7, 9, 7, 8, 10, 12, 11, 16, 17, 30, 26

YOLOv3 cfg: yolo-obj.txt
subdivisions=16
width=512
height=512
! ./darknet detector train data/obj.data cfg/yolo-obj.cfg cfg/darknet53.conv.74 -dont_show -map

(yolov3 training chart)

YOLOv4 cfg: yolov4-custom.txt
subdivisions=64
width=608
height=608
! ./darknet detector train data/obj.data cfg/yolov4-custom.cfg cfg/yolov4.conv.137 -dont_show -map

(yolov4 training chart)

Due to many interruptions I missed one part of the yolov4 chart. I drew the mAP part, but not the loss, which was below 0.5 over the whole missing part.

The idea of this project is to support radiologists in finding lung cancer at an early stage, when it is most curable. At that stage the nodules are really small (around 2x3 pixels in a 512x512 image), which makes the detection task challenging. In this task it is very important to have high recall (sensitivity) so you don't miss a cancer, but also a low FP rate so you don't tell someone they have cancer when they actually don't.

  1. Is there something that comes to your mind that I can try to further improve the result on yolov4? I see that in general in v4 I have higher and more fluctuating loss vs. v3
  2. Is there a way I can see the number of FP per image?
  3. How to get the Precision-Recall and ROC Curves?

Thank you!

AlexeyAB commented 4 years ago

Fix masks in yolov4-custom


Add there: https://github.com/AlexeyAB/darknet/blob/65506eb04acacc050a1903614066337071e866b8/src/detector.c#L1120

char *fp_name = basecfg(path);                      // name of the current image, without path/extension
printf(" %s: FP = %d \n", fp_name, fp_for_thresh);  // print the FP counter at the current conf threshold

How to get the Precision-Recall curve

Un-comment this line: https://github.com/AlexeyAB/darknet/blob/65506eb04acacc050a1903614066337071e866b8/src/detector.c#L1265

Run ./darknet detector map ... -points 11

AlexeyAB commented 4 years ago

And then show your new chart.png.

hbiserinska commented 4 years ago

Hi @AlexeyAB. Below is the new chart of yolov4 after fixing the masks. (chart_yolov4-custom attached)

If I may, I would like to ask you something about the loss functions in both v3 and v4. I am using yolo-obj.cfg and yolov4-custom.cfg from your repository. I know there are different ways to customize the loss function, but I am using the default settings. My questions are:

  1. Is the bbox loss MSE in v3 and CIoU in v4?
  2. What is the loss for the class probability in both versions?
  3. To get the final loss in both versions, do you use a linear combination of the bbox loss and the class loss?
AlexeyAB commented 4 years ago
  1. Yes

  2. In both YOLOv3 and v4, binary cross-entropy with logistic activation (sigmoid) is used for multi-label classification - each bounding box (each anchor) can have several classes.

  3. Yes, with coefficients for bbox_loss: 0.07 for CIoU in Yolov4, and 1.0 for MSE in Yolov3. https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4.cfg#L1151
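
Put together, a rough sketch of how the terms combine (based on the coefficients above; not the exact source code):

total_loss ≈ iou_normalizer * bbox_loss + objectness_loss + cls_normalizer * class_loss
# YOLOv4: bbox_loss = CIoU with coefficient 0.07; YOLOv3: bbox_loss = MSE with coefficient 1.0
# the classification and objectness terms use binary cross-entropy with sigmoid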

PGCodehub commented 4 years ago

Fix masks in yolov4-custom

  • yolo-obj.txt - correct masks
  • yolov4-custom.txt - incorrect masks

Hi @AlexeyAB , @hbiserinska

Can you please explain what is wrong with the masks in yolov4-custom.txt? I am confused about the strategy for choosing mask indexes. Please help me understand how to choose which mask to use for each yolo layer when training yolov4 on a custom dataset.

Should lower layers get the larger anchor boxes, or vice versa?

These are the anchor box calculations on my custom dataset after running this command:
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608 -show

12, 34, 18, 70, 29, 53, 29,105, 59,106, 41,165, 67,230, 116,331, 246,504

The height and width I am training with (in the cfg) are 608 x 608. Please help me with this.
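
For reference, my current understanding of the usual convention, written out as a sketch with the 9 anchors above (please correct me if this is wrong):

# anchors = 12,34, 18,70, 29,53, 29,105, 59,106, 41,165, 67,230, 116,331, 246,504
# [yolo] head on the 76x76 grid (highest resolution): mask = 0,1,2  -> smallest anchors, small objects
# [yolo] head on the 38x38 grid:                      mask = 3,4,5
# [yolo] head on the 19x19 grid (deepest, coarsest):  mask = 6,7,8  -> largest anchors, large objects
# Note: yolov3.cfg lists its [yolo] sections coarsest-first, while yolov4.cfg lists them finest-first,
# so the mask lines appear in the opposite order in the two files.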