yucedagonurcan opened this issue 4 years ago
Your procedure is correct. What mAP do you get?
yolov3.weights was trained on 120,000 images. The more training images you have, the higher the accuracy you can achieve.
Here is my output for:
./darknet detector map build/darknet/x64/data/custom.data cfg/custom.cfg build/darknet/x64/backup/Mini_6_Classes/custom_best.weights
CUDA-version: 10000 (10010), cuDNN: 7.6.4, CUDNN_HALF=1, GPU count: 1
OpenCV version: 4.2.0, compute_capability = 750, cudnn_half = 1
net.optimized_memory = 0, mini_batch = 1, batch = 16, time_steps = 1, train = 0
[107-layer yolov3 network printout omitted: 416 x 416 input, three [yolo] heads (13x13, 26x26, 52x52) each fed by a conv layer with 33 filters, Total BFLOPS 65.341, avg_outputs = 517718]
Allocate additional workspace_size = 52.43 MB
Loading weights from build/darknet/x64/backup/Mini_6_Classes/custom_best.weights...
seen 64, trained: 455 K-images (7 Kilo-batches_64)
Done! Loaded 107 layers from weights-file

calculation mAP (mean average precision)...
detections_count = 53214, unique_truth_count = 17712
class_id = 0, name = person, ap = 65.84% (TP = 3894, FP = 1816)
class_id = 1, name = bicycle, ap = 89.47% (TP = 243, FP = 84)
class_id = 2, name = bus, ap = 98.36% (TP = 524, FP = 98)
class_id = 3, name = car, ap = 81.98% (TP = 6910, FP = 786)
class_id = 4, name = motorcycle, ap = 87.25% (TP = 470, FP = 84)
class_id = 5, name = truck, ap = 96.21% (TP = 937, FP = 246)
for conf_thresh = 0.25, precision = 0.81, recall = 0.73, F1-score = 0.77
for conf_thresh = 0.25, TP = 12978, FP = 3114, FN = 4734, average IoU = 61.64 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.865188, or 86.52 %
Total Detection Time: 236 Seconds
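As a sanity check, the summary numbers in this log follow directly from the per-class lines; a minimal Python sketch using the values above:

```python
# mAP@0.50 is the plain mean of the per-class APs; precision/recall/F1
# follow from the TP/FP/FN counts at conf_thresh = 0.25.
aps = [65.84, 89.47, 98.36, 81.98, 87.25, 96.21]  # person .. truck
mean_ap = sum(aps) / len(aps)

tp, fp, fn = 12978, 3114, 4734
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"mAP@0.50 = {mean_ap:.2f} %")   # 86.52 %
print(f"precision = {precision:.2f}")  # 0.81
print(f"recall = {recall:.2f}")        # 0.73
print(f"F1-score = {f1:.2f}")          # 0.77
```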
Set -points flag:
-points 101 for MS COCO
-points 11 for PascalVOC 2007 (uncomment difficult in voc.data)
-points 0 (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
86.52 % is a good result.
I ask because I am getting very bad results (at 7k iterations) compared to the stock YOLOv3 weights on webcam and images.
Maybe this is because your training dataset doesn't correspond to your test images: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
For each object you want to detect, there must be at least one similar object in the training dataset with roughly the same shape, side of the object, relative size, angle of rotation, tilt, and illumination. It is desirable that your training dataset include images of objects at different scales, rotations, lightings, from different sides, and on different backgrounds. You should preferably have 2,000 or more different images for each class, and you should train for 2000*classes iterations or more.
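The iteration rule above is also spelled out in the AlexeyAB/darknet README as max_batches = classes*2000 (but not less than 6000, and not less than the number of training images), with learning-rate steps at 80% and 90% of max_batches. A small helper to compute it:

```python
def training_schedule(num_classes, num_train_images=0):
    # Rule of thumb from the AlexeyAB/darknet README: at least 2000
    # iterations per class, never below 6000 or the training-image count;
    # decay the learning rate at 80% and 90% of max_batches.
    max_batches = max(2000 * num_classes, 6000, num_train_images)
    steps = (int(max_batches * 0.8), int(max_batches * 0.9))
    return max_batches, steps

print(training_schedule(6))  # (12000, (9600, 10800))
```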
My test and train sets come from the same source, and I know they don't contain many close-up photos of people (the footage is from a car camera).
I am training another network with 4 classes only, keeping the index order the same as the COCO names. I will compare the results with the network in question (6 classes).
@AlexeyAB I trained another model with 20k training images and 2k validation images, with 4 classes. Here are the results at the end of the 12,000th iteration. What do you think: do I need to train with more data, or is there something wrong that you can point out?
12000: 133.837112, 82.316750 avg loss, 0.000040 rate, 13.630626 seconds, 3072000 images, 0.878401 hours left
Resizing to initial size: 608 x 608
try to allocate additional workspace_size = 52.43 MB. CUDA allocate done! (repeated 4 times)
calculation mAP (mean average precision)...
detections_count = 169310, unique_truth_count = 13061
class_id = 0, name = person, ap = 43.98% (TP = 1408, FP = 1866)
class_id = 1, name = bicycle, ap = 46.49% (TP = 117, FP = 877)
class_id = 2, name = car, ap = 63.32% (TP = 5033, FP = 1148)
class_id = 3, name = motorbike, ap = 34.83% (TP = 94, FP = 809)
for conf_thresh = 0.25, precision = 0.59, recall = 0.51, F1-score = 0.54
for conf_thresh = 0.25, TP = 6652, FP = 4700, FN = 6409, average IoU = 43.97 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.471557, or 47.16 %
Total Detection Time: 64 Seconds
Set -points flag:
-points 101 for MS COCO
-points 11 for PascalVOC 2007 (uncomment difficult in voc.data)
-points 0 (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
mean_average_precision (mAP@0.5) = 0.471557
New best mAP! Saving weights to build/darknet/x64/backup/custom_best.weights
Saving weights to build/darknet/x64/backup/custom_12000.weights
Saving weights to build/darknet/x64/backup/custom_last.weights
Saving weights to build/darknet/x64/backup/custom_final.weights
[MJPG-sender socket and OpenCV shutdown messages omitted]
I would recommend training for 20,000 iterations, or using yolov3-spp.cfg instead of yolov3-tiny.cfg.
Okay, I did another 12k iterations, but it is still very bad on webcam: it labels a person as a car 20% of the time. Is it a false expectation for a custom model to be as robust on webcam as YOLO?
(next mAP calculation at 24518 iterations)
Last accuracy mAP@0.5 = 55.70 %, best = 55.70 %
24000: 82.994461, 65.949615 avg loss, 0.000040 rate, 12.988230 seconds, 6144000 images, 0.899318 hours left
Resizing to initial size: 608 x 608
try to allocate additional workspace_size = 52.43 MB. CUDA allocate done! (repeated 4 times)
calculation mAP (mean average precision)...
detections_count = 124391, unique_truth_count = 13061
class_id = 0, name = person, ap = 48.13% (TP = 1487, FP = 1528)
class_id = 1, name = bicycle, ap = 56.49% (TP = 120, FP = 496)
class_id = 2, name = car, ap = 66.41% (TP = 5324, FP = 1109)
class_id = 3, name = motorbike, ap = 51.13% (TP = 100, FP = 463)
for conf_thresh = 0.25, precision = 0.66, recall = 0.54, F1-score = 0.59
for conf_thresh = 0.25, TP = 7031, FP = 3596, FN = 6030, average IoU = 49.99 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.555401, or 55.54 %
Total Detection Time: 63 Seconds
Set -points flag:
-points 101 for MS COCO
-points 11 for PascalVOC 2007 (uncomment difficult in voc.data)
-points 0 (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
mean_average_precision (mAP@0.5) = 0.555400
Saving weights to build/darknet/x64/backup/custom_24000.weights
Saving weights to build/darknet/x64/backup/custom_last.weights
Saving weights to build/darknet/x64/backup/custom_final.weights
[MJPEG-stream and OpenCV shutdown messages omitted]
Or use yolov3-spp.cfg instead of yolov3-tiny.cfg
I am using yolov3.cfg, not tiny, but I will try spp as well.
Hi @yucedagonurcan, can I ask you a couple of questions? I am working on a project similar to yours. Here is my situation: my target task is to detect person, airplane, and ship (different from boat in MS COCO). I set up the .names file in the order: 0 ship, 1 person, 2 airplane. I shrank yolov3.weights to yolov3.conv.81 as a starting point. My training dataset includes images of ships, airplanes, and roughly 5% person images. Can I ask you:
You keep the person class in the .names file, so do you train your model on a dataset that includes person images?
Yes.
And are those person images from MS COCO or your own person dataset?
You can use any images.
Does the order of classes in the .names file affect performance?
No.
@duongdqq , @AlexeyAB already answered the questions but I want to add to my case:
And are those person images from MS COCO or your own person dataset?
No it was my personal dataset with road cam images.
Does the order of classes in the .names file affect performance? I actually tried this, and it does not affect performance, since you are cutting the last layer, which maps detections to the classes. What helps a lot is increasing the
If you really want to achieve good performance, I can suggest clearing the yolov3.weights training state (with -clear) and retraining with a low learning rate on your classes, ignoring the other classes with dont_show (in the names file). This seems to work, and I am working on it now.
@yucedagonurcan thank you for your suggestion, I really appreciate it. In your recommendation, there are some things I don't understand.
No it was my personal dataset with road cam images.
So in my case, I want to use the pre-trained model to detect person without training the person class again. Is it possible to utilize that, and what do I have to do to get it?
If you really want to achieve good performance, I can suggest clearing the yolov3.weights training state (with -clear) and retraining with a low learning rate on your classes, ignoring the other classes with dont_show (in the names file). This seems to work, and I am working on it now.
When I add dont_show in the names file, how does it work? And why should I not just delete the unnecessary class names and use only the class names I need, instead of adding dont_show?
Hello @duongdqq,
So in my case, I want to use the pre-trained model to detect person without training the person class again. Is it possible to utilize that, and what do I have to do to get it?
- So for this case (if you don't want to train), you can simply use yolov3.weights with coco.names. Maybe I don't understand; can you elaborate on that?
When I add dont_show in the names file, how does it work?
- When you add dont_show to a class name, darknet will ignore the predictions for it, so when you don't want a class (e.g. aeroplane) in the output, you can add that.
And why should I not just delete the unnecessary class names and use only the class names I need, instead of adding dont_show?
- Okay: if you do that, you will need to change the output layer, and when you change the output layer, you need to retrain the network. That's why.
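The reason is that the output layer's size depends on the class count: in yolov3 the conv layer feeding each [yolo] head has filters = (classes + 5) * anchors_per_mask, which with 6 classes gives the "conv 33" layers seen in the logs above. Changing the class list changes that layer's shape, so its weights no longer fit. A quick check:

```python
def yolo_filters(num_classes, anchors_per_layer=3):
    # Each anchor predicts 4 box coords + 1 objectness + num_classes
    # class scores, so the conv layer feeding each [yolo] head needs
    # anchors_per_layer * (num_classes + 5) filters.
    return anchors_per_layer * (num_classes + 5)

print(yolo_filters(6))   # 33 -- matches "conv 33" in the logs above
print(yolo_filters(80))  # 255 -- stock yolov3.cfg with coco.names
```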
Hello @yucedagonurcan ,
So for this case (if you don't want to train), you can simply use yolov3.weights with coco.names. Maybe I don't understand; can you elaborate on that?
I mean I do want to train, because I have some more classes. Actually, I want to detect 4 objects (person, ship, airplane, and helicopter). I intend to set it up like this:
When you add dont_show to a class name, darknet will ignore the predictions for it, so when you don't want a class (e.g. aeroplane) in the output, you can add that.
- If I keep the person class, add dont_show to the unnecessary classes, and ADD 3 MORE of my own classes, I have 83 classes in total. That means I have to configure the model again to fit 83 classes, right?
- I tried taking yolov3.conv.81 as my starting point and building the dataset without person images, and got a very bad AP for person (~30%) (1).
- Do you think that if I keep the person class (in the .names file), add 3 more classes to the dataset (no person images), add dont_show to the unnecessary classes, and train again with classes = 83, I will get a better score for person than (1)?
To be honest, @yucedagonurcan:
If I keep the person class, add dont_show to the unnecessary classes, and ADD 3 MORE of my own classes, I have 83 classes in total. That means I have to configure the model again to fit 83 classes, right?
I tried taking yolov3.conv.81 as my starting point and building the dataset without person images, and got a very bad AP for person (~30%) (1).
Do you think that if I keep the person class (in the .names file), add 3 more classes to the dataset (no person images), add dont_show to the unnecessary classes, and train again with classes = 83, I will get a better score for person than (1)?
How many images do you have? How many iterations did you do? Can you train from yolov3.conv.81 (with your original dataset, e.g. person, ship, ...) for at least 8k iterations and check the results?
How many images do you have? How many iterations did you do? Can you train from yolov3.conv.81 (with your original dataset, e.g. person, ship, ...) for at least 8k iterations and check the results?
- I just took the person images from the COCO dataset, so I have 60k person, 4k ship, 2k airplane, and 1k helicopter images (I). I use yolov3.conv.81 as the starting point, with the model set up for 4 classes. One issue says one should train for 4 epochs, where 1 epoch = total images / 4, so 4 epochs = 4,000 iterations; but to be sure, I train for 30,000 iterations.
- I want to train from scratch, but I cut at layer 81 (yolov3.conv.81) and add person images like those already trained into yolov3.conv.81. Do you think I am wrong?
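One note on the epoch arithmetic above: a darknet iteration consumes `batch` images (set in the .cfg), so epochs = iterations x batch / images rather than images / 4. A sketch, assuming batch = 64:

```python
def epochs_seen(iterations, num_images, batch=64):
    # One darknet "iteration" processes `batch` images, so the number of
    # epochs is iterations * batch / num_images. batch=64 is an assumption
    # here; use the value from your own .cfg.
    return iterations * batch / num_images

# ~67k images total (60k person + 4k ship + 2k airplane + 1k helicopter)
print(round(epochs_seen(30_000, 67_000), 1))  # ~28.7 epochs
```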
But if you also keep the classification layer's weights (I guess it was the 105th layer, before the last output layer), I believe it will work better.
Can I take yolov3.conv.105 as my starting point for my dataset (I) above, instead of yolov3.conv.81, to get better results? Massive thanks, @yucedagonurcan.
Okay, I will keep you posted with my insights:
I want to train from scratch, but I cut at layer 81 (yolov3.conv.81) and add person images like those already trained into yolov3.conv.81. Do you think I am wrong?
Can I take yolov3.conv.105 as my starting point for my dataset (I) above, instead of yolov3.conv.81, to get better results?
Note: also, can you share your results so we can point out the problems?
Hello @AlexeyAB and everyone, please help me find out what might be improved.
Problem: Many False Negatives for small object custom detection
Last accuracy mAP@0.5 = 22.18 %, best = 22.18 %
6000: 0.077829, 0.103566 avg loss, 0.001000 rate, 20.334336 seconds, 384000 images, 25.264007 hours left
calculation mAP (mean average precision)...
detections_count = 309, unique_truth_count = 177
rank = 0 of ranks = 309
rank = 100 of ranks = 309
rank = 200 of ranks = 309
rank = 300 of ranks = 309
class_id = 0, name = nonnodule, ap = 0.00% (TP = 0, FP = 0)
class_id = 1, name = nodule, ap = 43.14% (TP = 82, FP = 52)
for conf_thresh = 0.25, precision = 0.61, recall = 0.46, F1-score = 0.53
for conf_thresh = 0.25, TP = 82, FP = 52, FN = 95, average IoU = 41.58 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.215685, or 21.57 %
General info:
repository: AlexeyAB/darknet
weights: darknet53.conv.74
cfg: yolov3.cfg
input image size = 512,512
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 512 -height 512 -show
anchors = 5,5, 5,7, 7,7, 7,11, 10,9, 12,12, 15,17, 23,22, 39,41
data: 50/50 positive and negative class. For the negative class I have empty .txt files.
layers = -1, 11 and stride = 4, based on https://github.com/AlexeyAB/darknet#how-to-improve-object-detection: "for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = -1, 11".
1. How do I choose the right layers and stride values for my problem? Is either of the other two options more appropriate in this case: stride = 8 with layers = -1, 4, or stride = 16 with layers = -1, 0?
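A rough way to reason about question 1: each [yolo] head detects objects whose centers fall into its grid cells, and the head's stride sets that grid's granularity. A small sketch (general intuition, not darknet output):

```python
def grid_size(input_size, stride):
    # A [yolo] head at a given stride sees an (input_size/stride)-squared
    # grid; each cell covers a stride x stride pixel window, so smaller
    # strides suit smaller objects (e.g. the ~5x5 px nodule anchors above).
    assert input_size % stride == 0, "input must be divisible by the stride"
    return input_size // stride

# For a 512 x 512 input: stride 4 -> 128x128, 8 -> 64x64, 16 -> 32x32, 32 -> 16x16
for stride in (4, 8, 16, 32):
    print(f"stride {stride:>2}: {grid_size(512, stride)} x {grid_size(512, stride)} cells")
```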
cfg amendments are as follows:
exposure = 1.5
learning_rate=0.001
burn_in=1000
max_batches = 10000
policy=steps
steps=8000,9000
scales=.1,.1
[yolo]
mask = 10
anchors = 5,5, 5,7, 7,7, 7,11, 10,9, 12,12, 15,17, 23,22, 39,41, 30,30, 60,60
classes=2
num=11
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[yolo]
mask = 9
anchors = 5,5, 5,7, 7,7, 7,11, 10,9, 12,12, 15,17, 23,22, 39,41, 30,30, 60,60
classes=2
num=11
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -1, 11

[yolo]
mask = 0,1,2,3,4,5,6,7,8
anchors = 5,5, 5,7, 7,7, 7,11, 10,9, 12,12, 15,17, 23,22, 39,41, 30,30, 60,60
classes=2
num=11
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=0  # in order to decrease the running time
The last two anchors (30,30 and 60,60) were not suggested by calc_anchors but by you in one of the issues.
2. Should I remove the last two anchors? Is the current anchor setup correct, or should I move 39,41 into the second yolo layer?
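One mechanical check worth running before training (a sketch, not darknet code): in every [yolo] section, num must equal the total anchor-pair count, every mask index must be a valid anchor index, and darknet conventionally lists anchors smallest to largest so the big masks land on the coarse head:

```python
def check_yolo_anchors(anchors, num, masks):
    # `num` must match the anchor-pair count and every mask index must
    # point at an existing anchor; returns whether the anchors are also
    # listed in ascending order of area, as darknet conventionally expects.
    assert len(anchors) == num, "num must match the anchor-pair count"
    for mask in masks:
        assert all(0 <= i < num for i in mask), "mask index out of range"
    areas = [w * h for w, h in anchors]
    return areas == sorted(areas)

anchors = [(5, 5), (5, 7), (7, 7), (7, 11), (10, 9), (12, 12),
           (15, 17), (23, 22), (39, 41), (30, 30), (60, 60)]
# masks from the cfg above: head 1 -> 10, head 2 -> 9, head 3 -> 0..8
print(check_yolo_anchors(anchors, num=11, masks=[[10], [9], list(range(9))]))
# -> False: 39x41 comes before 30x30, so the list is not size-ordered
```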
Please suggest what can be changed in order to improve the results.
3. Should I invest more time in image pre-processing? How important is that for YOLO?
@hbiserinska
class_id = 0, name = nonnodule, ap = 0.00% (TP = 0, FP = 0) class_id = 1, name = nodule, ap = 43.14% (TP = 82, FP = 52)
Actually AP = 43.14% rather than ~20%, because for some reason you added a class nonnodule but did not mark it in the training and validation dataset.
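Numerically, this is exactly where the reported ~21.57% comes from: mAP averages AP over every class configured in the .names file, including the one with no ground-truth labels:

```python
# A class with zero ground-truth boxes contributes ap = 0 to the average,
# so the unlabeled "nonnodule" class halves the reported mAP here.
aps = {"nonnodule": 0.00, "nodule": 43.14}
map_with_empty_class = sum(aps.values()) / len(aps)
print(f"{map_with_empty_class:.2f} %")  # 21.57 %
```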
What is your average image size?
How does size and aspect ratio vary in your training images?
What command do you use for training?
Try to train yolov4 (set your anchors, classes, steps and max_batches) using this pre-trained weight https://drive.google.com/open?id=1JKF-bdIklxOOVy-2Cr5qdvjgGpmGfcbp
and do these changes: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
for training for small objects (smaller than 16x16 after the image is resized to 416x416):
- set layers = 23 instead of https://github.com/AlexeyAB/darknet/blob/6f718c257815a984253346bba8fb7aa756c55090/cfg/yolov4.cfg#L895
- set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6f718c257815a984253346bba8fb7aa756c55090/cfg/yolov4.cfg#L892
- and set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6f718c257815a984253346bba8fb7aa756c55090/cfg/yolov4.cfg#L989
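Whether these small-object changes apply can be checked by scaling the label sizes to the network resolution. A quick sanity check (not darknet code; the 2x3 px object in a 512x512 image is the nodule case described later in this thread):

```python
def resized(obj_w, obj_h, img_size, net_size):
    """Object size after the image is resized to the network input size."""
    scale = net_size / img_size
    return obj_w * scale, obj_h * scale

# A 2x3 px nodule in a 512x512 scan, resized to a 416x416 network input:
w, h = resized(2, 3, img_size=512, net_size=416)
print(f"{w:.3f} x {h:.3f}")  # far below the 16x16 threshold from the README
```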
@AlexeyAB , Yolov4 is an outstanding work.
I have been struggling with precision around 40% for the last 2 months. Now I have trained on the positive class only (nodule), on both yolov3 and yolov4, following your instructions and the weights from the comment above. Both yolov3 and v4 were trained on the same dataset, with the same anchors. All images are the same size (512, 512, 3).
!./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 512 -height 512 -show
anchors = 1, 1, 4, 4, 5, 7, 6, 7, 9, 7, 8, 10, 12, 11, 16, 17, 30, 26
YOLOv3
cfg: yolo-obj.txt
subdivisions=16
width=512
height=512
! ./darknet detector train data/obj.data cfg/yolo-obj.cfg cfg/darknet53.conv.74 -dont_show -map
YOLOv4
cfg: yolov4-custom.txt
subdivisions=64
width=608
height=608
! ./darknet detector train data/obj.data cfg/yolov4-custom.cfg cfg/yolov4.conv.137 -dont_show -map
Due to many interruptions I missed one part of the yolov4 chart. I drew the mAP part, but not the loss, which stayed below 0.5 for the whole missing section.
The idea of this project is to support radiologists in finding lung cancer at an early stage, when it is most curable. At that stage the nodules are really small (around 2x3 px in a 512x512 image), which makes the detection task challenging. In this task it is very important to have high recall (sensitivity) so you don't miss a cancer, but also a low false-positive rate so you don't tell someone they have cancer when they actually don't.
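This trade-off can be quantified from the counts darknet reports. With the numbers from the earlier `detector map` run for the nodule class (TP = 82, FP = 52), precision works out as below; recall additionally needs the false-negative count, which is not shown in that line:

```python
def precision(tp, fp):
    """Fraction of flagged detections that are real objects."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of real objects that were found."""
    return tp / (tp + fn)

# Counts from the earlier "detector map" output for the nodule class.
p = precision(tp=82, fp=52)
print(f"precision = {p:.3f}")  # ~0.612: roughly 39% of flagged nodules are false alarms
```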
Thank you!
Fix masks in yolov4-custom
char *fp_name = basecfg(path);
printf(" %s: FP = %d \n", fp_name, fp_for_thresh);
How to get the Precision-Recall curve
Un-comment this line: https://github.com/AlexeyAB/darknet/blob/65506eb04acacc050a1903614066337071e866b8/src/detector.c#L1265
Run
./darknet detector map ... -points 11
And then show your new chart.png
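The -points 11 flag computes AP by sampling the precision-recall curve at 11 recall levels (0.0, 0.1, ..., 1.0) and averaging the maximum precision at or beyond each level, PASCAL VOC 2007-style. A minimal sketch of the mechanics with a made-up PR curve (the data points are illustrative only):

```python
def ap_11_point(pr_pairs):
    """11-point interpolated AP (PASCAL VOC 2007 style).

    pr_pairs: list of (recall, precision) points on the PR curve.
    For each recall level r in {0.0, 0.1, ..., 1.0}, take the maximum
    precision among points with recall >= r, then average the 11 values.
    """
    total = 0.0
    for i in range(11):
        r = i / 10
        candidates = [p for rec, p in pr_pairs if rec >= r]
        total += max(candidates) if candidates else 0.0
    return total / 11

# Hypothetical PR curve just to show how the interpolation works.
curve = [(0.0, 1.0), (0.3, 0.9), (0.6, 0.7), (0.9, 0.4)]
ap = ap_11_point(curve)
print(f"AP = {ap:.4f}")
```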
Hi @AlexeyAB Below is the new chart of yolov4 after fixing the masks.
If I may ask you something about the loss functions in both v3 and v4. I am using yolo-obj.cfg and yolov4-custom.cfg from your repository. I know there are different possibilities to customize the loss function, but, I am using the default settings. My questions are:
Yes
In both Yolov3 and v4, binary cross-entropy with logistic (sigmoid) activation is used for multi-label classification - each bounded box (each anchor) can have several classes.
Yes, with coefficients for bbox_loss: 0.07 for CIoU in Yolov4, and 1.0 for MSE in Yolov3. https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4.cfg#L1151
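The yolov4-side weighting can be sketched as 0.07 * (1 - CIoU) per matched box. This is a simplified standalone sketch of the CIoU formula (IoU minus center-distance and aspect-ratio penalties), not darknet's exact implementation:

```python
import math

def iou(b1, b2):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

def ciou(b1, b2):
    """Complete IoU: IoU - center-distance penalty - aspect-ratio penalty."""
    i = iou(b1, b2)
    # squared distance between box centers
    cx1, cy1 = (b1[0] + b1[2]) / 2, (b1[1] + b1[3]) / 2
    cx2, cy2 = (b2[0] + b2[2]) / 2, (b2[1] + b2[3]) / 2
    rho2 = (cx1 - cx2) ** 2 + (cy1 - cy2) ** 2
    # squared diagonal of the smallest box enclosing both
    ex1, ey1 = min(b1[0], b2[0]), min(b1[1], b2[1])
    ex2, ey2 = max(b1[2], b2[2]), max(b1[3], b2[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    # aspect-ratio consistency term
    w1, h1 = b1[2] - b1[0], b1[3] - b1[1]
    w2, h2 = b2[2] - b2[0], b2[3] - b2[1]
    v = (4 / math.pi ** 2) * (math.atan(w2 / h2) - math.atan(w1 / h1)) ** 2
    alpha = v / ((1 - i) + v) if v > 0 else 0.0
    return i - rho2 / c2 - alpha * v

# yolov4-style bbox loss term for one matched pair, using the 0.07 coefficient:
pred, truth = (10, 10, 30, 30), (10, 10, 30, 30)
loss_v4 = 0.07 * (1 - ciou(pred, truth))  # perfectly matched boxes -> 0
print(loss_v4)
```

Note that unlike plain MSE, CIoU still produces a useful gradient for non-overlapping boxes, since the center-distance term pushes the prediction toward the truth even when IoU = 0.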
Fix masks in yolov4-custom
- yolo-obj.txt - correct masks
- yolov4-custom.txt - incorrect masks
Hi @AlexeyAB, @hbiserinska
Can you please explain what is wrong with the masks in yolov4-custom.txt? I am confused about the strategy for choosing mask indexes. Please help me understand how to choose the mask for each yolo layer when training yolov4 on a custom dataset.
Should lower layers generate larger anchor boxes, or vice versa?
The anchor boxes calculated on my custom dataset, after running this command, are:
./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 608 -height 608 -show
12, 34, 18, 70, 29, 53, 29,105, 59,106, 41,165, 67,230, 116,331, 246,504
The width and height I am training with (in the cfg) are 608x608. Please help me with this.
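The usual convention in the stock cfgs is: sort the anchors by area, give the largest third (mask = 6,7,8) to the coarse stride-32 grid and the smallest third (mask = 0,1,2) to the fine stride-8 grid, since large objects are best detected on the downsampled feature map. Note that the [yolo] sections appear coarse-to-fine in yolov3.cfg but fine-to-coarse in yolov4.cfg, so mask lines cannot simply be copied between the two. A sketch with the anchors from the question:

```python
# Anchors from calc_anchors at 608x608, as (w, h) pairs; calc_anchors
# already returns them sorted by ascending area.
anchors = [(12, 34), (18, 70), (29, 53), (29, 105), (59, 106),
           (41, 165), (67, 230), (116, 331), (246, 504)]

# Masks are index ranges into this list, one range per [yolo] layer:
# large anchors go to the coarse grid, small anchors to the fine grid.
masks = {
    "stride 32 (19x19 grid)": [6, 7, 8],  # largest anchors
    "stride 16 (38x38 grid)": [3, 4, 5],
    "stride 8  (76x76 grid)": [0, 1, 2],  # smallest anchors
}
for layer, idx in masks.items():
    print(layer, "->", [anchors[i] for i in idx])
```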
Hello everyone, I am trying to train a custom dataset using the fine-tuning method, but I need to ask about the procedure more clearly. I already read #2147, #3719, #2139, #4585.
I have a dataset that consists of 6 classes: Person, Bicycle, Bus, Car, Motorcycle and Truck. I labelled them and checked them in YOLO_mark.
I used the yolov3.cfg file and changed its filters and counters_per_class values.
Later, I cut the first 81 layers from yolov3.weights and created the yolov3.conv.81 file with:
./darknet partial cfg/yolov3.cfg yolov3.weights yolov3.conv.81 81
Then trained with:
./darknet detector train build/darknet/x64/data/custom.data cfg/custom.cfg yolov3.conv.81 -map
Here is my custom.names file.
So my question is: we cut the first 81 layers and put them in a weights file; then, when we train, the network initializes the remaining layers randomly and trains them. You can see that my names file's class indices differ from the COCO names. Do you think this can lead to long training times, or does it not matter (since we are keeping the first 81 layers, and I suppose they will carry a bias toward what the output layer needs to be)?
Another question is about the procedure: I want to be able to achieve scores like YOLOv3's. When I used darknet53.conv.74 it gave me worse results than cutting yolov3.weights at its 81st layer. How can I use the YOLOv3 weights and build on top of them for my classes?
Note: I should mention that, as a proof of concept, I am using 2084 images for both training and validation. I know this can lead to results like these, but I want to be sure about the procedure. Have a great day.