Closed. kame-lqm closed this issue 4 years ago.
https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
for each object which you want to detect - there must be at least 1 similar object in the training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. It is desirable that your training dataset include images with objects at different: scales, rotations, lightings, from different sides, on different backgrounds - you should preferably have 2000 different images for each class or more, and you should train 2000*classes iterations or more
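The rule of thumb quoted above can be turned into a quick calculation. A minimal sketch (the helper name is mine, not darknet's); for the 11 classes used in this issue it suggests roughly 22,000 images and a `max_batches` of at least 22,000:

```python
# Rough sizing for a darknet training run, based on the
# "2000 images per class / 2000*classes iterations" rule of thumb
# quoted above. This is only a lower-bound heuristic.

def recommended_budget(num_classes: int) -> dict:
    """Suggested minimum dataset size and iteration count."""
    return {
        "min_images": 2000 * num_classes,      # ~2000 images per class
        "min_iterations": 2000 * num_classes,  # max_batches >= 2000*classes
    }

print(recommended_budget(11))  # the 11 classes used in this issue
```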
Thanks for your reply, but I still don't understand the root cause. Let me describe my problem in detail:
I used the Visdrone2018 dataset and part of the WiderPerson dataset as my dataset; there are more than 60 images similar to the two above in the training set, and it contains all kinds of cars and people. There are more than 18,000 images in the training set, with more than 10 cars per image on average. Although I set 'classes=80' in the cfg, there are only 11 classes in my dataset. I have already trained for more than 60,000 iterations, so I suspect the problem is not the dataset but my cfg file. I attach my cfg file here; hopefully you can take a look. Thanks so much.
[net]
# Testing
# Training
batch=64 subdivisions=16
width=544 height=544 channels=3 momentum=0.9 decay=0.0005 angle=0 saturation = 1.5 exposure = 1.5 hue=.1
learning_rate=0.001 burn_in=1000 max_batches = 60000 policy=steps steps=10000,20000 scales=.1,.1
[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=leaky
# Downsample
[convolutional] batch_normalize=1 filters=64 size=3 stride=2 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
# Downsample
[convolutional] batch_normalize=1 filters=128 size=3 stride=2 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
# Downsample
[convolutional] batch_normalize=1 filters=256 size=3 stride=2 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
# Downsample
[convolutional] batch_normalize=1 filters=512 size=3 stride=2 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
# Downsample
[convolutional] batch_normalize=1 filters=1024 size=3 stride=2 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
[shortcut] from=-3 activation=linear
######################
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear
[yolo] mask = 6,7,8 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263 classes=80 num=9 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1
[route] layers = -4
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[upsample] stride=2
[route] layers = -1, 61
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear
[yolo] mask = 3,4,5 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263 classes=80 num=11 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1
[route] layers = -4
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[upsample] stride=2
[route] layers = -1, 36
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear
[yolo] mask = 0,1,2 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263 classes=80 num=9 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1
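For reference, the convolutional layer immediately before each [yolo] block must satisfy filters = (classes + 5) * (number of masks in that layer), which is why the cfg above pairs classes=80 with filters=255. A quick check (the helper name is mine):

```python
# filters for the conv layer right before a [yolo] block:
# (classes + 5) * masks, where 5 = tx, ty, tw, th, objectness.

def yolo_filters(num_classes: int, masks_per_head: int = 3) -> int:
    return (num_classes + 5) * masks_per_head

print(yolo_filters(80))  # 255, as in the posted cfg
print(yolo_filters(11))  # 48, what an 11-class cfg would use instead
```

So a cfg trimmed to the dataset's 11 classes would use classes=11 and filters=48 in each of these three conv layers.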
Although I set 'classes=80' in the cfg, there are only 11 classes in my dataset.
Why?
Show chart.png with Loss and mAP.
What mAP do you get?
What validation dataset did you use?
I used the Visdrone2018 dataset and part of the WiderPerson dataset; there are more than 60 images similar to the two above in the training set, and it contains all kinds of cars and people.
Are the class_ids the same for the same objects in both datasets?
They used different class ids. In the Visdrone dataset all people have the class id "pedestrian", while in the WiderPerson dataset all people have the class id "people". The Visdrone dataset is collected from a drone's camera, which is quite different from the WiderPerson dataset, collected from surveillance cameras on the street.
Here is chart.png (I forgot to add '-map', so there is no mAP in the chart):
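Since the two datasets use different class-id conventions, merging them means remapping every label file into one shared class list. A minimal sketch, assuming the standard darknet label format (one "class cx cy w h" line per box); the mapping and ids here are hypothetical, not taken from the thread:

```python
# Hypothetical remapping of darknet label files when merging two
# datasets into one class list. The id mapping below is illustrative:
# e.g. WiderPerson's "people" (id 0 there) becomes id 1 in the merged list.
WIDERPERSON_TO_MERGED = {0: 1}

def remap_label_file(lines, id_map):
    """Rewrite the class id (first token) of each darknet label line."""
    out = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        cls = int(parts[0])
        parts[0] = str(id_map.get(cls, cls))  # unmapped ids pass through
        out.append(" ".join(parts))
    return out

labels = ["0 0.5 0.5 0.2 0.4"]  # one "people" box in WiderPerson ids
print(remap_label_file(labels, WIDERPERSON_TO_MERGED))
# ['1 0.5 0.5 0.2 0.4']
```

Applying such a remap to one dataset's .txt files (and using a single merged .names file) keeps the same real-world object under one class id in both sources.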
(next mAP calculation at 10347 iterations)
10347: 7.988983, 8.454061 avg loss, 0.000300 rate, 4.369828 seconds, 1986624 images, 76.417881 hours left
Resizing to initial size: 416 x 416
try to allocate additional workspace_size = 52.43 MB
CUDA allocate done!
try to allocate additional workspace_size = 52.43 MB
CUDA allocate done!
try to allocate additional workspace_size = 52.43 MB
CUDA allocate done!
calculation mAP (mean average precision)...
4576
detections_count = 1148086, unique_truth_count = 106417
class_id = 0, name = pedestrian, ap = 10.24% (TP = 2580, FP = 5155)
class_id = 1, name = people, ap = 26.38% (TP = 3212, FP = 2791) # class id from dataset 'WiderPerson', and other class ids are all from dataset 'Visdrone2018'
class_id = 2, name = bicycle, ap = 2.19% (TP = 23, FP = 57)
class_id = 3, name = car, ap = 57.78% (TP = 27420, FP = 15531)
class_id = 4, name = van, ap = 28.83% (TP = 1794, FP = 1481)
class_id = 5, name = truck, ap = 43.74% (TP = 1893, FP = 1000)
class_id = 6, name = tricycle, ap = 4.56% (TP = 3, FP = 9)
class_id = 7, name = awning-tricycle, ap = 3.54% (TP = 0, FP = 1)
class_id = 8, name = bus, ap = 31.48% (TP = 360, FP = 219)
class_id = 9, name = motor, ap = 9.29% (TP = 917, FP = 1936)
class_id = 10, name = bulldozer, ap = 83.49% (TP = 94, FP = 21)
class_id = 11, name = 12, ap = 31.54% (TP = 0, FP = 0)
class_id = 12, name = 13, ap = 0.00% (TP = 0, FP = 0)
class_id = 13, name = 14, ap = 0.00% (TP = 0, FP = 0)
class_id = 14, name = 15, ap = 0.00% (TP = 0, FP = 0)
class_id = 15, name = 16, ap = 0.00% (TP = 0, FP = 0)
class_id = 16, name = 17, ap = 0.00% (TP = 0, FP = 0)
class_id = 17, name = 18, ap = 0.00% (TP = 0, FP = 0)
class_id = 18, name = 19, ap = 0.00% (TP = 0, FP = 0)
class_id = 19, name = 20, ap = 0.00% (TP = 0, FP = 0)
class_id = 20, name = 21, ap = 0.00% (TP = 0, FP = 0)
class_id = 21, name = 22, ap = 0.00% (TP = 0, FP = 0)
class_id = 22, name = 23, ap = 0.00% (TP = 0, FP = 0)
class_id = 23, name = 24, ap = 0.00% (TP = 0, FP = 0)
class_id = 24, name = 25, ap = 0.00% (TP = 0, FP = 0)
class_id = 25, name = 26, ap = 0.00% (TP = 0, FP = 0)
class_id = 26, name = 27, ap = 0.00% (TP = 0, FP = 0)
class_id = 27, name = 28, ap = 0.00% (TP = 0, FP = 0)
class_id = 28, name = 29, ap = 0.00% (TP = 0, FP = 0)
class_id = 29, name = 30, ap = 0.00% (TP = 0, FP = 0)
class_id = 30, name = 31, ap = 0.00% (TP = 0, FP = 0)
class_id = 31, name = 32, ap = 0.00% (TP = 0, FP = 0)
class_id = 32, name = 33, ap = 0.00% (TP = 0, FP = 0)
class_id = 33, name = 34, ap = 0.00% (TP = 0, FP = 0)
class_id = 34, name = 35, ap = 0.00% (TP = 0, FP = 0)
class_id = 35, name = 36, ap = 0.00% (TP = 0, FP = 0)
class_id = 36, name = 37, ap = 0.00% (TP = 0, FP = 0)
class_id = 37, name = 38, ap = 0.00% (TP = 0, FP = 0)
class_id = 38, name = 39, ap = 0.00% (TP = 0, FP = 0)
class_id = 39, name = 40, ap = 0.00% (TP = 0, FP = 0)
class_id = 40, name = 41, ap = 0.00% (TP = 0, FP = 0)
class_id = 41, name = 42, ap = 0.00% (TP = 0, FP = 0)
class_id = 42, name = 43, ap = 0.00% (TP = 0, FP = 0)
class_id = 43, name = 44, ap = 0.00% (TP = 0, FP = 0)
class_id = 44, name = 45, ap = 0.00% (TP = 0, FP = 0)
class_id = 45, name = 46, ap = 0.00% (TP = 0, FP = 0)
class_id = 46, name = 47, ap = 0.00% (TP = 0, FP = 0)
class_id = 47, name = 48, ap = 0.00% (TP = 0, FP = 0)
class_id = 48, name = 49, ap = 0.00% (TP = 0, FP = 0)
class_id = 49, name = 50, ap = 0.00% (TP = 0, FP = 0)
class_id = 50, name = 51, ap = 0.00% (TP = 0, FP = 0)
class_id = 51, name = 52, ap = 0.00% (TP = 0, FP = 0)
class_id = 52, name = 53, ap = 0.00% (TP = 0, FP = 0)
class_id = 53, name = 54, ap = 0.00% (TP = 0, FP = 0)
class_id = 54, name = 55, ap = 0.00% (TP = 0, FP = 0)
class_id = 55, name = 56, ap = 0.00% (TP = 0, FP = 0)
class_id = 56, name = 57, ap = 0.00% (TP = 0, FP = 0)
class_id = 57, name = 58, ap = 0.00% (TP = 0, FP = 0)
class_id = 58, name = 59, ap = 0.00% (TP = 0, FP = 0)
class_id = 59, name = 60, ap = 0.00% (TP = 0, FP = 0)
class_id = 60, name = 61, ap = 0.00% (TP = 0, FP = 0)
class_id = 61, name = 62, ap = 0.00% (TP = 0, FP = 0)
class_id = 62, name = 63, ap = 0.00% (TP = 0, FP = 0)
class_id = 63, name = 64, ap = 0.00% (TP = 0, FP = 0)
class_id = 64, name = 65, ap = 0.00% (TP = 0, FP = 0)
class_id = 65, name = 66, ap = 0.00% (TP = 0, FP = 0)
class_id = 66, name = 67, ap = 0.00% (TP = 0, FP = 0)
class_id = 67, name = 68, ap = 0.00% (TP = 0, FP = 0)
class_id = 68, name = 69, ap = 0.00% (TP = 0, FP = 0)
class_id = 69, name = 70, ap = 0.00% (TP = 0, FP = 0)
class_id = 70, name = 71, ap = 0.00% (TP = 0, FP = 0)
class_id = 71, name = 72, ap = 0.00% (TP = 0, FP = 0)
class_id = 72, name = 73, ap = 0.00% (TP = 0, FP = 0)
class_id = 73, name = 74, ap = 0.00% (TP = 0, FP = 0)
class_id = 74, name = 75, ap = 0.00% (TP = 0, FP = 0)
class_id = 75, name = 76, ap = 0.00% (TP = 0, FP = 0)
class_id = 76, name = 77, ap = 0.00% (TP = 0, FP = 0)
class_id = 77, name = 78, ap = 0.00% (TP = 0, FP = 0)
class_id = 78, name = 79, ap = 0.00% (TP = 0, FP = 0)
class_id = 79, name = 80, ap = 0.00% (TP = 0, FP = 0)
for conf_thresh = 0.25, precision = 0.58, recall = 0.36, F1-score = 0.44
for conf_thresh = 0.25, TP = 38296, FP = 28201, FN = 68121, average IoU = 42.32 %
IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.041633, or 4.16 %
Total Detection Time: 213 Seconds
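The summary numbers above can be reproduced from the per-class table, which also shows why mAP is so low: darknet averages AP over all 80 configured classes, 68 of which have no labels at all. A quick check (assuming the reported mAP is the plain mean of per-class APs, which matches here):

```python
# Per-class APs from the log above: the 11 real classes, the ghost
# "class 12" entry, and 68 empty classes.
aps = [10.24, 26.38, 2.19, 57.78, 28.83, 43.74,
       4.56, 3.54, 31.48, 9.29, 83.49, 31.54] + [0.0] * 68

print(round(sum(aps) / len(aps), 2))  # 4.16  (mean over all 80 classes)
print(round(sum(aps[:11]) / 11, 2))   # 27.41 (mean over the 11 real classes)

# The detection-level metrics follow from the TP/FP/FN counts in the log:
tp, fp, fn = 38296, 28201, 68121
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 2), round(recall, 2), round(f1, 2))  # 0.58 0.36 0.44
```

So with classes set to the real 11, the same detections would already report roughly a 27% mAP instead of 4.16%.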
Set -points flag:
-points 101 for MS COCO
-points 11 for PascalVOC 2007 (uncomment difficult in voc.data)
-points 0 (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset
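The -points settings correspond to different AP estimators: N-point interpolation samples precision at N fixed recall levels, while -points 0 integrates the area under the interpolated precision-recall curve. A rough sketch of N-point interpolated AP (the precision-recall points are made up; darknet's actual implementation differs in details):

```python
# N-point interpolated average precision, in the spirit of
# VOC2007 (n=11) and COCO (n=101). Toy PR curve for illustration.

def interp_precision(recalls, precisions, r):
    # Interpolated precision: max precision at any recall >= r.
    cand = [p for rr, p in zip(recalls, precisions) if rr >= r]
    return max(cand) if cand else 0.0

def ap_n_points(recalls, precisions, n):
    # Sample precision at n evenly spaced recall levels in [0, 1].
    levels = [i / (n - 1) for i in range(n)]
    return sum(interp_precision(recalls, precisions, r) for r in levels) / n

recalls = [0.1, 0.4, 0.8]     # made-up operating points
precisions = [1.0, 0.7, 0.5]
print(ap_n_points(recalls, precisions, 11))
```

Different estimators give slightly different numbers on the same detections, which is why darknet asks you to pick the flag that matches the benchmark you compare against.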
mean_average_precision (mAP@0.5) = 0.041633
2. test results:
[net]
batch=64 subdivisions=16
width=416 height=416 channels=3 momentum=0.9 decay=0.0005 angle=0 saturation = 1.5 exposure = 1.5 hue=.1
learning_rate=0.001 burn_in=1000 max_batches = 60000 policy=steps steps=10000,20000 scales=.1,.1
[convolutional] batch_normalize=1 filters=16 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=2
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[maxpool] size=2 stride=1
[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky
###########
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear
[yolo] mask = 6,7,8 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263
classes=80 num=9 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1
[route] layers = -4
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[upsample] stride=2
[route] layers = -1, 8
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear
[yolo] mask = 3,4,5 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263
classes=80 num=9 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1
[route] layers = -3
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[upsample] stride=2
[route] layers = -1, 6
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear
[yolo] mask = 0,1,2 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263
classes=80 num=9 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1
Hi Alexey, the following log seems weird to me: it shows "class_id = 11, name = 12, ap = 31.54%", but in fact there is no bounding box with class name "12" in my dataset, so its ap should be 0%. Is it a bug?
pedestrian people bicycle car van truck tricycle awning-tricycle bus motor bulldozer 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
Does anyone know what is happening?
The following log seems weird to me: it shows "class_id = 11, name = 12, ap = 31.54%", but in fact there is no bounding box with class name "12" in my dataset, so its ap should be 0%. Is it a bug?
The mistake is in your dataset/coco.names.
Does anyone know what is happening?
- Because you trained the model at 544x544, so the best results will be at 544x544 or slightly higher.
- Because you use yolov3-tiny_3l.cfg, which has very low accuracy.
@AlexeyAB Thanks so much.
@AlexeyAB I already found out what happened with this issue. When I used the source code "pjreddie/darknet" to test my cfg & weights, I got the wrong result mentioned above. When I used your source code to test them, the result is perfect.
CORRECT TEST RESULT
@kame-lqm I have trained two YOLOv4 models, one at resolution 416x416 and the other at 512x512. However, the 512x512 model has a lower mAP than the 416x416 one. This is confusing to me; shouldn't it be the opposite? The input images were all of equal size, 1008x1008. Any help will be appreciated.
anchor generated at 416: 10, 10, 18, 8, 8, 18, 12, 12, 14, 14, 16, 15, 18, 18, 21, 21, 25, 25 ...............90.52% IOU
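Anchors like the ones quoted above are typically produced by k-means over box widths and heights with 1 - IoU as the distance, comparing boxes as if centered at the same point (darknet's calc_anchors command works along these lines). A simplified sketch with made-up boxes:

```python
import random

def iou_wh(a, b):
    """IoU of two (w, h) boxes aligned at the same center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) boxes into k anchors using 1 - IoU distance."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the center with the highest IoU.
            best = max(range(k), key=lambda j: iou_wh(box, centers[j]))
            clusters[best].append(box)
        # New center = mean (w, h) of its cluster; keep old if empty.
        centers = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers)

boxes = [(10, 12), (11, 13), (30, 28), (33, 25), (60, 50), (58, 52)]
print(kmeans_anchors(boxes, 3))
```

The reported "90.52% IOU" above is the average IoU between each training box and its best-matching anchor, a measure of how well the anchor set fits the dataset.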
I set 'random=1' in the cfg file and trained on a custom dataset. When I then set 416x416 in the cfg file for detection, it detects most of the small objects but cannot detect the big ones. When I set 544x544, most of the big objects are detected. That's quite weird; why? Does anyone know what is happening? How can I detect the big objects at 416x416? Thanks in advance.
416x416 in cfg:
544x544 in cfg:
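One reason detection can differ between width/height=416 and 544, consistent with the earlier reply that results are best at the training resolution or slightly higher: darknet resizes every image to the network size, so the same object has a different size in network-input pixels at each resolution, while the anchors in the cfg stay fixed. A small illustration (object and image sizes are made up):

```python
# How a fixed object's size in network-input pixels changes with the
# network resolution set in the cfg. Illustrative numbers only.

def resized_wh(obj_w, obj_h, img_size, net_size):
    """Object size in network-input pixels after a square resize."""
    scale = net_size / img_size
    return obj_w * scale, obj_h * scale

big_object = (900, 700)  # hypothetical large box in a 1000px-wide image
for net in (416, 544):
    w, h = resized_wh(*big_object, img_size=1000, net_size=net)
    print(net, (round(w, 1), round(h, 1)))
```

A model trained mostly at one scale sees unfamiliar object sizes at another, so testing at (or near) the training resolution usually gives the best results.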