AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Why can't a 416x416 cfg recognize large objects? #5497

Closed: kame-lqm closed this issue 4 years ago

kame-lqm commented 4 years ago

I set 'random=1' in the cfg file and trained on a custom dataset. When I then set 416x416 in the cfg file for detection, it can detect most of the small objects but misses some of the large ones. When I set 544x544 in the cfg file, most of the large objects are detected. That's quite weird; why? Does anyone know what is happening? And how can I detect the large objects at 416x416? Thanks in advance.

416x416 in cfg: 1480113_416x416

544x544 in cfg: 1480113_544x544
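(As the discussion below explains, the resolution a model performs best at tends to track the resolution it was trained at. For reference, in this repo the test-time network resolution is set by width/height in the [net] section of the cfg and must be a multiple of 32; no retraining is needed to change it. A minimal sketch, assuming detection at the higher resolution:)

    [net]
    # test-time network resolution; any multiple of 32
    width=544
    height=544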

AlexeyAB commented 4 years ago

https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

for each object which you want to detect, there must be at least 1 similar object in the training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. So it is desirable that your training dataset includes images with objects at different scales, rotations, lightings, from different sides, on different backgrounds; you should preferably have 2000 different images for each class or more, and you should train for 2000*classes iterations or more

kame-lqm commented 4 years ago

https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

for each object which you want to detect, there must be at least 1 similar object in the training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. So it is desirable that your training dataset includes images with objects at different scales, rotations, lightings, from different sides, on different backgrounds; you should preferably have 2000 different images for each class or more, and you should train for 2000*classes iterations or more

Thanks for your reply, but I still don't understand the root cause. Let me describe my problem in more detail:

I used the Visdrone2018 dataset and part of the WiderPerson dataset as my dataset, and there are more than 60 images similar to the two above in the training set. The dataset contains all kinds of cars and people. There are more than 18,000 images in the training set, with more than 10 cars per image on average. Although I set 'classes=80' in the cfg, there are only 11 classes in my dataset. I have already trained it for more than 60,000 iterations. So I guess the dataset is not the problem; maybe it is my cfg file. I attach my cfg file here; hopefully you can take a look. Thanks so much.

-----------------------------------------------------------------------------------------------------------

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
width=544
height=544
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 60000
policy=steps
steps=10000,20000
scales=.1,.1

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=leaky

# Downsample
[convolutional] batch_normalize=1 filters=64 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample
[convolutional] batch_normalize=1 filters=128 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample
[convolutional] batch_normalize=1 filters=256 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample
[convolutional] batch_normalize=1 filters=512 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

# Downsample
[convolutional] batch_normalize=1 filters=1024 size=3 stride=2 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

[shortcut] from=-3 activation=linear

######################

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear

[yolo] mask = 6,7,8 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263 classes=80 num=9 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1

[route] layers = -4

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[upsample] stride=2

[route] layers = -1, 61

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear

[yolo] mask = 3,4,5 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263 classes=80 num=11 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1

[route] layers = -4

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[upsample] stride=2

[route] layers = -1, 36

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear

[yolo] mask = 0,1,2 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263 classes=80 num=9 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1
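For reference, the repo's README ties the filters= of the [convolutional] layer immediately before each [yolo] layer to the class count: filters = (classes + 5) * <number of masks in that layer>. A sketch of one detection head if the cfg were matched to the 11 classes actually present (these values are illustrative, not the ones in the posted cfg):

    [convolutional]
    size=1
    stride=1
    pad=1
    # (classes + 5) * 3 = (11 + 5) * 3 = 48
    filters=48
    activation=linear

    [yolo]
    mask = 6,7,8
    classes=11
    num=9
    # (other [yolo] parameters unchanged)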

AlexeyAB commented 4 years ago

Although I set 'classes=80' in the cfg, there are only 11 classes in my dataset.

Why?

Show chart.png with Loss and mAP

What mAP do you get?

What validation dataset did you use?

I used the Visdrone2018 dataset and part of the WiderPerson dataset as my dataset, and there are more than 60 images similar to the two above in the training set. The dataset contains all kinds of cars and people.

Are class_id's the same for the same objects in both datasets?
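For reference, the mAP asked about above can be measured on the validation set with the repo's map command; a sketch with placeholder data, cfg, and weights file names:

    ./darknet detector map data/obj.data cfg/yolov3-custom.cfg backup/yolov3-custom_last.weights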

kame-lqm commented 4 years ago

Although I set 'classes=80' in the cfg, there are only 11 classes in my dataset.

Why?

Show chart.png with Loss and mAP

What mAP do you get?

What validation dataset did you use?

I used the Visdrone2018 dataset and part of the WiderPerson dataset as my dataset, and there are more than 60 images similar to the two above in the training set. The dataset contains all kinds of cars and people.

Are class_id's the same for the same objects in both datasets?

They used different class IDs. In the Visdrone dataset, all people are given the class name "pedestrian", and in the WiderPerson dataset, all people are given the class name "people". The Visdrone dataset is collected from drone cameras, so it is quite different from the WiderPerson dataset, which is collected from street surveillance cameras.

Adding chart.png here (I forgot to add '-map', so there is no mAP in the chart): chart_yolov3_visdrone_ft0_60000
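For reference, the '-map' flag mentioned above is appended to the training command so that mAP is computed periodically and drawn on chart.png; a sketch with placeholder file names:

    ./darknet detector train data/obj.data cfg/yolov3-custom.cfg darknet53.conv.74 -map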

kame-lqm commented 4 years ago

Although I set 'classes=80' in the cfg, there are only 11 classes in my dataset.

Why? Show chart.png with Loss and mAP. What mAP do you get? What validation dataset did you use?

I used the Visdrone2018 dataset and part of the WiderPerson dataset as my dataset, and there are more than 60 images similar to the two above in the training set. The dataset contains all kinds of cars and people.

Are class_id's the same for the same objects in both datasets?

They used different class IDs. In the Visdrone dataset, all people are given the class name "pedestrian", and in the WiderPerson dataset, all people are given the class name "people". The Visdrone dataset is collected from drone cameras, so it is quite different from the WiderPerson dataset, which is collected from street surveillance cameras.

Adding chart.png here (I forgot to add '-map', so there is no mAP in the chart): chart_yolov3_visdrone_ft0_60000

Hi Alexey, I'm now training on the same datasets using a new cfg file, and below are the training results (training is not finished yet). 1. mAP at iteration 10347:

(next mAP calculation at 10347 iterations)
10347: 7.988983, 8.454061 avg loss, 0.000300 rate, 4.369828 seconds, 1986624 images, 76.417881 hours left
Resizing to initial size: 416 x 416
try to allocate additional workspace_size = 52.43 MB CUDA allocate done!
try to allocate additional workspace_size = 52.43 MB CUDA allocate done!
try to allocate additional workspace_size = 52.43 MB CUDA allocate done!
calculation mAP (mean average precision)...
4576
detections_count = 1148086, unique_truth_count = 106417
class_id = 0, name = pedestrian, ap = 10.24% (TP = 2580, FP = 5155)
class_id = 1, name = people, ap = 26.38% (TP = 3212, FP = 2791)  # class id from dataset 'WiderPerson'; the other class ids are all from dataset 'Visdrone2018'
class_id = 2, name = bicycle, ap = 2.19% (TP = 23, FP = 57)
class_id = 3, name = car, ap = 57.78% (TP = 27420, FP = 15531)
class_id = 4, name = van, ap = 28.83% (TP = 1794, FP = 1481)
class_id = 5, name = truck, ap = 43.74% (TP = 1893, FP = 1000)
class_id = 6, name = tricycle, ap = 4.56% (TP = 3, FP = 9)
class_id = 7, name = awning-tricycle, ap = 3.54% (TP = 0, FP = 1)
class_id = 8, name = bus, ap = 31.48% (TP = 360, FP = 219)
class_id = 9, name = motor, ap = 9.29% (TP = 917, FP = 1936)
class_id = 10, name = bulldozer, ap = 83.49% (TP = 94, FP = 21)
class_id = 11, name = 12, ap = 31.54% (TP = 0, FP = 0)
class_id = 12 through class_id = 79 (names 13 through 80): ap = 0.00% (TP = 0, FP = 0) for every one of these unused class ids

for conf_thresh = 0.25, precision = 0.58, recall = 0.36, F1-score = 0.44
for conf_thresh = 0.25, TP = 38296, FP = 28201, FN = 68121, average IoU = 42.32 %

IoU threshold = 50 %, used Area-Under-Curve for each unique Recall
mean average precision (mAP@0.50) = 0.041633, or 4.16 %
Total Detection Time: 213 Seconds

Set -points flag:
-points 101 for MS COCO
-points 11 for PascalVOC 2007 (uncomment difficult in voc.data)
-points 0 (AUC) for ImageNet, PascalVOC 2010-2012, your custom dataset

mean_average_precision (mAP@0.5) = 0.041633

2. test results: 1480120 jpg

1480113 jpg

3. cfg file:

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 60000
policy=steps
steps=10000,20000
scales=.1,.1

[convolutional] batch_normalize=1 filters=16 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=2

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=2

[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=2

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=2

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=2

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[maxpool] size=2 stride=1

[convolutional] batch_normalize=1 filters=1024 size=3 stride=1 pad=1 activation=leaky

###########

[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky

[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear

[yolo] mask = 6,7,8 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263

anchors = 4,7, 7,15, 13,25, 25,42, 41,67, 75,94, 91,162, 158,205, 250,332

classes=80 num=9 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1

[route] layers = -4

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[upsample] stride=2

[route] layers = -1, 8

[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear

[yolo] mask = 3,4,5 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263

anchors = 4,7, 7,15, 13,25, 25,42, 41,67, 75,94, 91,162, 158,205, 250,332

classes=80 num=9 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1

[route] layers = -3

[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

[upsample] stride=2

[route] layers = -1, 6

[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=leaky

[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear

[yolo] mask = 0,1,2 anchors = 10,13, 18,25, 33,23, 31,58, 62,45, 59,119, 116,90, 156,198, 313,263

anchors = 4,7, 7,15, 13,25, 25,42, 41,67, 75,94, 91,162, 158,205, 250,332

classes=80 num=9 jitter=.3 ignore_thresh = .5 truth_thresh = 1 random=1

kame-lqm commented 4 years ago

Hi Alexey, the following log seems weird to me: it shows "class_id = 11, name = 12, ap = 31.54%", but in fact there is no bounding box with class name "12" in my dataset, so its AP should be 0%. Is it a bug?

Training log:

detections_count = 1148086, unique_truth_count = 106417
class_id = 0, name = pedestrian, ap = 10.24% (TP = 2580, FP = 5155)
class_id = 1, name = people, ap = 26.38% (TP = 3212, FP = 2791)  # class id from dataset 'WiderPerson'; the other class ids are all from dataset 'Visdrone2018'
class_id = 2, name = bicycle, ap = 2.19% (TP = 23, FP = 57)
class_id = 3, name = car, ap = 57.78% (TP = 27420, FP = 15531)
class_id = 4, name = van, ap = 28.83% (TP = 1794, FP = 1481)
class_id = 5, name = truck, ap = 43.74% (TP = 1893, FP = 1000)
class_id = 6, name = tricycle, ap = 4.56% (TP = 3, FP = 9)
class_id = 7, name = awning-tricycle, ap = 3.54% (TP = 0, FP = 1)
class_id = 8, name = bus, ap = 31.48% (TP = 360, FP = 219)
class_id = 9, name = motor, ap = 9.29% (TP = 917, FP = 1936)
class_id = 10, name = bulldozer, ap = 83.49% (TP = 94, FP = 21)
class_id = 11, name = 12, ap = 31.54% (TP = 0, FP = 0)
class_id = 12, name = 13, ap = 0.00% (TP = 0, FP = 0)
class_id = 13, name = 14, ap = 0.00% (TP = 0, FP = 0)

data.names:

pedestrian people bicycle car van truck tricycle awning-tricycle bus motor bulldozer 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
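One quick way to check which class ids actually occur in the label files (in YOLO label format, the first field of each line is the class id) is a small shell pipeline like the following; the label directory name is a placeholder:

    # count how many boxes exist for every class id across all training labels
    cat train_labels/*.txt | awk '{print $1}' | sort -n | uniq -c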

AlexeyAB commented 4 years ago

Does anyone know what is happening?

  • Because you trained the model at 544x544, the best results will be at 544x544 or slightly higher.
  • Because you use yolov3-tiny_3l.cfg, which has very low accuracy.

The following log seems weird to me: it shows "class_id = 11, name = 12, ap = 31.54%", but in fact there is no bounding box with class name "12" in my dataset, so its AP should be 0%. Is it a bug?

The mistake is in your dataset/coco.names.
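For reference, a .names file holds one class name per line, in class-id order, and should line up with classes= in the cfg. If the cfg were set to classes=11, a file matching the classes named earlier in this thread would simply be:

    pedestrian
    people
    bicycle
    car
    van
    truck
    tricycle
    awning-tricycle
    bus
    motor
    bulldozer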

kame-lqm commented 4 years ago

Does anyone know what is happening?

  • Because you trained the model at 544x544, the best results will be at 544x544 or slightly higher.
  • Because you use yolov3-tiny_3l.cfg, which has very low accuracy.

The following log seems weird to me: it shows "class_id = 11, name = 12, ap = 31.54%", but in fact there is no bounding box with class name "12" in my dataset, so its AP should be 0%. Is it a bug?

The mistake is in your dataset/coco.names.

@AlexeyAB Thanks so much.

kame-lqm commented 4 years ago

@AlexeyAB I already found out what happened with this issue. When I used the "pjreddie/darknet" source code to test my cfg & weights, I got the wrong results mentioned above. When I used your source code to test them, the result is perfect.

CORRECT TEST RESULT test_result

Fetulhak commented 2 years ago

@kame-lqm I have trained two YOLOv4 models, one at resolution 416x416 and the other at 512x512. However, the model trained at 512x512 has a lower mAP than the one at 416x416. It is confusing to me; shouldn't it be the opposite? The input images were all the same size, 1008x1008. Any help will be appreciated.

Anchors generated at 416: 10,10, 18,8, 8,18, 12,12, 14,14, 16,15, 18,18, 21,21, 25,25 (avg IoU = 90.52%)
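For reference, anchors like these are typically produced with the repo's calc_anchors utility, which also reports the average IoU; a sketch of the invocation (the .data file name is a placeholder):

    ./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416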