AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Can my dataset have blank classes? #847

Closed: offchan42 closed this issue 6 years ago

offchan42 commented 6 years ago

For example, suppose my .names file contains words like cat, dog, and rabbit, but the actual dataset contains only images of cats and rabbits, with no dogs. Would this make the mAP calculation or training process erroneous?

I have 133 classes available, but my dataset only contains maybe 80 of them in actual images. After training for a while (5000+ iterations), the mAP value is very low, around 0.00003 (avg loss stuck around 0.6), so I suspect this might be causing the mAP to be too low. Also, it does not predict anything at all: there are no bounding boxes (thresh 0.25). Could you also suggest what to do when the loss does not go any lower (and the model is still not predicting anything)? Is there a guideline or troubleshooting checklist for when the loss doesn't improve? Is it possible that mAP will increase soon, or do I need to change the model config?

I'm kind of desperate right now. I tried changing the learning rate from 1e-3 to 1e-4, 1e-5, 1e-6, and 1e-7, and none of them made the loss go lower than 0.6.

offchan42 commented 6 years ago

Another question: why are there no bounding boxes for all classes? (The prediction is incorrect, but that's irrelevant here.) [image]

1: 32%  (left: 6    top: 5  w: 2    h: 4)
2: 32%
4: 27%
6: 28%

From the -ext_output option's output, only class 1 has a bounding box; the other predictions don't have their own left, top, w, h values. Why don't they all have associated left, top, w, h values?

kmsravindra commented 6 years ago

@off99555 , Try keeping only the names that are actually present in your dataset and remove the extra ones. I encountered a similar issue with darkflow (not darknet) and removing them helped to resolve it. You can give it a try...

offchan42 commented 6 years ago

Do you mean removing classes from my .names file? I'll try doing it and I'll report what happens.

offchan42 commented 6 years ago

I removed the extra classes that are unused in the training set from the .names file and re-indexed the class IDs in all *.txt annotation files, leaving 68 classes. Still, it's not improving my model...
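
For anyone doing the same re-indexing, here is a minimal sketch of the remapping (the file names old.names, new.names, and the labels/ directory are hypothetical; build the ID map from your own files):

import glob

old_names = open("old.names").read().split()   # names file before pruning
new_names = open("new.names").read().split()   # names file after pruning
id_map = {i: new_names.index(n) for i, n in enumerate(old_names) if n in new_names}

for path in glob.glob("labels/*.txt"):
    kept = []
    with open(path) as f:
        for line in f:
            cls, *coords = line.split()
            if int(cls) in id_map:  # drop boxes whose class was removed
                kept.append(" ".join([str(id_map[int(cls)])] + coords))
    with open(path, "w") as f:
        f.write("\n".join(kept) + ("\n" if kept else ""))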

AlexeyAB commented 6 years ago

I have 133 classes available, but my dataset only contains maybe 80 of them in actual images. After training for a while (5000+ iterations), the mAP value is very low, around 0.00003 (avg loss stuck around 0.6)

Avg loss is good. Look at the AP for each class: if AP > 50%, then it's all OK.

AlexeyAB commented 6 years ago

From the -ext_output option's output, only class 1 has a bounding box; the other predictions don't have their own left, top, w, h values. Why don't they all have associated left, top, w, h values?

That's how it was added here for multi-label detection: https://github.com/AlexeyAB/darknet/pull/741
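
In other words, with multi-label detection one predicted box can carry several class probabilities above the threshold, and the coordinates are printed once per box. A minimal sketch of that grouping (hypothetical data structures, not darknet's actual C code), reproducing the output shown above:

thresh = 0.25
# One detected box with several class probabilities above threshold.
detections = [{"box": (6, 5, 2, 4),
               "probs": {"1": 0.32, "2": 0.32, "4": 0.27, "6": 0.28}}]

for det in detections:
    left, top, w, h = det["box"]
    first = True
    for name, p in det["probs"].items():
        if p < thresh:
            continue
        if first:  # coordinates are printed only for the first label of the box
            print(f"{name}: {p:.0%}  (left: {left}    top: {top}  w: {w}    h: {h})")
            first = False
        else:
            print(f"{name}: {p:.0%}")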

offchan42 commented 6 years ago

@AlexeyAB I don't have AP over 50%. This is the result of mAP on the test set after 25,000 iterations.

Detections_count = 17401, unique_truth_count = 4809
class_id = 0, name = 0,  17401   ap = 0.20 %
class_id = 1, name = 1,          ap = 0.00 %
class_id = 2, name = 2,          ap = 0.38 %
class_id = 3, name = 3,          ap = 0.47 %
class_id = 4, name = 4,          ap = 0.00 %
class_id = 5, name = 5,          ap = 0.07 %
class_id = 6, name = 6,          ap = 0.08 %
class_id = 7, name = 7,          ap = 1.01 %
class_id = 8, name = 8,          ap = 0.16 %
class_id = 9, name = 9,          ap = 0.52 %
class_id = 10, name = char,      ap = 0.00 %
class_id = 11, name = 01-ko-kai,         ap = 0.01 %
class_id = 12, name = 02-kho-khai,       ap = 0.00 %
class_id = 13, name = 03-kho-khuat,      ap = 0.00 %
class_id = 14, name = 04-kho-khwai,      ap = 0.00 %
class_id = 15, name = 06-kho-ra-khang,   ap = 0.00 %
class_id = 16, name = 07-ngo-ngu,        ap = 0.00 %
class_id = 17, name = 08-cho-chan,       ap = 1.05 %
class_id = 18, name = 09-cho-ching,      ap = 0.32 %
class_id = 19, name = 10-cho-chang,      ap = 0.00 %
class_id = 20, name = 12-cho-choe,       ap = 0.00 %
class_id = 21, name = 13-yo-ying,        ap = 0.00 %
class_id = 22, name = 14-do-cha-da,      ap = 0.19 %
class_id = 23, name = 15-to-pa-tak,      ap = 0.00 %
class_id = 24, name = 16-tho-than,       ap = 0.00 %
class_id = 25, name = 18-tho-phu-thao,   ap = 0.00 %
class_id = 26, name = 19-no-nen,         ap = 0.00 %
class_id = 27, name = 20-do-dek,         ap = 0.00 %
class_id = 28, name = 21-to-tao,         ap = 1.30 %
class_id = 29, name = 22-tho-thung,      ap = 0.00 %
class_id = 30, name = 23-tho-thahan,     ap = 0.00 %
class_id = 31, name = 24-tho-thong,      ap = 0.91 %
class_id = 32, name = 25-no-nu,          ap = 0.00 %
class_id = 33, name = 26-bo-baimai,      ap = 0.00 %
class_id = 34, name = 27-po-pla,         ap = 0.00 %
class_id = 35, name = 28-pho-phueng,     ap = 0.00 %
class_id = 36, name = 30-pho-phan,       ap = 0.00 %
class_id = 37, name = 31-fo-fan,         ap = 0.00 %
class_id = 38, name = 32-pho-sam-phao,   ap = 0.00 %
class_id = 39, name = 33-mo-ma,          ap = 0.00 %
class_id = 40, name = 34-yo-yak,         ap = 0.00 %
class_id = 41, name = 35-ro-ruea,        ap = 0.08 %
class_id = 42, name = 36-lo-ling,        ap = 0.00 %
class_id = 43, name = 37-wo-waen,        ap = 0.00 %
class_id = 44, name = 38-so-sala,        ap = 0.00 %
class_id = 45, name = 39-so-rue-si,      ap = 0.00 %
class_id = 46, name = 40-so-suea,        ap = 1.01 %
class_id = 47, name = 41-ho-hip,         ap = 0.00 %
class_id = 48, name = 42-lo-chu-la,      ap = 0.00 %
class_id = 49, name = 43-o-ang,          ap = 0.36 %
class_id = 50, name = 44-ho-nok-huk,     ap = 0.00 %
class_id = 51, name = changwat,          ap = 0.07 %
class_id = 52, name = Bangkok,   ap = 4.55 %
class_id = 53, name = Chachoengsao,      ap = 0.00 %
class_id = 54, name = Chanthaburi,       ap = 0.00 %
class_id = 55, name = Chiang-Mai,        ap = 0.00 %
class_id = 56, name = Chon-Buri,         ap = 1.24 %
class_id = 57, name = Kamphaeng-Phet,    ap = 0.00 %
class_id = 58, name = Khon-Kaen,         ap = 0.00 %
class_id = 59, name = Lop-Buri,          ap = 0.00 %
class_id = 60, name = Nakhon-Nayok,      ap = 0.00 %
class_id = 61, name = Nakhon-Ratchasima,         ap = 0.00 %
class_id = 62, name = Phetchabun,        ap = 0.00 %
class_id = 63, name = Rayong,    ap = 0.00 %
class_id = 64, name = Si-Sa-Ket,         ap = 0.00 %
class_id = 65, name = Trat,      ap = 0.00 %
class_id = 66, name = Udon-Thani,        ap = 0.00 %
class_id = 67, name = Uthai-Thani,       ap = 0.00 %
 for thresh = 0.25, precision = 0.02, recall = 0.00, F1-score = 0.00
 for thresh = 0.25, TP = 12, FP = 491, FN = 4797, average IoU = 1.42 %

 mean average precision (mAP) = 0.002055, or 0.21 %
Total Detection Time: 7.000000 Seconds

Can I hope for a good result if I keep training, or do you think there is a bug at this point? I set my learning rate to 0.00075. Loss is around 0.5 and the input size is 128x128. Before this, I trained a model to detect only one class, and it required only 1400 iterations to reach mAP over 65%, so I'm not sure whether it's working this time with 68 classes.
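
As a sanity check, the summary line in that output follows directly from the reported counts; a minimal sketch of the standard precision/recall/F1 formulas:

TP, FP, FN = 12, 491, 4797  # counts reported above at thresh = 0.25

precision = TP / (TP + FP)                          # 12 / 503  ~= 0.02
recall = TP / (TP + FN)                             # 12 / 4809 ~= 0.0025
f1 = 2 * precision * recall / (precision + recall)

print(f"precision = {precision:.2f}, recall = {recall:.2f}, F1-score = {f1:.2f}")
# -> precision = 0.02, recall = 0.00, F1-score = 0.00 (matches the log)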

AlexeyAB commented 6 years ago

Something is going wrong. Check all params (anchors, filters, classes) in the cfg file and check your dataset. Do all your images have a size of 128x128? Do you use yolov3-tiny?

Do you mean removing classes from my .names file? I'll try doing it and I'll report what happens.

Did you start training from the beginning after these changes?

offchan42 commented 6 years ago

@AlexeyAB I also believe something went wrong, but I couldn't figure out what. My images are small and square, e.g. 98x98, 150x150, etc.; I expect the model to resize them to 128x128. They are images of license plates. The anchors were computed with the command darknet detector calc_anchors data/ch.data -num_of_clusters 6 -width 128 -height 128, which gave 0.8863,1.9025, 1.2476,2.7584, 1.7519,3.8749, 3.9772,2.0495, 2.3599,5.4580, 8.3901,3.5298 as a result. I replaced these anchors in both places in my cfg file. I set the last mask=0,1,2 and set classes=68 in both places. Both filters before classes are set to 219. I use yolov3-tiny and start training from the beginning, without any pre-trained weights. I couldn't even fit a dataset of 3 images or 64 images, as described in the other issue #850.
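
For reference, the filters=219 value follows darknet's documented rule for the convolutional layer directly before each [yolo] layer: filters = (classes + 5) * number of masks in that layer. A quick check:

classes = 68
masks_per_layer = 3  # mask = 3,4,5 and mask = 0,1,2 each select 3 anchors

filters = (classes + 5) * masks_per_layer
print(filters)  # 219, matching the cfg below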

offchan42 commented 6 years ago

I also think my dataset is fine. I randomly opened .txt files and checked that each one matches the names file and its image. Example of my txt file:

4 0.008 0.033 0.005 0.019
11 0.013 0.031 0.007 0.015
29 0.02 0.03 0.009 0.018
2 0.027 0.028 0.009 0.018
6 0.034 0.029 0.008 0.016
1 0.038 0.027 0.006 0.022
8 0.043 0.029 0.006 0.018
52 0.028 0.042 0.037 0.013

Here's the cfg file (also set random=0):

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=2
width=128
height=128
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.0075
burn_in=1000
max_batches = 500200
policy=steps
steps=50000,70000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=219
activation=linear

[yolo]
mask = 3,4,5
#anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
anchors = 0.8863,1.9025, 1.2476,2.7584, 1.7519,3.8749, 3.9772,2.0495, 2.3599,5.4580, 8.3901,3.5298
classes=68
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=0

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=219
activation=linear

[yolo]
mask = 0,1,2
#anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
anchors = 0.8863,1.9025, 1.2476,2.7584, 1.7519,3.8749, 3.9772,2.0495, 2.3599,5.4580, 8.3901,3.5298
classes=68
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=0

AlexeyAB commented 6 years ago

@off99555

  1. Did you open your dataset in the Yolo_mark? https://github.com/AlexeyAB/Yolo_mark

  2. anchors = 0.8863,1.9025, 1.2476,2.7584, 1.7519,3.8749, 3.9772,2.0495, 2.3599,5.4580, 8.3901,3.5298

    • Looks like some of your objects have a size of less than 1 pixel (0.8x1.9) after the image is resized to 128x128, so such objects can't be detected.

    • 1.2476,2.7584, 1.7519,3.8749, 3.9772,2.0495, 2.3599,5.4580, 8.3901,3.5298 - these are objects that are larger than 1x1 pixel but smaller than 32x32 pixels after the image is resized to 128x128, so these objects can be detected, but the class_id will be recognized poorly (see the sketch below).
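
To make the arithmetic explicit: calc_anchors run with -width 128 -height 128 reports anchor sizes in pixels of the 128x128 network input, and a label's normalized width scales the same way. A minimal sketch using the anchors and the 0.005-wide boxes from this thread:

net_size = 128
anchors = [(0.8863, 1.9025), (1.2476, 2.7584), (1.7519, 3.8749),
           (3.9772, 2.0495), (2.3599, 5.4580), (8.3901, 3.5298)]

for w, h in anchors:
    note = "sub-pixel" if min(w, h) < 1.0 else "under 32x32"
    print(f"anchor {w:.2f} x {h:.2f} px ({note})")

# A labeled box with normalized width 0.005 at this input size:
print(0.005 * net_size)  # 0.64 px, i.e. smaller than one pixel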

offchan42 commented 6 years ago

In total I have around 2,400 training samples. Most of my images are around 50x50 to 200x200; they are very small, and there are probably no images larger than 256x256. The bounding box has to be bigger than 32x32 pixels after resizing, right? What does "badly recognized" mean? Does this mean I have to use 416x416? I think I've tried that before and it didn't work either.

offchan42 commented 6 years ago

I didn't use the Yolo_mark software to annotate my dataset. I used labelImg and then used the https://github.com/SsaRu/convert2Yolo tool to convert from labelImg format to YOLO format.

offchan42 commented 6 years ago

I think the conversion tool works because I used it with a different dataset and I could get it to train properly. So I don't need to use Yolo_mark to open my images at all.
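
For reference, the standard labelImg (Pascal VOC XML) to YOLO conversion normalizes the box center and size by the image dimensions. A minimal sketch of that formula (not convert2Yolo's actual code); getting these divisors wrong is the usual way such conversions break:

def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a Pascal VOC pixel box to YOLO's normalized (xc, yc, w, h)."""
    xc = (xmin + xmax) / 2.0 / img_w
    yc = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return xc, yc, w, h

# A 10x12 px box on a 56x56 image should give w ~= 0.18, not 0.005:
print(voc_to_yolo(20, 20, 30, 32, 56, 56))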

AlexeyAB commented 6 years ago

Example of my txt file:

4 0.008 0.033 0.005 0.019
11 0.013 0.031 0.007 0.015
29 0.02 0.03 0.009 0.018
2 0.027 0.028 0.009 0.018
6 0.034 0.029 0.008 0.016
1 0.038 0.027 0.006 0.022
8 0.043 0.029 0.006 0.018
52 0.028 0.042 0.037 0.013

Can you show this image with and without labeled objects?

AlexeyAB commented 6 years ago

I think the conversion tool works because I used it with a different dataset and I could get it to train properly.

Maybe you are right. But this is one of the most common mistakes )

offchan42 commented 6 years ago

[image: car-017-0] This is the image for the above annotation.

offchan42 commented 6 years ago

[image: car-027-1]

4 0.02 0.084 0.017 0.029
11 0.034 0.079 0.017 0.033
29 0.05 0.077 0.017 0.037
2 0.068 0.071 0.015 0.033
6 0.08 0.07 0.016 0.037
1 0.094 0.067 0.013 0.032
8 0.107 0.064 0.018 0.033
52 0.062 0.104 0.08 0.032

This is another one for you to see: a closer, bigger image.

AlexeyAB commented 6 years ago
  1. No, please show me a smaller image.
  2. Also, your dataset is wrong - look at the top-left corner of the image that I pinned. All labels are in the top-left corner:

[image] [image] [image]

offchan42 commented 6 years ago

[image: car-015-0]

4 0.005 0.028 0.005 0.009
11 0.011 0.027 0.006 0.01
29 0.016 0.027 0.006 0.011
2 0.023 0.027 0.005 0.011
6 0.029 0.024 0.005 0.011
1 0.033 0.025 0.004 0.01
10 0.038 0.024 0.006 0.015

This is a smaller image.

AlexeyAB commented 6 years ago

Your image has a size of 56x56, but the width of your objects is 0.005, so 0.005 x 56 = 0.28, which is smaller than 1 pixel. But as you can see in your image, the width of the objects is bigger than 1 pixel. Your labels are incorrect.

You should always use Yolo_mark: https://github.com/AlexeyAB/Yolo_mark
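
Short of relabeling, the same arithmetic can at least flag the broken labels across the whole dataset. A minimal sketch (the data/train layout with .jpg images next to .txt labels is hypothetical; assumes Pillow is installed):

import glob, os
from PIL import Image

for txt_path in glob.glob("data/train/*.txt"):
    img_path = os.path.splitext(txt_path)[0] + ".jpg"
    if not os.path.exists(img_path):
        continue
    img_w, img_h = Image.open(img_path).size
    with open(txt_path) as f:
        for line in f:
            cls, xc, yc, w, h = line.split()
            w_px, h_px = float(w) * img_w, float(h) * img_h
            if w_px < 1 or h_px < 1:
                print(f"{txt_path}: class {cls} box is {w_px:.2f} x {h_px:.2f} px, under 1 px")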

offchan42 commented 6 years ago

I cannot afford to label the whole dataset again, for sure, so conversion is my only option. Please also check this: [image: car-026]

0 0.474 0.541 0.115 0.084

I used the same conversion tool, and this dataset can be detected without problems. There is only one class, the license plate.

offchan42 commented 6 years ago

I got it now! See #850