AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

View anchors and model cfg #2338

Open · GustavoAndresMoreno opened this issue 5 years ago

GustavoAndresMoreno commented 5 years ago

Hi @AlexeyAB. I am new to these forums. I have been learning YOLO on my own, reading and reviewing different tutorials and forums. I'm working on detection of two vehicle classes: small cars, and small and large trucks (see the attached images: [images_Cars_Trucks.zip](https://github.com/AlexeyAB/darknet/files/2819995/images_Cars_Trucks.zip)). You can see in the images that the model does not label large trucks well. I am using YOLOv3-tiny with this cfg file: cars_cfg.zip

I ran darknet.exe detector calc_anchors EntrenaCars/code/SS/Train/cars.data -num_of_clusters 8 -width 480 -height 480 -show and got the following anchors:

num_of_clusters = 8, width = 480, height = 480
read labels from 5966 images
loaded image: 5966  box: 6746
all loaded.
calculating k-means++ ...
avg IoU = 86.84 %
Saving anchors to the file: anchors.txt
anchors = 30, 61,  39, 59,  36, 87,  41,109,  46,136,  52,171,  58,230,  65,367

The following image shows the anchor clusters: ![clusters_anchors_8](https://user-images.githubusercontent.com/47233592/52101794-2f9d8b00-25ab-11e9-86a4-96f681864838.PNG)
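For context: the anchors above go into every [yolo] layer of the cfg, with num set to the total anchor count, mask selecting which anchors each layer uses, and filters = (classes + coords + 1) * <number of masks> in the [convolutional] layer just before each [yolo] layer. A minimal sketch for this 2-class case, assuming a yolov3-tiny-style layout and a 4/4 mask split (not the poster's actual cars cfg):

```
# Sketch only: assumed yolov3-tiny-style layout and mask split, not the attached cars cfg.

# 1st [yolo] layer (coarse grid, large objects): the 4 largest anchors.
# filters = (classes + coords + 1) * 4 masks = (2 + 4 + 1) * 4 = 28
[convolutional]
size=1
stride=1
pad=1
filters=28
activation=linear

[yolo]
mask = 4,5,6,7
anchors = 30, 61,  39, 59,  36, 87,  41,109,  46,136,  52,171,  58,230,  65,367
classes=2
num=8

# 2nd [yolo] layer (finer grid, smaller objects): the remaining 4 anchors.
# filters = (2 + 4 + 1) * 4 = 28 in the [convolutional] right before it as well.
[convolutional]
size=1
stride=1
pad=1
filters=28
activation=linear

[yolo]
mask = 0,1,2,3
anchors = 30, 61,  39, 59,  36, 87,  41,109,  46,136,  52,171,  58,230,  65,367
classes=2
num=8
```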

How can I make my model handle the largest trucks? What should I modify in my cfg file? What do you recommend, based on your experience?

Thanks @AlexeyAB

Sudhakar17 commented 5 years ago

@GustavoAndresMoreno What is your model's mAP accuracy? How many samples do you have for each class? The anchor boxes are not fitting the big trucks properly. You could reduce the resolution to 320x320 (but make sure the cars are still visible). You probably have fewer training examples of big trucks, which can affect the anchor box coordinates; try a different number of clusters as well. You can also visualize the anchor coordinates to check whether they fit the objects in the training set.
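For reference, the network input resolution is set by the width and height keys in the [net] section of the cfg, so testing 320x320 is a two-line change (a minimal sketch; everything else in [net] stays as it is):

```
[net]
# ... other [net] settings unchanged ...
width=320
height=320
```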

AlexeyAB commented 5 years ago

@GustavoAndresMoreno Hi,

The main thing is to check that there are enough images with large trucks, labeled the way you need, in the training dataset.

  1. Try to train from the beginning using this cfg-file - I added 1 anchor to the 1st yolo-layer (changed anchors, num, filters): cars12_1.cfg.txt

  2. If that doesn't help, then try to train from the beginning using this cfg-file - I also added 2 conv-layers before the 1st yolo-layer: cars12_2.cfg.txt (see the sketch below for what these edits involve)
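The attached cfg-files are not reproduced here, but as a rough, hypothetical illustration of what those two edits involve (all values assumed, following filters = (classes + coords + 1) * <number of masks> for 2 classes):

```
# Hypothetical illustration only - not the actual cars12_1.cfg.txt / cars12_2.cfg.txt.

# (1) Add one extra large anchor and assign it to the 1st [yolo] layer.
# The last pair (70,430) is a made-up extra-large anchor; num becomes 9 and
# filters = (2 + 4 + 1) * 5 masks = 35 (was 28 with 4 masks).
[convolutional]
size=1
stride=1
pad=1
filters=35
activation=linear

[yolo]
mask = 4,5,6,7,8
anchors = 30, 61,  39, 59,  36, 87,  41,109,  46,136,  52,171,  58,230,  65,367,  70,430
classes=2
num=9

# (2) Extra [convolutional] layers for this head would be inserted in the cfg
# just ahead of the detection [convolutional] above, for example:
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
```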

GustavoAndresMoreno commented 5 years ago

1). What's your model accuracy mAP: This is fine:

calculation mAP (mean average precision)... 5968
detections_count = 6975, unique_truth_count = 6746
class_id = 0, name = car, ap = 99.88 %
class_id = 1, name = truck, ap = 99.99 %
for thresh = 0.70, precision = 1.00, recall = 0.99, F1-score = 1.00
for thresh = 0.70, TP = 6712, FP = 16, FN = 34, average IoU = 89.66 %
mean average precision (mAP) = 0.999314, or 99.93 %

The model works very well; the problem is the label (bounding box) it generates for large trucks. I need the box to cover the whole truck in order to determine its size and differentiate it from the others.

2). The anchor box is not fitting the big truck properly. You can reduce the resolution to 320x320 (also make sure the car is visible): At 320x320 the model loses precision, so 480x480 is a good resolution.

3). You probably have fewer training examples of big trucks, which can affect the anchor box coordinates; try a different number of clusters as well. You can visualize the anchor coordinates to see whether they fit the objects in the training set: This may be one of my problems. The number of very large trucks is very low compared to the other vehicles. However, it is difficult to get more samples; I will try to obtain them.

Thanks @Sudhakar17

GustavoAndresMoreno commented 5 years ago

> @GustavoAndresMoreno Hi,
>
> The main thing is to check that there are enough images with large trucks, labeled the way you need, in the training dataset.
>
>   1. Try to train from the beginning using this cfg-file - I added 1 anchor to the 1st yolo-layer (changed anchors, num, filters): cars12_1.cfg.txt
>   2. If that doesn't help, then try to train from the beginning using this cfg-file - I also added 2 conv-layers before the 1st yolo-layer: cars12_2.cfg.txt

Hi @AlexeyAB. I will try the models you recommend and report back with the results. Thank you.

GustavoAndresMoreno commented 5 years ago

Hi, sorry. I closed the issue by mistake.

GustavoAndresMoreno commented 5 years ago


Hi @AlexeyAB. The accuracy of the new model is very similar to the previous one. It still cannot label the whole truck. What else could I try? If I use YOLOv3, could the result improve?

Thanks @AlexeyAB

AlexeyAB commented 5 years ago

@GustavoAndresMoreno You should add many more examples with the full truck to your training dataset.

> If I use YoloV3, could the result improve?

Yes.

GustavoAndresMoreno commented 5 years ago

> @GustavoAndresMoreno You should add many more examples with the full truck to your training dataset.
>
> > If I use YoloV3, could the result improve?
>
> Yes.

OK @AlexeyAB. I will add more examples and try YOLOv3. Thank you.

GustavoAndresMoreno commented 5 years ago

Hi @AlexeyAB and @Sudhakar17,

In the cfg model, how do the [yolo] layers connect with the convolutional layers above them? For example, in my cfg the last [yolo] layer is used for small objects, with mask 0,1,2, and the first [yolo] layer is used for large objects, with mask 3,4,5,6,7,8. Which convolutional layers should I modify so that the model can better recognize small objects and large objects?

I hope you can understand my question.

Thank you.

AlexeyAB commented 5 years ago

@GustavoAndresMoreno As described here: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

filters = (classes + coords + 1) * <number of mask>

So you should set filters=(classes + coords + 1)*6 in the [convolutional] layer before the 1st yolo-layer (where mask = 3,4,5,6,7,8), and filters=(classes + coords + 1)*3 in the [convolutional] layer before the last yolo-layer (where mask = 0,1,2).
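As a concrete illustration for this 2-class (car/truck) case with coords = 4, the two detection heads would look roughly like this (surrounding parameters assumed, not the actual cfg; 21 is also consistent with the conv 21 detection layers in the model printout later in this thread):

```
# Sketch only - assumed surrounding parameters, 2 classes, coords = 4.
# (the anchors= line with all 9 w,h pairs is omitted for brevity)

# 1st [yolo] layer - large objects, 6 masks:
# filters = (classes + coords + 1) * 6 = (2 + 4 + 1) * 6 = 42
[convolutional]
size=1
stride=1
pad=1
filters=42
activation=linear

[yolo]
mask = 3,4,5,6,7,8
classes=2
num=9

# last [yolo] layer - small objects, 3 masks:
# filters = (classes + coords + 1) * 3 = (2 + 4 + 1) * 3 = 21
[convolutional]
size=1
stride=1
pad=1
filters=21
activation=linear

[yolo]
mask = 0,1,2
classes=2
num=9
```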


Also: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

Recalculate the anchors for your dataset using the width and height from your cfg-file: darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the anchor indexes in mask= for each [yolo]-layer, so that the 1st [yolo]-layer gets the anchors larger than 60x60, the 2nd the anchors larger than 30x30, and the 3rd the remaining ones. If many of the calculated anchors do not fit under the appropriate layers, then just try using all the default anchors.
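A sketch of how that mask assignment might come out, using hypothetical example anchors (the real values come from running calc_anchors on your own dataset):

```
# Hypothetical anchors, sorted smallest to largest (indexes 0..8):
#   12,16  19,36  40,28  36,75  76,55  72,146  142,110  192,243  459,401

# 1st [yolo] layer (coarsest grid): anchors larger than 60x60 -> indexes 5,6,7,8
# (the preceding [convolutional] then needs filters = (classes + coords + 1) * 4)
[yolo]
mask = 5,6,7,8
anchors = 12,16, 19,36, 40,28, 36,75, 76,55, 72,146, 142,110, 192,243, 459,401
classes=2
num=9

# 2nd [yolo] layer: anchors larger than 30x30 that are not already used -> indexes 3,4
# (preceding [convolutional]: filters = (classes + coords + 1) * 2)
[yolo]
mask = 3,4
anchors = 12,16, 19,36, 40,28, 36,75, 76,55, 72,146, 142,110, 192,243, 459,401
classes=2
num=9

# 3rd [yolo] layer (finest grid): the remaining small anchors -> indexes 0,1,2
# (preceding [convolutional]: filters = (classes + coords + 1) * 3)
[yolo]
mask = 0,1,2
anchors = 12,16, 19,36, 40,28, 36,75, 76,55, 72,146, 142,110, 192,243, 459,401
classes=2
num=9
```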

Sudhakar17 commented 5 years ago

layer filters size input output
0 conv 32 3 x 3 / 1 416 x 416 x 3 -> 416 x 416 x 32 0.299 BF
1 conv 64 3 x 3 / 2 416 x 416 x 32 -> 208 x 208 x 64 1.595 BF
2 conv 32 1 x 1 / 1 208 x 208 x 64 -> 208 x 208 x 32 0.177 BF
3 conv 64 3 x 3 / 1 208 x 208 x 32 -> 208 x 208 x 64 1.595 BF
4 Shortcut Layer: 1
5 conv 128 3 x 3 / 2 208 x 208 x 64 -> 104 x 104 x 128 1.595 BF
6 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
7 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
8 Shortcut Layer: 5
9 conv 64 1 x 1 / 1 104 x 104 x 128 -> 104 x 104 x 64 0.177 BF
10 conv 128 3 x 3 / 1 104 x 104 x 64 -> 104 x 104 x 128 1.595 BF
11 Shortcut Layer: 8
12 conv 256 3 x 3 / 2 104 x 104 x 128 -> 52 x 52 x 256 1.595 BF
13 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
14 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
15 Shortcut Layer: 12
16 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
17 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
18 Shortcut Layer: 15
19 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
20 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
21 Shortcut Layer: 18
22 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
23 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
24 Shortcut Layer: 21
25 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
26 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
27 Shortcut Layer: 24
28 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
29 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
30 Shortcut Layer: 27
31 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
32 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
33 Shortcut Layer: 30
34 conv 128 1 x 1 / 1 52 x 52 x 256 -> 52 x 52 x 128 0.177 BF
35 conv 256 3 x 3 / 1 52 x 52 x 128 -> 52 x 52 x 256 1.595 BF
36 Shortcut Layer: 33
37 conv 512 3 x 3 / 2 52 x 52 x 256 -> 26 x 26 x 512 1.595 BF
38 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
39 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
40 Shortcut Layer: 37
41 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
42 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
43 Shortcut Layer: 40
44 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
45 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
46 Shortcut Layer: 43
47 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
48 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
49 Shortcut Layer: 46
50 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
51 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
52 Shortcut Layer: 49
53 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
54 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
55 Shortcut Layer: 52
56 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
57 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
58 Shortcut Layer: 55
59 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
60 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
61 Shortcut Layer: 58
62 conv 1024 3 x 3 / 2 26 x 26 x 512 -> 13 x 13 x1024 1.595 BF
63 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
64 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
65 Shortcut Layer: 62
66 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
67 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
68 Shortcut Layer: 65
69 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
70 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
71 Shortcut Layer: 68
72 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
73 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
74 Shortcut Layer: 71
75 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
76 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
77 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
78 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
79 conv 512 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 512 0.177 BF
80 conv 1024 3 x 3 / 1 13 x 13 x 512 -> 13 x 13 x1024 1.595 BF
81 conv 21 1 x 1 / 1 13 x 13 x1024 -> 13 x 13 x 21 0.007 BF
82 yolo
83 route 79
84 conv 256 1 x 1 / 1 13 x 13 x 512 -> 13 x 13 x 256 0.044 BF
85 upsample 2x 13 x 13 x 256 -> 26 x 26 x 256
86 route 85 61
87 conv 256 1 x 1 / 1 26 x 26 x 768 -> 26 x 26 x 256 0.266 BF
88 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
89 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
90 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
91 conv 256 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 256 0.177 BF
92 conv 512 3 x 3 / 1 26 x 26 x 256 -> 26 x 26 x 512 1.595 BF
93 conv 21 1 x 1 / 1 26 x 26 x 512 -> 26 x 26 x 21 0.015 BF
94 yolo
95 route 91
96 conv 128 1 x 1 / 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF
97 upsample 4x 26 x 26 x 128 -> 104 x 104 x 128
98 route 97 11
99 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128 0.709 BF
100 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256 6.380 BF
101 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128 0.709 BF
102 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256 6.380 BF
103 conv 128 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 128 0.709 BF
104 conv 256 3 x 3 / 1 104 x 104 x 128 -> 104 x 104 x 256 6.380 BF
105 conv 21 1 x 1 / 1 104 x 104 x 256 -> 104 x 104 x 21 0.116 BF
106 yolo

From the output of this yolo-v3 variant, we can see that smaller objects can be detected on the larger feature map of size 104x104. So you can also upsample the intermediate conv-layer stages before a yolo layer. Even though we followed this approach, an object sometimes still needs a minimum size in pixels to be detected.
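A rough sketch, inferred from layers 95-98 of the printout above rather than taken from the actual cfg, of the fragment that builds that 104x104 head: route back to the 26x26 features, reduce channels with a 1x1 conv, upsample 4x, and concatenate with the 104x104 output of backbone layer 11:

```
# Sketch inferred from layers 95-98 of the printout above (not the actual cfg file).

# layer 95: route back to the 26x26x256 feature map (layer 91)
[route]
layers = -4

# layer 96: 26x26x256 -> 26x26x128
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

# layer 97: 26x26x128 -> 104x104x128
[upsample]
stride=4

# layer 98: concatenate with layer 11 (104x104x128) -> 104x104x256
[route]
layers = -1, 11
```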