xuesongle opened this issue 6 years ago.
@xuesongle Hi,
For small objects, it is much better to use YOLOv3 or YOLOv3-tiny.
Also, as a general rule, you should keep the relative size of objects the same in the training and testing datasets: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height
The small object size varies from 20x20 to 100x100. The large images are 3840x2160. ... step 1: I cropped large images into smaller ones for training purposes; the cropped images vary in size from 245x199 to 1071x1005 ... step 3: I stopped training after 2000 iterations, then I tested with a 3840x2160 image. None of the small vehicles were detected. Only two large (square-shaped) buildings were detected.
So the smallest object size is ~20x20, the network size is 416x416, the test image is 3840x2160, and the smallest cropped training image is 245x199:
train_network_width * train_obj_width / train_image_width
= 416 * 20 / 245
~= 34
detection_network_width * detection_obj_width / detection_image_width
= 416 * 20 / 3840
~= 2
I.e. 2 is much lower than 34, so the neural network can't detect objects in this case.
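The rule of thumb above is plain arithmetic, so it can be checked in a few lines. A minimal Python sketch using the numbers from this thread; `relative_size` is a hypothetical helper name, not part of darknet:

```python
def relative_size(network_dim: float, obj_dim: float, image_dim: float) -> float:
    """Object size in pixels after the image is resized to the network input size."""
    return network_dim * obj_dim / image_dim


# Values from the thread: 416x416 network, ~20 px objects,
# a 245 px wide training crop vs. a 3840 px wide test image.
train_ratio = relative_size(416, 20, 245)
detect_ratio = relative_size(416, 20, 3840)

print(f"train:  {train_ratio:.1f} px")   # ~34 px
print(f"detect: {detect_ratio:.1f} px")  # ~2 px
print(f"mismatch factor: {train_ratio / detect_ratio:.1f}x")
```

Since the network size is the same in both cases, the mismatch factor is simply the ratio of the image widths (3840 / 245).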
I recommend that you:
- use yolov3.cfg instead of YOLOv2
- set layers = -1, 11 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L720
- set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L717
- set width=832 height=832 and random=1 in the cfg-file
Then train it (if your GPU memory allows it). What GPU do you use?
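A minimal Python sketch that applies these recommended edits to the text of a yolov3.cfg copy. It assumes the stock layout, where the last [upsample] section is immediately followed by the [route] section that originally holds layers = -1, 36, and it assumes every key it sets already exists. `patch_cfg` is a hypothetical helper, not part of darknet:

```python
import re


def patch_cfg(text: str, size: int = 832) -> str:
    """Apply the small-object edits from this thread to a yolov3.cfg string (sketch)."""
    lines = text.splitlines()
    # Section headers such as "[net]", "[upsample]", "[route]", "[yolo]".
    headers = [i for i, line in enumerate(lines) if line.strip().startswith("[")]
    # (start, end) line ranges: each section runs until the next header.
    bounds = list(zip(headers, headers[1:] + [len(lines)]))

    def name(start: int) -> str:
        return lines[start].strip()

    def set_key(start: int, end: int, key: str, value: str) -> None:
        # Replace an existing "key=value" line; commented lines ("# key=...") are skipped.
        pattern = re.compile(rf"^\s*{re.escape(key)}\s*=")
        for i in range(start + 1, end):
            if pattern.match(lines[i]):
                lines[i] = f"{key}={value}"
                return
        raise KeyError(f"{key} not found in {name(start)} at line {start + 1}")

    for start, end in bounds:
        if name(start) == "[net]":
            set_key(start, end, "width", str(size))
            set_key(start, end, "height", str(size))
        elif name(start) == "[yolo]":
            set_key(start, end, "random", "1")

    # The last [upsample] gets stride=4; the [route] right after it gets layers = -1, 11.
    k = [j for j, (s, _) in enumerate(bounds) if name(s) == "[upsample]"][-1]
    set_key(*bounds[k], "stride", "4")
    route_start, route_end = bounds[k + 1]
    if name(route_start) != "[route]":
        raise ValueError("expected a [route] section right after the last [upsample]")
    set_key(route_start, route_end, "layers", "-1, 11")
    return "\n".join(lines) + "\n"
```

To use it, read the text of cfg/yolov3.cfg, pass it to `patch_cfg`, and write the result to a new file (the output name is up to you). Editing by section rather than by line number avoids depending on the exact line numbers in the linked revision.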
Thanks, @AlexeyAB. I will give it a try and update with the result. My GPU is a Quadro P4000.
Try to set batch=64 subdivisions=64 width=832 height=832 in the [net] section, and random=1 in each of the 3 [yolo] layers.
If an "Out of memory" error occurs, then try width=608 height=608.
@AlexeyAB, training is still running, at around 1800 iterations now (after 8 hours). I noticed that the following messages appeared on the screen:
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000020, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000007, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: 0.783223, Class: 0.996343, Obj: 0.561249, No Obj: 0.000711, .5R: 1.000000, .75R: 1.000000, count: 3
Region 106 Avg IOU: 0.757034, Class: 0.999593, Obj: 0.886043, No Obj: 0.000173, .5R: 1.000000, .75R: 0.375000, count: 8
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000005, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001196, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000220, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000003, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000412, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.822524, Class: 0.950795, Obj: 0.565393, No Obj: 0.000300, .5R: 1.000000, .75R: 0.875000, count: 16
Is that normal, or should I kill the training process and check for errors?
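In the lines above, every -nan entry has count: 0, meaning that [yolo] layer had no ground-truth objects assigned in that batch, so the averages are 0/0. A minimal Python sketch that scans such output and reports only the -nan values that occur with count > 0, which would be worth investigating. The regex is written against the lines quoted above and is an assumption; other darknet versions may print a different format:

```python
import math
import re

# Matches one "Region ..." entry as printed in the log above.
REGION = re.compile(r"Region (\d+) Avg IOU: (\S+), .*?count: (\d+)")


def suspicious(log: str) -> list:
    """Return (layer, count) pairs whose Avg IOU is nan even though count > 0."""
    bad = []
    for layer, iou, count in REGION.findall(log):
        # float("-nan") parses to nan in Python.
        if math.isnan(float(iou)) and int(count) > 0:
            bad.append((int(layer), int(count)))
    return bad
```

An empty result means every -nan in the log coincides with a layer that simply had no objects to average over.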
@AlexeyAB Would you explain why layers = -1, 11 and stride=4 help detect small objects?
Thanks.
@AlexeyAB, I stopped the training at 2000 iterations and used the trained weights to test the same 3840x2160 image. None of the vehicles were detected. So I cropped the image to 1524x696, and only 1 sedan was detected. Then I tested with a smaller crop of 736x758, and half of the vehicles were detected. Testing with another crop of 422x331, even more vehicles were detected. It seems that the currently trained model can detect small objects only in small images.
The following are the changes I made to train on small objects with YOLOv3.
Step 1: cd cfg; cp yolov3.cfg yolov3-uav.cfg
Step 2: in yolov3-uav.cfg, make the following changes:
- batch=64 at line 3
- subdivisions=64 at line 4
- width=608 at line 8
- height=608 at line 9
- random=1 in each of the 3 [yolo] layers, at lines 615, 701, 788
- layers = -1, 11 at line 720
- stride=4 at line 717
- classes=4 (only 4 classes in my case) in each of the 3 [yolo] layers, at lines 610, 696, 783
- change filters=255 to filters=27 in the 3 [convolutional] layers before each [yolo] layer, at lines 603, 689, 776
Step 3: Download the pre-trained weights for the convolutional layers:
wget https://pjreddie.com/media/files/darknet53.conv.74
and train the model for 2000 iterations:
./darknet detector train cfg/uav.data cfg/yolov3-uav.cfg darknet53.conv.74
Step 4: Then test with images at different resolutions, e.g.:
./darknet detector test cfg/uav.data cfg/yolov3-uav.cfg yolov3-uav_2000.weights data/uav_1.jpg
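The filters=27 value in Step 2 (and filters=45 in the original YOLOv2 attempt) follows from (classes + 5) * anchors per layer, where the 5 counts the 4 box coordinates plus the objectness score. A small Python sketch of that arithmetic; `yolo_filters` is a hypothetical helper name:

```python
def yolo_filters(classes: int, anchors_per_layer: int) -> int:
    """filters= for the [convolutional] layer right before a [yolo]/[region] layer."""
    # Per anchor: 4 box coordinates + 1 objectness score + one score per class.
    return (classes + 5) * anchors_per_layer


print(yolo_filters(4, 3))   # YOLOv3, 3 masks per [yolo] layer -> 27
print(yolo_filters(4, 5))   # YOLOv2 yolo-voc.2.0.cfg, 5 anchors -> 45
print(yolo_filters(80, 3))  # stock COCO yolov3.cfg -> 255
```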
Results:
- 3840x2160: no vehicles detected
- 1524x696: 1 sedan detected
- 736x758: 50% of vehicles detected
- 422x331: 90% of vehicles detected
Is there anything I should change or try so that small objects can be detected in large images? Currently, it seems that the only solution is to split large images into smaller ones, run detection on each one individually, and then map the results back onto the large image.
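The tiling workaround described above can be sketched in Python. This is a minimal sketch under stated assumptions: the tile size equals the network size (608 here, so tiles are not downscaled), tiles overlap by 64 pixels so objects on a border appear whole in at least one tile, and running darknet on each crop is left out. All function names are hypothetical:

```python
def tile_offsets(length: int, tile: int, overlap: int) -> list:
    """Start coordinates of tiles of size `tile` covering [0, length) with overlap."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # last tile ends exactly at the image border
    return starts


def tiles(width: int, height: int, tile: int = 608, overlap: int = 64):
    """Yield (x0, y0, x1, y1) crop boxes in full-image pixel coordinates."""
    for y0 in tile_offsets(height, tile, overlap):
        for x0 in tile_offsets(width, tile, overlap):
            yield x0, y0, min(x0 + tile, width), min(y0 + tile, height)


def to_full_image(box, x0: int, y0: int):
    """Shift a detection (left, top, w, h) in tile pixels by the tile's origin."""
    left, top, w, h = box
    return left + x0, top + y0, w, h
```

For a 3840x2160 image this gives 7 x 4 = 28 tiles. Objects inside an overlap region can be detected twice, so the merged boxes would still need de-duplication, for example with non-maximum suppression.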
@panda9095, the way I understand it: as the depth of the network increases, the semantic value of the feature maps increases while their resolution decreases. So concatenating the high-resolution feature maps of early layers with the semantically rich feature maps of later layers helps detect smaller objects (a kind of feature pyramid network). To concatenate with the output of layer 11, which is 104x104x128 for a 416x416 input, the stride of the preceding [upsample] layer has to be raised from 2 to 4, so that its output matches the 104x104 spatial size of layer 11.
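The size matching can be checked with a few lines of arithmetic. A Python sketch, assuming the stock yolov3.cfg, where the last [upsample] receives the 26x26 (stride 16) map and by default is concatenated with layer 36 at 52x52, while darknet53 layer 11 sits after two stride-2 downsamplings (total stride 4). `grid` is a hypothetical helper name:

```python
def grid(network_size: int, total_stride: int) -> int:
    """Spatial size of a feature map after `total_stride` pixels of downsampling."""
    return network_size // total_stride


net = 416
layer_11 = grid(net, 4)       # darknet53 layer 11: 104x104 (x128 channels)
upsample_in = grid(net, 16)   # 26x26 map fed into the last [upsample] layer

for stride in (2, 4):
    out = upsample_in * stride
    status = "match" if out == layer_11 else "mismatch"
    print(f"stride={stride}: {out}x{out} vs layer 11 {layer_11}x{layer_11} -> {status}")
```

With stride=2 the upsampled map is 52x52, which only fits layer 36; with stride=4 it is 104x104 and can be concatenated with layer 11.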
@xuesongle I'm interested in small objects too. In my case, I want to detect traffic signs. My images are ~1280x720. I tried training YOLOv3 with batch, and for a very long time (about 2000 iterations) I received -nan. Unfortunately, the result is not good; I think something may be wrong with the data. Maybe the images are too large, or too small.
Thanks, @kmsravindra!
Hi @AlexeyAB, I want to detect small faces in high-resolution images, but in public datasets train_obj_height / train_image_height is usually much higher.
In the training stage, if I paste a public-dataset image onto a high-resolution image at a random position, and then resize this high-resolution image to 416x416 for training, do you think it will work?
@shengxingdong Hi, yes, I think it will work. Just make sure the annotations of the object positions are correct.
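Keeping the annotations correct after pasting comes down to converting each label into canvas coordinates. A minimal Python sketch, assuming darknet-style labels (class, x_center, y_center, width, height, all normalized to [0, 1] by the image size); the pixel pasting itself (e.g. with an image library) is not shown, and the function names are hypothetical:

```python
import random


def paste_labels(labels, src_w, src_h, canvas_w, canvas_h, off_x, off_y):
    """Convert YOLO labels of an image pasted at (off_x, off_y) into canvas labels."""
    out = []
    for cls, xc, yc, w, h in labels:
        out.append((
            cls,
            (off_x + xc * src_w) / canvas_w,  # center, source pixels -> canvas fraction
            (off_y + yc * src_h) / canvas_h,
            w * src_w / canvas_w,             # size shrinks relative to the larger canvas
            h * src_h / canvas_h,
        ))
    return out


def random_offset(src_w, src_h, canvas_w, canvas_h, rng=random):
    """Pick a top-left position so the pasted image fits entirely inside the canvas."""
    return rng.randint(0, canvas_w - src_w), rng.randint(0, canvas_h - src_h)
```

For example, a 640x480 image pasted onto a 3840x2160 canvas keeps its face labels centered on the same pixels, while their normalized width and height become 6 and 4.5 times smaller, which is the reduction in relative object size the question is after.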
@AlexeyAB Thank you, I will try it.
If you change to layers = -1, 11 and stride=4 for small object detection, are there any considerations regarding anchors if I'm using my own anchors (9 in total, 3 for each [yolo] layer) generated with Alexey's script?
Hi @AlexeyAB,
Would you still recommend YOLOv3 over YOLOv4 for detecting small objects in top-down images?
Thanks
@AlexeyAB Hi,
I am trying to detect small vehicles in large images taken from an unmanned aerial vehicle. The small objects vary in size from 20x20 to 100x100. The large images are 3840x2160.
I used the following steps to detect small objects:
Step 1: I cropped the large images into smaller ones for training; the cropped images vary in size from 245x199 to 1071x1005. Then I annotated the small objects in each cropped image.
Step 2: I followed the instructions on your site to train on those annotated cropped images. I copied the existing yolo-voc.2.0.cfg and edited it:
- set classes=4, the number of categories we want to detect
- set filters=(classes + 5)*5, in this case filters=45
Step 3: I stopped training after 2000 iterations, then I tested with a 3840x2160 image. None of the small vehicles were detected. Only two large (square-shaped) buildings were detected.
I think there are a number of changes I need to make; can you shed some light on this problem? Two of the annotated training images and one test image are attached here.
Thanks.