detecting small objects from large image

xuesongle commented 6 years ago

@AlexeyAB : Hi:

I attempted to detect small vehicles in large images taken from an unmanned aerial vehicle. The objects range in size from 20x20 to 100x100 pixels, and the full images are 3840x2160.

I used the following steps to detect small objects:

Step 1: I cropped the large images into smaller ones for training. The cropped images vary in size from 245x199 to 1071x1005. I then annotated the small objects in each cropped image.

Step 2: I followed the instructions on your site to train on the annotated cropped images. I copied the existing yolo-voc.2.0.cfg and edited it:
- set classes=4, the number of categories we want to detect
- set filters=(classes + 5)*5, in this case filters=45

Step 3: I stopped training after 2000 iterations and tested with a 3840x2160 image. None of the small vehicles were detected. Only two large, square buildings were detected.
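The filters arithmetic in step 2 can be checked with a small sketch. The function name and the `anchors_per_layer` parameter are made up for illustration; the only facts taken from the thread are the formula filters=(classes + 5)*5 for yolo-voc.2.0.cfg and the value filters=27 that is used with yolov3.cfg later in the discussion (which corresponds to 3 anchors per [yolo] layer).

```python
def yolo_filters(classes, anchors_per_layer):
    """Number of filters in the convolutional layer that feeds a detection layer.

    Each anchor predicts 4 box coordinates, 1 objectness score,
    and one score per class, hence (classes + 5) values per anchor.
    """
    return (classes + 5) * anchors_per_layer


# Yolo v2 style cfg from this post: 5 anchors, 4 classes
print(yolo_filters(classes=4, anchors_per_layer=5))  # 45

# Yolo v3 style cfg used later in the thread: 3 anchors per [yolo] layer
print(yolo_filters(classes=4, anchors_per_layer=3))  # 27
```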

I think there are a number of changes I need to make. Could you shed some light on this problem? Two of the annotated training images and one testing image are attached here.

Thanks.

AlexeyAB commented 6 years ago

@xuesongle Hi,

For small objects it is much better to use Yolo v3 or Yolo v3 tiny.
Also, as a general rule, you should keep the relative size of objects the same in the Training and Testing datasets: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
- train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
- train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height

> The small object size varies from 20x20 to 100x100. The size of large image is at 3840x2160. ... step 1: I cropped large images into smaller ones for training purpose, the size of cropped images vary from 245x199 to 1071x1005 ... step 3: I stopped training after 2000 iteration, then I tested with a 3840x2160 image. None of the small vehicles is detected. Only two large buildings (square size ) are detected.

So the smallest object is ~20x20. With a network size of 416x416, a full image of 3840x2160, and a cropped image of 245x199:

- you trained with train_network_width * train_obj_width / train_image_width = 416 * 20 / 245 ~= 34
- but you tested with detection_network_width * detection_obj_width / detection_image_width = 416 * 20 / 3840 ~= 2

I.e. 2 is much lower than 34, so the neural network can't detect the objects in this case.
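The comparison above can be reproduced with a short sketch. The function name is hypothetical; the numbers (416 network width, 20 px objects, 245 px crops, 3840 px full images) are the ones quoted in this thread.

```python
def object_size_at_network_input(network_size, object_size, image_size):
    """Approximate object size in pixels once the image is resized to the network input."""
    return network_size * object_size / image_size


train = object_size_at_network_input(416, 20, 245)    # cropped training image
detect = object_size_at_network_input(416, 20, 3840)  # full-size test image

print(f"training:  {train:.1f} px")              # training:  34.0 px
print(f"detection: {detect:.1f} px")             # detection: 2.2 px
print(f"mismatch:  {train / detect:.1f}x")       # mismatch:  15.7x
```

The same function can be used to try other network sizes or crop sizes before training, to see how close the two values get.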

I recommend that you:

- use yolov3.cfg instead of yolo v2
- set layers = -1, 11 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L720 and set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L717
- set width=832 height=832 and random=1 in the cfg-file and train it (if GPU memory allows)

What GPU do you use?

xuesongle commented 6 years ago

Thanks, @AlexeyAB. I will give it a try and report the result. My GPU is a Quadro P4000.

AlexeyAB commented 6 years ago

> My gpu is Quadro P4000.

Try to set:

- batch=64
- subdivisions=64
- width=832 height=832
- random=1 in each of the 3 [yolo]-layers

If an Out of memory error occurs, then try width=608 height=608.

xuesongle commented 6 years ago

@AlexeyAB, training is still running, at around 1800 iterations after 8 hours. I noticed that the following messages appeared on the screen:

Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000020, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000007, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: 0.783223, Class: 0.996343, Obj: 0.561249, No Obj: 0.000711, .5R: 1.000000, .75R: 1.000000, count: 3
Region 106 Avg IOU: 0.757034, Class: 0.999593, Obj: 0.886043, No Obj: 0.000173, .5R: 1.000000, .75R: 0.375000, count: 8
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000005, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.001196, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000220, .5R: -nan, .75R: -nan, count: 0
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000003, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000412, .5R: -nan, .75R: -nan, count: 0
Region 106 Avg IOU: 0.822524, Class: 0.950795, Obj: 0.565393, No Obj: 0.000300, .5R: 1.000000, .75R: 0.875000, count: 16

Is that normal, or should I kill the training process and check for errors?
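One way to look at output like this is to parse it and compare the -nan rows with the count field. The sketch below is only a log-reading helper with made-up names; it assumes the line layout shown in the post above and uses three of those lines as sample input. In that sample, every -nan row has count: 0. This is an observation about the quoted lines, not a statement about whether the training run is healthy.

```python
import math
import re

# Assumed layout: "Region <n> Avg IOU: <value>, ... count: <n>" (as in the post above)
LINE_RE = re.compile(r"Region (\d+) Avg IOU: (\S+),.*?count: (\d+)")

SAMPLE = """\
Region 82 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000007, .5R: -nan, .75R: -nan, count: 0
Region 94 Avg IOU: 0.783223, Class: 0.996343, Obj: 0.561249, No Obj: 0.000711, .5R: 1.000000, .75R: 1.000000, count: 3
Region 106 Avg IOU: 0.757034, Class: 0.999593, Obj: 0.886043, No Obj: 0.000173, .5R: 1.000000, .75R: 0.375000, count: 8
"""


def parse_region_lines(log_text):
    """Return (region, avg_iou, count) tuples; '-nan' becomes float nan."""
    return [(int(region), float(iou), int(count))
            for region, iou, count in LINE_RE.findall(log_text)]


rows = parse_region_lines(SAMPLE)

# True when the nan rows are exactly the rows with count == 0
nan_only_when_empty = all(math.isnan(iou) == (count == 0) for _, iou, count in rows)
print(rows)
print(nan_only_when_empty)
```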

ghost commented 6 years ago

Would you explain why layers = -1, 11 and stride=4 help detect small objects? @AlexeyAB thx.

xuesongle commented 6 years ago

@AlexeyAB, I stopped the training at 2000 iterations and used the trained weights to test the same 3840x2160 image. None of the vehicles were detected. When I cropped the image to 1524x696, only 1 sedan was detected. With a smaller crop at 736x758, half of the vehicles were detected. With another crop at 422x331, more vehicles were detected. It seems that the current trained model only works for detecting small objects in small images.

These are the changes I made to train on small objects with Yolov3.

Step 1: cd cfg; cp yolov3.cfg yolov3-uav.cfg

Step 2: in yolov3-uav.cfg, I made the following changes:
- batch=64 at line 3
- subdivisions=64 at line 4
- width=608 at line 8
- height=608 at line 9
- random=1 in each of the 3 [yolo]-layers, at lines 615, 701, 788
- layers = -1, 11 at line 720
- stride=4 at line 717
- classes=4 (only 4 classes in my case) in each of the 3 [yolo]-layers, at lines 610, 696, 783
- filters=255 changed to filters=27 in the 3 [convolutional] layers before each [yolo] layer, at lines 603, 689, 776

Step 3: Download the pre-trained weights for the convolutional layers:
wget https://pjreddie.com/media/files/darknet53.conv.74
Then train the model for 2000 iterations:
./darknet detector train cfg/uav.data cfg/yolov3-uav.cfg darknet53.conv.74

Step 4: Test with images at different resolutions, e.g.
./darknet detector test cfg/uav.data cfg/yolov3-uav.cfg yolov3-uav_2000.weights data/uav_1.jpg

- No vehicles detected in 3840x2160.
- 1 sedan detected in 1524x696.
- 50% of vehicles detected in 736x758.
- 90% of vehicles detected in 422x331.

Is there anything I need to change or try so that small objects can be detected in large images? Currently it seems that the only solution is to break large images into smaller ones, run detection on each one individually, and then map the results back onto the large image.
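The tiling idea mentioned above could look roughly like the following sketch. It only covers the bookkeeping: where the tiles go and how a detection box is shifted back into full-image coordinates. The function names, the 608 px tile size, and the 100 px overlap are assumptions chosen for illustration (100 px is the largest object size mentioned in this thread, so an object of that size or smaller should fall completely inside at least one tile). Running the detector on each tile and merging duplicate detections in the overlap regions are not shown.

```python
def tile_origins(length, tile, overlap):
    """Start offsets of tiles that cover [0, length) and share at least `overlap` pixels."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if tile >= length:
        return [0]
    step = tile - overlap
    origins = list(range(0, length - tile, step))
    origins.append(length - tile)  # last tile sits flush with the image border
    return origins


def make_tiles(width, height, tile=608, overlap=100):
    """Return (left, top, tile_width, tile_height) for every tile of the image."""
    return [(x, y, min(tile, width), min(tile, height))
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]


def to_full_image(box, tile_left, tile_top):
    """Shift a (left, top, width, height) pixel box from tile to full-image coordinates."""
    left, top, w, h = box
    return (left + tile_left, top + tile_top, w, h)


tiles = make_tiles(3840, 2160)
print(len(tiles))                                   # 40
print(to_full_image((10, 20, 30, 40), 508, 1016))   # (518, 1036, 30, 40)
```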

kmsravindra commented 6 years ago

@panda9095, the way I understand it: as the depth of the network increases, the semantic value of the feature maps increases while their resolution decreases. Concatenating the feature maps of early layers (where the resolution is high) with the semantically rich feature maps of later layers therefore helps detect smaller objects (a kind of pyramid network). To concatenate with the 11th layer, whose output size is 104x104x128, the stride of the preceding layer has to be set to 4 so that its output matches the dimensions of the 11th layer.
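The shape matching described above can be sketched numerically. The 104x104 size of layer 11 comes from the comment above (416 / 4 = 104). The assumption that the layer feeding the upsample is at 1/16 of the network size (26x26 for a 416 input) is mine, based on reading yolov3.cfg, and the function names are hypothetical.

```python
def feature_map_size(network_size, downsample):
    """Spatial size of a feature map at the given total downsampling factor."""
    return network_size // downsample


def route_shapes_match(network_size, upsample_stride, skip_downsample):
    """Check that the upsampled map and the routed (skip) map have the same spatial size."""
    # Assumption: the input of the upsample layer is at 1/16 resolution
    upsample_input = feature_map_size(network_size, 16)
    upsampled = upsample_input * upsample_stride
    skip = feature_map_size(network_size, skip_downsample)
    return upsampled == skip


for size in (416, 608, 832):
    # layer 11 is at 1/4 resolution: stride=4 matches it, stride=2 does not
    print(size, route_shapes_match(size, 4, 4), route_shapes_match(size, 2, 4))
```

Under these assumptions the check passes with stride=4 and fails with stride=2 for all three network sizes mentioned in the thread.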

AshleyRoth commented 6 years ago

@xuesongle I'm interested in small objects too. In my case I want to detect traffic signs. My images are ~1280x720. I tried training yolov3 with batch and got -nan for a very long time, for about 2000 iterations. Unfortunately the result is not satisfactory. I think something may be wrong with the data: the images may be too large, or the opposite.

ghost commented 6 years ago

thx! @kmsravindra

shengxingdong commented 6 years ago

Hi @AlexeyAB, I want to detect small faces in high-resolution images, but in public datasets train_obj_height / train_image_height is usually much higher.

In the training stage, if I paste a public dataset image onto a high-resolution image at a random position, and then resize this high-resolution image to 416x416 for training, do you think it will work?

AlexeyAB commented 6 years ago

@shengxingdong Hi, yes, I think it will work. Just make sure the annotations of the object positions are correct.
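Keeping the annotations correct after pasting comes down to re-expressing each box relative to the larger canvas. A minimal sketch, assuming annotations in the normalized (x_center, y_center, width, height) form used by darknet label files; the function and parameter names are hypothetical.

```python
def paste_yolo_box(box, src_size, canvas_size, offset):
    """Re-express a normalized YOLO box after its image is pasted onto a larger canvas.

    box:         (x_center, y_center, width, height), relative to the source image
    src_size:    (width, height) of the source image in pixels
    canvas_size: (width, height) of the high-resolution canvas in pixels
    offset:      (left, top) pixel position of the pasted image on the canvas
    """
    xc, yc, w, h = box
    src_w, src_h = src_size
    canvas_w, canvas_h = canvas_size
    left, top = offset
    return (
        (left + xc * src_w) / canvas_w,  # new x_center, relative to the canvas
        (top + yc * src_h) / canvas_h,   # new y_center, relative to the canvas
        w * src_w / canvas_w,            # new width, relative to the canvas
        h * src_h / canvas_h,            # new height, relative to the canvas
    )


# A face centered in a 100x100 source image, pasted at (200, 300) on a 1000x1000 canvas
new_box = paste_yolo_box((0.5, 0.5, 0.2, 0.4), (100, 100), (1000, 1000), (200, 300))
print(new_box)
```

The class id in the label file is unchanged; only the four box values need to be rewritten.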

shengxingdong commented 6 years ago

@AlexeyAB thank you, I will try it.

readicculus commented 5 years ago

If you change to layers = -1, 11 and stride=4 for small object detection, are there any considerations regarding anchors that need to be taken into account? I'm using my own 9 anchors, 3 for each yolo layer, generated with Alexey's script.

ElHouas commented 2 years ago

Hi @AlexeyAB ,

Would you still recommend Yolov3 over YoloV4 for detecting small objects on top-down images?

Thanks

AlexeyAB / darknet

detecting small objects from large image #977