AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Improve object detection - tiny version #3497

Open berserker opened 5 years ago

berserker commented 5 years ago

I'm following the advice described in "How to improve object detection", but I'm using "yolov3-tiny_3l" and I need to detect small objects. I have the following questions:

Thanks for your help!

AlexeyAB commented 5 years ago

Now I have the following mask values for the above anchors:
first yolo layer: mask = 3,4,5,6,7,8 (this is because I suppose 1,189 to be greater than 60x60... right?)
second yolo layer: mask = 2 (this is because I suppose 1,78 to be greater than 30x30... right?)
third yolo layer: mask = 0,1

Yes, try this.

Change these lines: https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny_3l.cfg#L198-L202

to these:

[upsample]
stride=4

[route]
layers = -1, 4
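To make the mask-assignment rule discussed above concrete, the check "which [yolo] layer should own which anchor" can be sketched in Python using the 30x30 and 60x60 area thresholds mentioned in this thread. The anchor values below are placeholders for illustration, not recalculated anchors from any real dataset:

```python
# Hypothetical anchors (w, h) in pixels; in practice these come from
# ./darknet detector calc_anchors run on your own dataset.
anchors = [(4, 7), (9, 14), (18, 30), (35, 52), (60, 75), (80, 100),
           (110, 130), (160, 180), (220, 290)]

# Assign each anchor index to a yolo layer by area threshold:
# the finest-resolution yolo layer gets the smallest anchors.
masks = {"small_objects": [], "medium_objects": [], "large_objects": []}
for i, (w, h) in enumerate(anchors):
    if w * h < 30 * 30:
        masks["small_objects"].append(i)    # finest yolo layer
    elif w * h < 60 * 60:
        masks["medium_objects"].append(i)   # middle yolo layer
    else:
        masks["large_objects"].append(i)    # coarsest yolo layer

print(masks)
```

The resulting index lists map directly onto the mask= lines of the three [yolo] sections in the cfg, largest anchors in the first (coarsest) yolo layer.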
berserker commented 5 years ago

Thanks for the support @AlexeyAB! Can you please elaborate on how to compute [upsample] and [route] to improve small-object detection? I'd like to test other configurations too, but I don't know how to update those values accordingly.

alexanderfrey commented 5 years ago

@AlexeyAB Do you suggest to change the same lines

[upsample]
stride=4

[route]
layers = -1, 4

for yolov3_tiny_pan_lstm.cfg to improve small-object detection?

AlexeyAB commented 5 years ago

@alexanderfrey Yes

alexanderfrey commented 5 years ago

@alexanderfrey Yes

When I do this for the last upsampling layer I receive the following error:

51 Layer before convolutional layer must output image.: File exists
darknet: ./src/utils.c:293: error: Assertion `0' failed.
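For context on errors like the one above: [route] concatenates feature maps along the channel axis, so both inputs must have identical width and height, and the [upsample] stride must bring the previous layer back to the spatial size of the earlier layer being routed in. A minimal sketch of that consistency check (the strides here are illustrative, assuming a 416x416 network input, and are not taken from any particular cfg):

```python
def spatial_size(input_size, downsample_factor, upsample_stride=1):
    """Side length of a feature map after the network downsamples the
    input by `downsample_factor` and then upsamples by `upsample_stride`."""
    return input_size // downsample_factor * upsample_stride

net_input = 416  # width/height from the [net] section of the cfg

# Layer at overall stride 32, upsampled 4x: 416 // 32 * 4 = 52
upsampled = spatial_size(net_input, 32, upsample_stride=4)
# Earlier backbone layer at overall stride 8 (e.g. layers = -1, 4): 52
routed = spatial_size(net_input, 8)

# [route] concatenates along channels, so spatial sizes must match;
# a mismatch here is what darknet rejects at load time.
assert upsampled == routed, "route inputs must have equal width/height"
print(upsampled, routed)  # 52 52
```

If the last upsample already sits at a different overall stride, a stride=4 upsample will overshoot the routed layer's size, which is one way to trigger the assertion shown above.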
AlexeyAB commented 5 years ago

@alexanderfrey You're doing something wrong.

In any case, it doesn't make sense for a PAN network, since the PAN-block already does this.

berserker commented 5 years ago

Now I have the following mask values for the above anchors:
first yolo layer: mask = 3,4,5,6,7,8 (this is because I suppose 1,189 to be greater than 60x60... right?)
second yolo layer: mask = 2 (this is because I suppose 1,78 to be greater than 30x30... right?)
third yolo layer: mask = 0,1

Yes, try this.

I got a very low mAP with the calculated anchors :( (with the "default" configuration I reach ~60%): [chart attached]

Change these lines: https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny_3l.cfg#L198-L202

to these:

[upsample]
stride=4

[route]
layers = -1, 4

Default anchors with this change give me +10%! Here is the current chart (still training...): [chart_new attached]

The project's target is to reach at least 90% confidence and, once this new training run is complete, I need to figure out how to further improve the mAP. Do you think one of these ideas could improve mAP (considering that I can "render" the input images with Blender)?

  1. Remove any intersection/overlap in tagged images: I have lots of samples with tagged regions that overlap or are very close to each other, as in the following snapshot (the 2 regions at the bottom left): [overlap attached]. Do you think this case interferes with the training? I can detect this situation and exclude the input image from the dataset.
  2. Since I'm automatically tagging the source images (I know where I create an "object" when I render the image with Blender), I'm defining the regions exactly at the relevant pixels, as you can see below. Tagged object (with the tagged region displayed): [tagged attached]. Same tagged object (without the tagged region displayed): [plain attached]. Do you think that inflating each tagged region by +1 pixel on each side of the rect could improve mAP?
  3. I'm training my model with JPEG images (Blender's output), and as you can see in the 2 screenshots above, the contours of the tagged object have a lot of "noise" (probably due to JPEG compression). Do you think that using PNG images instead of JPEGs could improve mAP? I'm asking because it takes a LOT of time to regenerate my dataset... I read that JPEG compression generally doesn't affect machine learning, but in my case the objects are very small and maybe the "noise" around the object could make a big difference.
  4. When I render my objects in Blender there is a procedure to find out whether a tagged object is "relevant" or not (if not, I remove it from the rendered image; as far as I know there isn't any untagged object in my dataset), but this procedure caused my dataset to contain lots of "empty" images. I read in the advice that "empty" images are fine for improving object detection, but my dataset (~250k images) contains ~15% empty images: do you think that many of these images could decrease the mAP?
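For question 1 above, flagging images whose labels overlap (so they can be excluded from the dataset) can be done with a pairwise IoU check over the YOLO-format boxes. This is an illustrative standalone sketch, not part of darknet; it assumes labels in the usual normalized (x_center, y_center, w, h) form:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as normalized
    (x_center, y_center, w, h) YOLO label coordinates."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def has_overlap(boxes, threshold=0.0):
    """True if any pair of boxes overlaps by more than `threshold` IoU;
    with threshold=0.0 any intersection at all is flagged."""
    return any(iou(boxes[i], boxes[j]) > threshold
               for i in range(len(boxes))
               for j in range(i + 1, len(boxes)))

# Two heavily overlapping boxes vs. two far-apart ones.
print(has_overlap([(0.30, 0.30, 0.10, 0.10), (0.32, 0.30, 0.10, 0.10)]))  # True
print(has_overlap([(0.10, 0.10, 0.05, 0.05), (0.80, 0.80, 0.05, 0.05)]))  # False
```

Running this over each image's .txt label file would identify the candidates to drop; raising `threshold` keeps merely adjacent boxes while excluding only real overlaps.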

Thanks again for the support!!

AlexeyAB commented 5 years ago

So use the default anchors. Just maybe decrease the width of the anchors by 2x-4x, without changing the masks.
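That suggestion (shrinking anchor widths while leaving heights and masks alone) can be applied mechanically to the anchors= string in the cfg. A hedged sketch, using the default yolov3-tiny anchors as the example input:

```python
def shrink_anchor_widths(anchors_str, factor):
    """Divide the width of every (w, h) anchor pair by `factor`
    (integer division, floored at 1), leaving heights untouched.
    The mask= lines in the cfg stay exactly as they are."""
    values = [int(v) for v in anchors_str.replace(" ", "").split(",")]
    pairs = zip(values[0::2], values[1::2])
    return ", ".join(f"{max(1, w // factor)},{h}" for w, h in pairs)

# Default yolov3-tiny anchors as an illustrative input.
default = "10,14, 23,27, 37,58, 81,82, 135,169, 344,319"
print(shrink_anchor_widths(default, 2))
# -> 5,14, 11,27, 18,58, 40,82, 67,169, 172,319
```

The output string can be pasted back into every [yolo] section's anchors= line; factor=2 to 4 corresponds to the 2x-4x decrease suggested above.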

berserker commented 5 years ago

So use the default anchors. Just maybe decrease the width of the anchors by 2x-4x, without changing the masks.

Thanks, I'll try that 👍 Do you have any hints about my last 4 questions?