AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Improve object detection - tiny version #3497

Open berserker opened 5 years ago

berserker commented 5 years ago

I'm following the advice described in "How to improve object detection", but I'm using "yolov3-tiny_3l" and I need to detect small objects. I have the following questions:

Thanks for your help!

AlexeyAB commented 5 years ago

Now I have the following mask values for the above anchors:
first yolo layer: mask = 3,4,5,6,7,8 (this is because I suppose 1,189 to be greater than 60x60... right?)
second yolo layer: mask = 2 (this is because I suppose 1,78 to be greater than 30x30... right?)
third yolo layer: mask = 0,1

Yes, try this.

Change these lines: https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny_3l.cfg#L198-L202

to these:

[upsample]
stride=4

[route]
layers = -1, 4
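To make the mask-assignment rule discussed above concrete, the check "which [yolo] layer should own which anchor" can be sketched in Python using the 30x30 and 60x60 area thresholds mentioned in this thread. The anchor values below are placeholders for illustration, not recalculated anchors from any real dataset:

```python
# Hypothetical anchors (w, h) in pixels; in practice these come from
# ./darknet detector calc_anchors run on your own dataset.
anchors = [(4, 7), (9, 14), (18, 30), (35, 52), (60, 75), (80, 100),
           (110, 130), (160, 180), (220, 290)]

# Assign each anchor index to a yolo layer by area threshold:
# the finest-resolution yolo layer gets the smallest anchors.
masks = {"small_objects": [], "medium_objects": [], "large_objects": []}
for i, (w, h) in enumerate(anchors):
    if w * h < 30 * 30:
        masks["small_objects"].append(i)    # finest yolo layer
    elif w * h < 60 * 60:
        masks["medium_objects"].append(i)   # middle yolo layer
    else:
        masks["large_objects"].append(i)    # coarsest yolo layer

print(masks)
```

The resulting index lists map directly onto the mask= lines of the three [yolo] sections in the cfg, largest anchors in the first (coarsest) yolo layer.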
berserker commented 5 years ago

Thanks for the support @AlexeyAB! Can you please elaborate on how to compute [upsample] and [route] to improve small-object detection? I'd like to test other configurations too, but I don't know how to update those values accordingly.

alexanderfrey commented 5 years ago

@AlexeyAB Do you suggest to change the same lines

[upsample]
stride=4

[route]
layers = -1, 4

for yolov3_tiny_pan_lstm.cfg to improve small-object detection?

AlexeyAB commented 5 years ago

@alexanderfrey Yes

alexanderfrey commented 5 years ago

@alexanderfrey Yes

When I do this for the last upsampling layer I receive the following error:

51 Layer before convolutional layer must output image.: File exists
darknet: ./src/utils.c:293: error: Assertion `0' failed.
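For context on errors like the one above: [route] concatenates feature maps along the channel axis, so both inputs must have identical width and height, and the [upsample] stride must bring the previous layer back to the spatial size of the earlier layer being routed in. A minimal sketch of that consistency check (the strides here are illustrative, assuming a 416x416 network input, and are not taken from any particular cfg):

```python
def spatial_size(input_size, downsample_factor, upsample_stride=1):
    """Side length of a feature map after the network downsamples the
    input by `downsample_factor` and then upsamples by `upsample_stride`."""
    return input_size // downsample_factor * upsample_stride

net_input = 416  # width/height from the [net] section of the cfg

# Layer at overall stride 32, upsampled 4x: 416 // 32 * 4 = 52
upsampled = spatial_size(net_input, 32, upsample_stride=4)
# Earlier backbone layer at overall stride 8 (e.g. layers = -1, 4): 52
routed = spatial_size(net_input, 8)

# [route] concatenates along channels, so spatial sizes must match;
# a mismatch here is what darknet rejects at load time.
assert upsampled == routed, "route inputs must have equal width/height"
print(upsampled, routed)  # 52 52
```

If the last upsample already sits at a different overall stride, a stride=4 upsample will overshoot the routed layer's size, which is one way to trigger the assertion shown above.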
AlexeyAB commented 5 years ago

@alexanderfrey You're doing something wrong.

In any case, it doesn't make sense for a PAN network, since the PAN-block already does this.

berserker commented 5 years ago

Now I have the following mask values for the above anchors:
first yolo layer: mask = 3,4,5,6,7,8 (this is because I suppose 1,189 to be greater than 60x60... right?)
second yolo layer: mask = 2 (this is because I suppose 1,78 to be greater than 30x30... right?)
third yolo layer: mask = 0,1

Yes, try this.

I got a very low mAP with the calculated anchors :( (with the "default" configuration I reach ~60%): [chart attached]

Change these lines: https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny_3l.cfg#L198-L202

to these:

[upsample]
stride=4

[route]
layers = -1, 4

Default anchors with this change give me +10%! Here is the current chart (still training...): [chart_new attached]

The project's target is to reach at least 90% confidence and, once this new training run is complete, I need to figure out how to further improve the mAP. Do you think one of these ideas could improve mAP (considering that I can "render" the input images with Blender)?

  1. Remove any intersection/overlap in tagged images: I have lots of samples with tagged regions that overlap or are very close to each other, as in the following snapshot (the 2 regions at the bottom left): [overlap attached]. Do you think this case interferes with the training? I can detect this situation and exclude the input image from the dataset.
  2. Since I'm automatically tagging the source images (I know where I create an "object" when I render the image with Blender), I'm defining the regions exactly at the relevant pixels, as you can see below. Tagged object (with the tagged region displayed): [tagged attached]. Same tagged object (without the tagged region displayed): [plain attached]. Do you think that inflating each tagged region by +1 pixel on each side of the rect could improve mAP?
  3. I'm training my model with JPEG images (Blender's output), and as you can see in the 2 screenshots above, the contours of the tagged object have a lot of "noise" (probably due to JPEG compression). Do you think that using PNG images instead of JPEGs could improve mAP? I'm asking because it takes a LOT of time to regenerate my dataset... I read that JPEG compression generally doesn't affect machine learning, but in my case the objects are very small and maybe the "noise" around the object could make a big difference.
  4. When I render my objects in Blender there is a procedure to find out whether a tagged object is "relevant" or not (if not, I remove it from the rendered image; as far as I know there isn't any untagged object in my dataset), but this procedure caused my dataset to contain lots of "empty" images. I read in the advice that "empty" images are fine for improving object detection, but my dataset (~250k images) contains ~15% empty images: do you think that many of these images could decrease the mAP?
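For question 1 above, flagging images whose labels overlap (so they can be excluded from the dataset) can be done with a pairwise IoU check over the YOLO-format boxes. This is an illustrative standalone sketch, not part of darknet; it assumes labels in the usual normalized (x_center, y_center, w, h) form:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as normalized
    (x_center, y_center, w, h) YOLO label coordinates."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def has_overlap(boxes, threshold=0.0):
    """True if any pair of boxes overlaps by more than `threshold` IoU;
    with threshold=0.0 any intersection at all is flagged."""
    return any(iou(boxes[i], boxes[j]) > threshold
               for i in range(len(boxes))
               for j in range(i + 1, len(boxes)))

# Two heavily overlapping boxes vs. two far-apart ones.
print(has_overlap([(0.30, 0.30, 0.10, 0.10), (0.32, 0.30, 0.10, 0.10)]))  # True
print(has_overlap([(0.10, 0.10, 0.05, 0.05), (0.80, 0.80, 0.05, 0.05)]))  # False
```

Running this over each image's .txt label file would identify the candidates to drop; raising `threshold` keeps merely adjacent boxes while excluding only real overlaps.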

Thanks again for the support!!

AlexeyAB commented 5 years ago

So use the default anchors. Just maybe decrease the width of the anchors by 2x-4x, without changing the masks.
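That suggestion (shrinking anchor widths while leaving heights and masks alone) can be applied mechanically to the anchors= string in the cfg. A hedged sketch, using the default yolov3-tiny anchors as the example input:

```python
def shrink_anchor_widths(anchors_str, factor):
    """Divide the width of every (w, h) anchor pair by `factor`
    (integer division, floored at 1), leaving heights untouched.
    The mask= lines in the cfg stay exactly as they are."""
    values = [int(v) for v in anchors_str.replace(" ", "").split(",")]
    pairs = zip(values[0::2], values[1::2])
    return ", ".join(f"{max(1, w // factor)},{h}" for w, h in pairs)

# Default yolov3-tiny anchors as an illustrative input.
default = "10,14, 23,27, 37,58, 81,82, 135,169, 344,319"
print(shrink_anchor_widths(default, 2))
# -> 5,14, 11,27, 18,58, 40,82, 67,169, 172,319
```

The output string can be pasted back into every [yolo] section's anchors= line; factor=2 to 4 corresponds to the 2x-4x decrease suggested above.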

berserker commented 5 years ago

So use the default anchors. Just maybe decrease the width of the anchors by 2x-4x, without changing the masks.

Thanks, I'll try that 👍 Do you have any hints about my last 4 questions?