Network for small objects

mmartin56 commented 4 years ago

Hi @AlexeyAB,

I have a custom dataset that includes lots of small objects (20-30 pixels). I have tried training yolov3-spp with various input sizes: 416x416, 608x608, and even 704x704. There's a big improvement in mAP with the larger input sizes. However, running darknet at these sizes is pretty slow, especially on a Jetson TX2i.

Now the good thing is, these objects have a simple shape. So although I need to keep as many pixels from the original image as possible, maybe not all of yolov3's layers/channels are necessary.

Are there other network architectures that are tailored to detecting small simple objects?

(I'm keeping initial weights in mind as well - a brand new, custom architecture means no initial weights right?)

Thanks in advance

AvenSun commented 4 years ago

@mmartin56 you can try yolov3-tiny_3l.cfg

BernoGreyling commented 4 years ago

In my comparisons for small objects, yolov3_3l worked really well and is quick. For a bit more accuracy, I had more success with yolo_v3_tiny_pan3.cfg from #3114, although it is slightly slower.

davidscmx commented 4 years ago

Hi @AlexeyAB

Regarding this issue, would it be possible to provide a .cfg similar to https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny-prn.cfg but optimized for BOTH small and large objects, like yolov3-tiny_3l.cfg? Cheers.

uday60 commented 4 years ago

@davidsosa I have already tested the prn and tiny_3l models for small objects. The pan3 model is way ahead.

mmartin56 commented 4 years ago

Mmh, I just tested yolov3-tiny_3l at 800x800 (just to see how much mAP it can get in the best conditions) and it got 48% vs 55% for yolov3 at 480x480. That's pretty good, given that it's very fast even at 800x800 (26 BFLOPs vs ~86 for yolov3 at 480x480). Is there any model in between yolov3 and yolo-tiny, with only a slight-ish drop in accuracy from yolov3 but significantly faster?
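To compare the cost figures above at other input sizes, here is a rough sketch. It assumes that the cost of a fully convolutional network grows in proportion to the number of input pixels, which is only an approximation. The function name `scale_bflops` is made up for illustration; the input numbers are the ones quoted in this thread.

```python
def scale_bflops(bflops, from_size, to_size):
    """Rough estimate: convolution cost grows with the number of input pixels,
    so it scales with the square of the (square) input side length."""
    return bflops * (to_size / from_size) ** 2


# Figures quoted in this thread: 26 BFLOPs at 800x800 (yolov3-tiny_3l)
# and ~86 BFLOPs at 480x480 (yolov3).
tiny_3l_at_416 = scale_bflops(26.0, 800, 416)
yolov3_at_704 = scale_bflops(86.0, 480, 704)

print(f"yolov3-tiny_3l at 416x416: ~{tiny_3l_at_416:.1f} BFLOPs")
print(f"yolov3 at 704x704: ~{yolov3_at_704:.1f} BFLOPs")
```

This only estimates arithmetic cost; actual speed on a given device also depends on memory bandwidth and the inference backend.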

BernoGreyling commented 4 years ago

@mmartin56 ,

You can play around with the models in this table : https://github.com/AlexeyAB/darknet/issues/3114#issuecomment-494148968

As mentioned above, tiny_pan3 works very well.

mmartin56 commented 4 years ago

Hi @BernoGreyling ,

Thanks for your help. The reason I haven't tried pan3 yet is that, as I understand it, all the approaches in #3114 require video as the training set. Is that right? (I am slightly confused about the difference between pan and lstm.)

If that's right, is there any way you can train pan3 with a classic dataset (individual images)?

BernoGreyling commented 4 years ago

Hi @mmartin56 ,

You only need to be careful with the lstm models: they require sequential video frames. The pan models can be trained as normal on individual images.

mmartin56 commented 4 years ago

Hi @BernoGreyling ,

I've tried training on my custom dataset (4 classes, large images). I use yolo_v3_tiny_pan3_aa_ae_mixup_scale_giou.cfg with sgdr:

batch=64
subdivisions=8
width=704
height=704

learning_rate=0.001
burn_in=1000
max_batches = 10000

policy=sgdr
sgdr_cycle=1000
sgdr_mult=2
steps=4000,6000,8000,9000

and I get the following graph: [image: Selection_364]

which is a disappointing 37% (I get 48% with yolo-tiny and 54% with yolov3). Perhaps I need to keep training?

I've tried to change sgdr to steps:

learning_rate=0.001
burn_in=1000
max_batches=40000

policy=steps
steps=22000,54000
scales=.1,.1

[image: Selection_365]

So that's better (44%) but I'm still wondering if I'm doing something wrong. Any idea?

Thanks :)
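One thing worth checking in the second config: `steps=22000,54000` is combined with `max_batches=40000`, so the step at 54000 lies beyond the end of training and would never trigger. The sketch below checks for this and prints the 80% / 90% rule of thumb from the darknet README. The helper names `unreachable_steps` and `suggested_steps` are hypothetical, not part of darknet.

```python
def unreachable_steps(max_batches, steps):
    """Return the steps that can never trigger because training
    stops at max_batches before reaching them."""
    return [s for s in steps if s >= max_batches]


def suggested_steps(max_batches):
    """80% and 90% of max_batches (rule of thumb from the darknet README).
    Integer arithmetic avoids floating-point rounding surprises."""
    return [max_batches * 8 // 10, max_batches * 9 // 10]


max_batches = 40000
steps = [22000, 54000]  # values from the config above

print("never reached:", unreachable_steps(max_batches, steps))
print("suggested steps:", suggested_steps(max_batches))
```

Whether a different schedule actually improves mAP on this dataset would have to be tested.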

mmartin56 commented 4 years ago

Side note: I also have large objects in my dataset. I notice small objects do pretty well, but not large objects. Any idea how to get the detector better on those? Perhaps retrain anchors?

BernoGreyling commented 4 years ago

@mmartin56,

Unfortunately I don't have experience with both large and small objects. My main focus is small objects at this stage.

According to the wiki, tiny_3l should in theory be better for a mix of small and large objects, but I could be wrong.

Trying custom anchors is always an option, of course, and I would give that a try. Check whether the custom anchors are very different from the ones in the current config. If they are, it might be worth it. Also check that your dataset is reasonably balanced between large and small objects when calculating anchors. Use the -show flag when calculating so you can see the box sizes and how the clustering chooses the anchors. If your large objects are too few, the clustering will more or less ignore them and tend to give smaller values.

You can also try increasing the num parameter to 12 or so to increase anchors and have a better shot at increasing mAP.

These are just things I have noticed, and I do not have a strict theoretical backing for them. @AlexeyAB will have to help with more detail.
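To check the balance between small and large objects before calculating anchors, something like the following sketch could help. It assumes boxes are given as normalized (width, height) pairs, as in YOLO label files. The function name `size_histogram` and the 32 and 96 pixel thresholds are arbitrary choices for illustration, not darknet settings.

```python
from collections import Counter


def size_histogram(boxes, net_w, net_h, small_max=32, large_min=96):
    """Count boxes by their longer side, in pixels at the network input size.

    boxes: iterable of (w, h) normalized to 0..1, as in YOLO label files.
    small_max / large_min: illustrative thresholds, not darknet parameters.
    """
    counts = Counter()
    for w, h in boxes:
        side = max(w * net_w, h * net_h)
        if side <= small_max:
            counts["small"] += 1
        elif side >= large_min:
            counts["large"] += 1
        else:
            counts["medium"] += 1
    return counts


# Made-up example: three cone-sized boxes, one large box, one medium box.
example_boxes = [(0.03, 0.04), (0.035, 0.04), (0.04, 0.03), (0.3, 0.5), (0.1, 0.08)]
hist = size_histogram(example_boxes, 704, 704)
print(dict(hist))
```

If one size group dominates heavily, the clustering is likely to place most anchors there.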

gnefihs commented 4 years ago

What do you mean when you say your objects have simple shapes? It would be nice if you could show some sample images of what you're talking about.

In general, you can simply reduce the channels in the .cfg files. If your objects are "simple" and you do not have many classes, you can probably make do with half as many channels. I would recommend not touching the first 5 conv layers though so that you can use the pre-trained weights instead of training everything from scratch.
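As a minimal sketch of that idea: halve the filter counts after the first five conv layers, but leave the conv layer directly before each [yolo] layer at filters = (classes + 5) * 3, as the darknet README requires for three anchors per [yolo] layer. The filter list below is an illustrative backbone, not read from any particular .cfg, and both helper names are hypothetical.

```python
def yolo_head_filters(num_classes, anchors_per_layer=3):
    """filters= for the conv layer directly before a [yolo] layer:
    (classes + 5) per anchor assigned to that layer."""
    return (num_classes + 5) * anchors_per_layer


def halve_channels(filters, keep_first=5):
    """Halve filter counts, leaving the first `keep_first` conv layers
    untouched so that pre-trained weights still fit them."""
    return [f if i < keep_first else max(1, f // 2) for i, f in enumerate(filters)]


# Illustrative backbone filter counts (an assumption, not taken from a real cfg).
backbone = [16, 32, 64, 128, 256, 512, 1024]

print("reduced backbone:", halve_channels(backbone))
print("filters before each [yolo] for 4 classes:", yolo_head_filters(4))
```

The head filter count must not be halved, since it is fixed by the number of classes and anchors rather than by model capacity.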

And lastly, you should definitely customize your anchors. They are given in absolute pixel sizes relative to the network input, so as you scale up your input size, make sure to scale up your anchor box sizes too.
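A sketch of that scaling, using the default yolov3.cfg anchors and assuming they were tuned for a 416x416 input. The function name `scale_anchors` is made up for illustration. If you recalculate anchors with the target width and height instead, this step should not be needed.

```python
def scale_anchors(anchors, from_size, to_size):
    """Scale (w, h) anchors, given in pixels at `from_size` = (width, height),
    to the network input size `to_size`."""
    sx = to_size[0] / from_size[0]
    sy = to_size[1] / from_size[1]
    return [(round(w * sx), round(h * sy)) for w, h in anchors]


# Default yolov3.cfg anchors, assumed here to correspond to a 416x416 input.
default_anchors = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
                   (59, 119), (116, 90), (156, 198), (373, 326)]

scaled = scale_anchors(default_anchors, (416, 416), (704, 704))

# Format the result the way the anchors= line in a .cfg is written.
print("anchors = " + ", ".join(f"{w},{h}" for w, h in scaled))
```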

mmartin56 commented 4 years ago

@BernoGreyling OK, I'll try to set up custom anchors.

@gnefihs I've been working with traffic cones, which have regular, simple shapes and fairly bright colours. They are usually pretty small (20-30 pixels). There are also pedestrians and some types of vehicles in my custom dataset - they are both bigger and have more complex shapes than cones. I have 4 classes in total.

Thanks for your suggestion of reducing the number of channels while keeping the first layers. I'll definitely try customizing the anchors too. Cheers :)

bagminer commented 4 years ago

I am also trying to train the yolo_v3_tiny_pan3.cfg model for small custom objects. Can you suggest the initial weights to use for this model? Can I use yolov3-tiny.conv.11?

AlexeyAB commented 4 years ago

@bagminer Yes

AlexeyAB / darknet

Network for small objects #4764