AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Low mAP with large dataset #2593

Open berserker opened 5 years ago

berserker commented 5 years ago

I'm training a network with only 1 class on a large dataset, but I'm getting a very low mAP.

This is my actual config:

Now the problem: after ~250000 iterations I'm only getting ~27% mAP, as you can see in the following chart: [image: chart]

Questions:

  1. Each train/validation image is 1024x1024px, and in the detection phase we will have the same input resolution. Could detection be affected by the network rescaling images to 416x416? As far as I can see, the details are still visible at that resolution. Here is a rescaled sample: [image: 2_rescaled]
  2. Is it a good idea in this scenario to enable random=1? (I'm currently using it, since the configuration is the "plain" yolov3.)
  3. Is yolov3.cfg the best option in my case? Would you suggest another configuration that fits better (e.g. yolov3_5l.cfg, yolov3-spp.cfg)?
  4. Do you suggest increasing the network resolution to improve precision/mAP? To what value? Please note that I need a good frame rate in the detection phase on mid-range hardware (e.g. nvidia 2060ti). A 1024x1024 resolution would give a very bad fps... is the "tiny" version at 1024x1024 more suitable in this case?
  5. I'm getting lots of nans while training. Is this related to the fact that the region widths are very small (generally 1px)? I suspect that rescaling to 416x416 is the problem in this case, right? Here is a sample of the training output: [image: nans]

Thanks for your help!

AlexeyAB commented 5 years ago

@berserker Hi,

  1. If your objects have width=1 pixel on 1024x1024 image, then you should train the yolov3-tiny.cfg model with width=1024 height=1024. (otherwise your objects will be removed/smoothed during resizing to 416x416.)

  2. If all your Training/Validation/Test images have the same size, then you can train with random=0

  3. You should use width=1024 height=1024, so maybe use yolov3-tiny.cfg or yolov3-tiny-3l.cfg

  4. Yes, you must increase the network resolution to 1024x1024. I also suggest using the default anchors, but with the first value of each anchor set to 1: anchors = 1,14, 1,27, 1,58, 1,82, 1,169, 1,319 instead of https://github.com/AlexeyAB/darknet/blob/7a854302efb7adba80d5e8a747ad5e5ec384a226/cfg/yolov3-tiny.cfg#L134

  5. If the nan values appear anywhere other than the loss, then don't pay attention to them.
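
A minimal Python sketch of the arithmetic behind points 1 and 4. The helper names are made up for illustration and are not part of darknet, and the default yolov3-tiny anchor list is assumed to match the cfg line linked above. It shows how wide a 1px object ends up after the image is resized to the network resolution, and how the suggested anchors follow from the defaults by setting every width to 1.

```python
def scaled_size(size_px, image_size, network_size):
    """Size of an object in pixels after the image is resized to the network input."""
    return size_px * network_size / image_size


def set_anchor_widths(anchors, width=1):
    """Return the anchors with every width replaced, keeping the heights."""
    return [(width, h) for _, h in anchors]


def format_anchors(anchors):
    """Format (w, h) pairs as a darknet-style `anchors = ...` line."""
    return "anchors = " + ",  ".join(f"{w},{h}" for w, h in anchors)


# Assumption: default yolov3-tiny anchors, as in the linked cfg line.
DEFAULT_TINY_ANCHORS = [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]

if __name__ == "__main__":
    # A 1px-wide object shrinks below one pixel at 416x416 ...
    print(scaled_size(1, 1024, 416))   # 0.40625
    # ... but keeps its width when the network runs at 1024x1024.
    print(scaled_size(1, 1024, 1024))  # 1.0
    print(format_anchors(set_anchor_widths(DEFAULT_TINY_ANCHORS)))
```

At 416x416 the object is less than half a pixel wide, which is why it gets removed or smoothed away during resizing.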

Also, if yolov3-tiny.cfg with width=1024 height=1024 is faster than you need, you can trade some speed for accuracy. For example, use these 6 layers:

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

instead of these 2 layers: https://github.com/AlexeyAB/darknet/blob/7a854302efb7adba80d5e8a747ad5e5ec384a226/cfg/yolov3-tiny.cfg#L107-L121

berserker commented 5 years ago

Many thanks @AlexeyAB for your support!

Please help me clarify a few more doubts:

> 1. If your objects have width=1 pixel on 1024x1024 image, then you should train the `yolov3-tiny.cfg` model with `width=1024 height=1024`. (otherwise your objects will be removed/smoothed during resizing to 416x416.)

Can you point me to a doc/post that could help me understand more deeply the pros/cons of yolov3.cfg versus yolov3-tiny.cfg (considering the same resolution, 1024x1024)? I'm particularly interested in a comparison of detection confidence, accuracy, performance, etc...

Important: for now we have a network with only one class, but in the future we plan to extend the model with 2 or 3 more classes that should have more "traditional" dimensions (e.g. 30x30, 40x50, etc...). What kind of approach do you suggest in this case, considering that we must always support this first class (1 to 3 pixels wide)? I see 3 options for the future plan, please correct me if I'm wrong:

> 3. You should use `width=1024 height=1024`, so maybe use yolov3-tiny.cfg or yolov3-tiny-3l.cfg

What's the difference between yolov3-tiny.cfg and yolov3-tiny-3l.cfg?

> 4. Yes, you must increase the network resolution to 1024x1024. I also suggest using the default anchors, but with the first value of each anchor set to 1:
>    `anchors = 1,14,  1,27,  1,58,  1,82,  1,169,  1,319`

Thanks, I'll try it for the first "release", which supports only 1 class. I have the same doubts about the future release with support for 2/3 more classes, though: how does this suggestion fit? Should I go back to the default anchors then?

> Also, if yolov3-tiny.cfg with width=1024 height=1024 is faster than you need, you can trade some speed for accuracy. For example, use these 6 layers:

Is this a sort of 2x yolov3-tiny-3l.cfg implementation? I really need to understand this more deeply, sorry...

AlexeyAB commented 5 years ago

@berserker

> I have the same doubts about the future release with support for 2/3 more classes, though: how does this suggestion fit? Should I go back to the default anchors then?

If you want to train a model for 2/3 more classes, then you should recalculate the anchors: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

And you must train your model from the beginning for all classes: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

I think it is better to use one model, yolov3-tiny.cfg at 1024x1024, for all classes.
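
Retraining for more classes also means editing the cfg. As a small Python sketch (the function name is made up for illustration), the training instructions linked above give the `filters` value of the `[convolutional]` layer just before each `[yolo]` layer as (classes + 5) x 3, assuming 3 anchors (mask entries) per `[yolo]` layer as in the stock yolov3 configs. The `classes=` value in each `[yolo]` layer has to change as well.

```python
def yolo_filters(classes, anchors_per_layer=3):
    """filters for the [convolutional] layer directly before a [yolo] layer.

    Each anchor predicts 4 box coordinates + 1 objectness score + one score
    per class, hence (classes + 5) values per anchor.
    """
    if classes < 1:
        raise ValueError("classes must be at least 1")
    return (classes + 5) * anchors_per_layer


if __name__ == "__main__":
    print(yolo_filters(1))   # 18, the current single-class model
    print(yolo_filters(4))   # 27, after adding 3 more classes
    print(yolo_filters(80))  # 255, the value in the stock 80-class cfg
```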


> Can you point me to a doc/post that could help me understand more deeply the pros/cons of yolov3.cfg versus yolov3-tiny.cfg (considering the same resolution, 1024x1024)? I'm particularly interested in a comparison of detection confidence, accuracy, performance, etc...

> Is this a sort of 2x yolov3-tiny-3l.cfg implementation? I really need to understand this more deeply, sorry...

What do you mean? You must use 1024x1024 network resolution in any case if your objects have size 1xN on 1024x1024 images. Otherwise, if you use 416x416 network resolution, your network will not see small or thin objects like your lines.


> What's the difference between yolov3-tiny.cfg and yolov3-tiny-3l.cfg?

yolov3-tiny-3l.cfg just has 3 yolo-layers instead of the 2 yolo-layers in yolov3-tiny.cfg. This allows yolo to detect smaller objects.
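
A rough Python sketch of why the extra yolo-layer helps with small objects. It assumes the usual downsampling factors: 32 and 16 for the two `[yolo]` layers of yolov3-tiny, plus 8 for the additional layer in yolov3-tiny-3l. These strides are an assumption to check against the cfg files, and the function name is hypothetical.

```python
# Assumed strides (input pixels per grid cell) of each [yolo] layer.
TINY_STRIDES = (32, 16)
TINY_3L_STRIDES = (32, 16, 8)


def grid_sizes(network_size, strides):
    """Grid cells per side for each [yolo] layer at a square network resolution."""
    if network_size % 32 != 0:
        raise ValueError("network size should be a multiple of 32")
    return [network_size // s for s in strides]


if __name__ == "__main__":
    print(grid_sizes(416, TINY_STRIDES))      # [13, 26]
    print(grid_sizes(1024, TINY_STRIDES))     # [32, 64]
    print(grid_sizes(1024, TINY_3L_STRIDES))  # [32, 64, 128]
```

Under these assumptions the third layer predicts on a grid with 8px cells instead of 16px cells, so thin or short objects are less likely to share a cell.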

berserker commented 5 years ago

> I think it is better to use one model, yolov3-tiny.cfg at 1024x1024, for all classes.

Thanks, I'll give that a try!

> > Can you point me to a doc/post that could help me understand more deeply the pros/cons of yolov3.cfg versus yolov3-tiny.cfg (considering the same resolution, 1024x1024)? I'm particularly interested in a comparison of detection confidence, accuracy, performance, etc...

> > Is this a sort of 2x yolov3-tiny-3l.cfg implementation? I really need to understand this more deeply, sorry...

> What do you mean? You must use 1024x1024 network resolution in any case if your objects have size 1xN on 1024x1024 images. Otherwise, if you use 416x416 network resolution, your network will not see small or thin objects like your lines.

I mean, in particular, the detailed differences between the configurations (default, tiny, spp, etc...) in terms of confidence, accuracy, performance and so on. The request wasn't related to my specific case; I was only asking for general advice on picking the best configuration for a given task.

> > What's the difference between yolov3-tiny.cfg and yolov3-tiny-3l.cfg?

> yolov3-tiny-3l.cfg just has 3 yolo-layers instead of the 2 yolo-layers in yolov3-tiny.cfg. This allows yolo to detect smaller objects.

Thanks, so I think that yolov3-tiny-3l.cfg is more suitable in my case, because my objects are 1px wide, right?

AlexeyAB commented 5 years ago

> Thanks, so I think that yolov3-tiny-3l.cfg is more suitable in my case, because my objects are 1px wide, right?

Your objects have sizes from 1x14 to 1x319, so 1x14 is a small object, but 1x319 is both small (in width) and big (in height).

Maybe yolov3-tiny-3l.cfg will work better than yolov3-tiny for 1x14 objects and have the same accuracy for 1x319.

So try to use yolov3-tiny-3l.cfg width=1024 height=1024

berserker commented 5 years ago

> Your objects have sizes from 1x14 to 1x319, so 1x14 is a small object, but 1x319 is both small (in width) and big (in height).

> Maybe yolov3-tiny-3l.cfg will work better than yolov3-tiny for 1x14 objects and have the same accuracy for 1x319.

> So try to use yolov3-tiny-3l.cfg width=1024 height=1024

Thanks again @AlexeyAB for your support, you were really very kind!