Ideal dataset size for yolov3?

lbrito55 commented 5 years ago

Hey @AlexeyAB

I've been training a model to detect objects in images from CCTV cameras, using yolov3-spp.cfg.

At the moment I have around 10k images and 6 classes, and because of my dataset size I'm running a k-fold-style training setup rather than the original 80/20 split.
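For readers wondering what a k-fold-style setup could look like here, below is a minimal sketch. It is not lbrito55's actual script; the image paths, `k=5`, and the function name are assumptions made for illustration. Each (train, valid) pair would then be written to the list files that the `.data` file points to.

```python
import random

def k_fold_splits(image_paths, k=5, seed=0):
    """Yield (train, valid) path lists, one pair per fold."""
    paths = list(image_paths)
    random.Random(seed).shuffle(paths)           # reproducible shuffle
    folds = [paths[i::k] for i in range(k)]      # k roughly equal folds
    for i in range(k):
        valid = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        yield train, valid

# Hypothetical image list, only for demonstration.
images = [f"data/obj/img_{n:05d}.jpg" for n in range(10)]
for fold, (train, valid) in enumerate(k_fold_splits(images, k=5)):
    print(fold, len(train), len(valid))
```

Every image lands in the validation list of exactly one fold, so the mAP values from the individual folds can be averaged.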

I've achieved ~70% mAP with this current setup.

Any thoughts on how I can improve?

Thanks in advance

LB

AlexeyAB commented 5 years ago

@lbrito55 Hi,

Try to use the default anchors and a more suitable network size.

iraadit commented 5 years ago

Hi @lbrito55 ,

How did you produce those graphs?

Thanks.

AlexeyAB commented 5 years ago

What network size do you use? Also, can you show the cloud of points? ./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416 -show

lbrito55 commented 5 years ago

@iraadit I'm using TensorBoard!

@AlexeyAB anchors = 16, 44, 37, 49, 25,100, 98, 76, 126,135, 154,205, 175,263, 177,330, 179,399;

These are the yolo-spp.cfg default anchors, I think. Do you mean a cluster-like graph?

I'm around 12k epochs for 6 classes and it recently reached a new best mAP. The loss curve is still fluctuating. How long should I keep training it without overfitting?

Thanks,

lbrito55 commented 5 years ago

@AlexeyAB

AlexeyAB commented 5 years ago

Do you use the default network size width=608 height=608 in the cfg-file?

Also show the calculated anchors: ./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416

lbrito55 commented 5 years ago

@AlexeyAB yes, I'm using 608x608

anchors = 16, 44, 37, 49, 25,100, 98, 76, 126,135, 154,205, 175,263, 177,330, 179,399

AlexeyAB commented 5 years ago

> I'm around 12k epochs for 6 classes and it recently reached a new best mAP. The loss curve is still fluctuating. How long should I keep training it without overfitting?

Keep training for as long as mAP on the validation dataset is still increasing.
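One way to apply that rule is sketched below, assuming you log the validation mAP at regular intervals. The `patience` threshold and the history values are hypothetical and are not something darknet provides.

```python
def best_checkpoint(map_history):
    """Return the (iteration, mAP) pair with the highest validation mAP."""
    return max(map_history, key=lambda item: item[1])

def should_keep_training(map_history, patience=3):
    """True while the best mAP occurred within the last `patience` evaluations."""
    if len(map_history) <= patience:
        return True
    best_index = max(range(len(map_history)), key=lambda i: map_history[i][1])
    return best_index >= len(map_history) - patience

# Hypothetical (iteration, validation mAP) measurements.
history = [(1000, 0.40), (2000, 0.55), (3000, 0.62),
           (4000, 0.61), (5000, 0.60), (6000, 0.59)]
print(should_keep_training(history), best_checkpoint(history))
```

With these made-up numbers the mAP peaked at iteration 3000 and has not improved for three evaluations, so the weights from that iteration would be the ones to keep.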

Can you show examples of images with your objects?

You can try to train this model; just change width=640 height=640 (and maybe increase subdivisions), and change max_batches=, classes= and filters=, where filters = (classes + 8 + 1) * <numbers in mask>: yolo_v3_tiny_pan5_matrix_gaussian_GIoU_aa_ae_mixup.cfg.txt

And train at least 7000 iterations.
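As a worked example of the filters formula quoted above, here is a small sketch for the 6-class case. The value of 3 masks per detection layer is an assumption; check the mask= line in the cfg-file you actually use.

```python
def gaussian_yolo_filters(classes, masks_per_layer):
    """filters = (classes + 8 + 1) * <numbers in mask>, as given in this thread."""
    return (classes + 8 + 1) * masks_per_layer

# 6 classes; 3 masks per layer is assumed here, not read from the cfg-file.
print(gaussian_yolo_filters(classes=6, masks_per_layer=3))  # 45
```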

lbrito55 commented 5 years ago

@AlexeyAB do you have a specific version for yolov3 that I can use without using a tiny yolo?

> You can try to train this model, just change width=608 height=608 (may be increase subdivisions), change max_batches=, classes= and filters= filters = (classes + 8 + 1) * <numbers in mask> yolo_v3_tiny_pan5_matrix_gaussian_GIoU_aa_ae_mixup.cfg.txt

AlexeyAB commented 5 years ago

What do you mean by "without tiny"?

lbrito55 commented 5 years ago

@AlexeyAB Never mind, let me process with the new cfg and I'll post the output

AlexeyAB commented 5 years ago

Just check the mAP after 7000 iterations

lbrito55 commented 5 years ago

@AlexeyAB I'm getting the following error when using the latest commit of the repo with this cfg

yolo_v3_tiny_pan5_matrix_gaussian_GIoU_aa_ae_mixup.cfg (2).txt

for 6 classes

compute_capability = 610, cudnn_half = 0
batch = 1, time_steps = 1, train = 1
layer filters size/strd(dil) input output
0 conv 24 3 x 3/ 1 608 x 608 x 3 -> 608 x 608 x 24 0.479 BF
1 max 2x 2/ 2 608 x 608 x 24 -> 304 x 304 x 24 0.009 BF
2 conv 8 5 x 5/ 1 304 x 304 x 24 -> 304 x 304 x 8 0.887 BF
3 route 0 -> 608 x 608 x 24
4 max 2x 2/ 2x 1 608 x 608 x 24 -> 304 x 608 x 24 0.018 BF
5 convS 8 5x 5/ 1x 2 304 x 608 x 24 -> 304 x 304 x 8 0.887 BF
6 route 0 -> 608 x 608 x 24
7 max 2x 2/ 1x 2 608 x 608 x 24 -> 608 x 304 x 24 0.018 BF
8 convS 8 5x 5/ 2x 1 608 x 304 x 24 -> 304 x 304 x 8 0.887 BF
9 route 0 -> 608 x 608 x 24
10 convS 8 5 x 5/ 1 608 x 608 x 24 -> 608 x 608 x 8 3.549 BF
11 reorg / 2 608 x 608 x 8 -> 304 x 304 x 32
12 route 11 8 5 2 -> 304 x 304 x 56
13 conv 32 1 x 1/ 1 304 x 304 x 56 -> 304 x 304 x 32 0.331 BF
14 conv 32 3 x 3/ 1 304 x 304 x 32 -> 304 x 304 x 32 1.703 BF
15 max 2x 2/ 2 304 x 304 x 32 -> 152 x 152 x 32 0.003 BF
16 conv 16 5 x 5/ 1 152 x 152 x 32 -> 152 x 152 x 16 0.591 BF
17 route 14 -> 304 x 304 x 32
18 max 2x 2/ 2x 1 304 x 304 x 32 -> 152 x 304 x 32 0.006 BF
19 convS 16 5x 5/ 1x 2 152 x 304 x 32 -> 152 x 152 x 16 0.591 BF
20 route 14 -> 304 x 304 x 32
21 max 2x 2/ 1x 2 304 x 304 x 32 -> 304 x 152 x 32 0.006 BF
22 convS 16 5x 5/ 2x 1 304 x 152 x 32 -> 152 x 152 x 16 0.591 BF
23 route 14 -> 304 x 304 x 32
24 convS 16 5 x 5/ 1 304 x 304 x 32 -> 304 x 304 x 16 2.366 BF
25 reorg / 2 304 x 304 x 16 -> 152 x 152 x 64
26 route 25 22 19 16 -> 152 x 152 x 112
27 conv 32 1 x 1/ 1 152 x 152 x 112 -> 152 x 152 x 32 0.166 BF
28 conv 64 3 x 3/ 1 152 x 152 x 32 -> 152 x 152 x 64 0.852 BF
29 max 2x 2/ 2 152 x 152 x 64 -> 76 x 76 x 64 0.001 BF
30 conv 32 5 x 5/ 1 76 x 76 x 64 -> 76 x 76 x 32 0.591 BF
31 route 28 -> 152 x 152 x 64
32 max 2x 2/ 2x 1 152 x 152 x 64 -> 76 x 152 x 64 0.003 BF
33 convS 32 5x 5/ 1x 2 76 x 152 x 64 -> 76 x 76 x 32 0.591 BF
34 route 28 -> 152 x 152 x 64
35 max 2x 2/ 1x 2 152 x 152 x 64 -> 152 x 76 x 64 0.003 BF
36 convS 32 5x 5/ 2x 1 152 x 76 x 64 -> 76 x 76 x 32 0.591 BF
37 route 28 -> 152 x 152 x 64
38 convS 32 5 x 5/ 1 152 x 152 x 64 -> 152 x 152 x 32 2.366 BF
39 reorg / 2 152 x 152 x 32 -> 76 x 76 x 128
40 route 39 36 33 30 -> 76 x 76 x 224
41 conv 64 1 x 1/ 1 76 x 76 x 224 -> 76 x 76 x 64 0.166 BF
42 conv 128 3 x 3/ 1 76 x 76 x 64 -> 76 x 76 x 128 0.852 BF
43 max 2x 2/ 2 76 x 76 x 128 -> 38 x 38 x 128 0.001 BF
44 conv 64 5 x 5/ 1 38 x 38 x 128 -> 38 x 38 x 64 0.591 BF
45 route 42 -> 76 x 76 x 128
46 max 2x 2/ 2x 1 76 x 76 x 128 -> 38 x 76 x 128 0.001 BF
47 convS 64 5x 5/ 1x 2 38 x 76 x 128 -> 38 x 38 x 64 0.591 BF
48 route 42 -> 76 x 76 x 128
49 max 2x 2/ 1x 2 76 x 76 x 128 -> 76 x 38 x 128 0.001 BF
50 convS 64 5x 5/ 2x 1 76 x 38 x 128 -> 38 x 38 x 64 0.591 BF
51 route 42 -> 76 x 76 x 128
52 convS 64 5 x 5/ 1 76 x 76 x 128 -> 76 x 76 x 64 2.366 BF
53 reorg / 2 76 x 76 x 64 -> 38 x 38 x 256
54 route 53 50 47 44 -> 38 x 38 x 448
55 conv 128 1 x 1/ 1 38 x 38 x 448 -> 38 x 38 x 128 0.166 BF
56 conv 256 3 x 3/ 1 38 x 38 x 128 -> 38 x 38 x 256 0.852 BF
57 max 2x 2/ 2 38 x 38 x 256 -> 19 x 19 x 256 0.000 BF
58 conv 128 5 x 5/ 1 19 x 19 x 256 -> 19 x 19 x 128 0.591 BF
59 route 56 -> 38 x 38 x 256
60 max 2x 2/ 2x 1 38 x 38 x 256 -> 19 x 38 x 256 0.001 BF
61 convS 128 5x 5/ 1x 2 19 x 38 x 256 -> 19 x 19 x 128 0.591 BF
62 route 56 -> 38 x 38 x 256
63 max 2x 2/ 1x 2 38 x 38 x 256 -> 38 x 19 x 256 0.001 BF
64 convS 128 5x 5/ 2x 1 38 x 19 x 256 -> 19 x 19 x 128 0.591 BF
65 route 56 -> 38 x 38 x 256
66 convS 128 5 x 5/ 1 38 x 38 x 256 -> 38 x 38 x 128 2.366 BF
67 reorg / 2 38 x 38 x 128 -> 19 x 19 x 512
68 route 67 64 61 58 -> 19 x 19 x 896
69 conv 256 1 x 1/ 1 19 x 19 x 896 -> 19 x 19 x 256 0.166 BF
70 conv 512 3 x 3/ 1 19 x 19 x 256 -> 19 x 19 x 512 0.852 BF
71 conv 256 1 x 1/ 1 19 x 19 x 512 -> 19 x 19 x 256 0.095 BF
72 max 5x 5/ 1 19 x 19 x 256 -> 19 x 19 x 256 0.002 BF
73 route 71 -> 19 x 19 x 256
74 max 9x 9/ 1 19 x 19 x 256 -> 19 x 19 x 256 0.007 BF
75 route 71 -> 19 x 19 x 256
76 max 13x13/ 1 19 x 19 x 256 -> 19 x 19 x 256 0.016 BF
77 route 76 74 72 71 -> 19 x 19 x1024
78 conv 256 1 x 1/ 1 19 x 19 x1024 -> 19 x 19 x 256 0.189 BF
79 Shortcut Layer: 69
80 conv 1024 3 x 3/ 1 19 x 19 x 256 -> 19 x 19 x1024 1.703 BF
81 conv 256 1 x 1/ 1 19 x 19 x1024 -> 19 x 19 x 256 0.189 BF
82 conv 512 3 x 3/ 1 19 x 19 x 256 -> 19 x 19 x 512 0.852 BF
83 max 2x 2/ 2 19 x 19 x 512 -> 10 x 10 x 512 0.000 BF
84 conv 128 5 x 5/ 1 10 x 10 x 512 -> 10 x 10 x 128 0.328 BF
85 route 82 -> 19 x 19 x 512
86 max 2x 2/ 2x 1 19 x 19 x 512 -> 10 x 19 x 512 0.000 BF
87 convS 128 5x 5/ 1x 2 10 x 19 x 512 -> 10 x 10 x 128 0.328 BF
88 route 82 -> 19 x 19 x 512
89 max 2x 2/ 1x 2 19 x 19 x 512 -> 19 x 10 x 512 0.000 BF
90 convS 128 5x 5/ 2x 1 19 x 10 x 512 -> 10 x 10 x 128 0.328 BF
91 route 82 -> 19 x 19 x 512
92 convS 128 5 x 5/ 1 19 x 19 x 512 -> 19 x 19 x 128 1.183 BF
93 reorg / 2 19 x 19 x 128 -> 9 x 9 x 512
94 route 93 90 87 84 -> 0 x 0 x 0
95 Layer before convolutional layer must output image.: File exists
darknet: ./src/utils.c:295: error: Assertion `0' failed.
Aborted (core dumped)

Any ideas?

AlexeyAB commented 5 years ago

@lbrito55 Oh yeah, set 640x640 or 576x576 or 512x512 in cfg-file.
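Judging only from the log above (this is an inference, not taken from the darknet source), 608 seems to fail because the 19 x 19 feature map is max-pooled to 10 x 10 in layer 83 but reorg'ed to 9 x 9 in layer 93, so the route in layer 94 cannot concatenate them. A rough sketch of a size check under that assumption follows; the function names are made up for illustration.

```python
import math

def last_stage_shapes(size):
    """Grid sizes near layers 82-93 of the log, for a square input."""
    grid = size // 32               # five stride-2 max-pools: 608 -> 19
    pooled = math.ceil(grid / 2)    # stride-2 max-pool in the log: 19 -> 10
    reorged = grid // 2             # reorg / 2 in the log: 19 -> 9
    return grid, pooled, reorged

def size_is_compatible(size):
    """True if the pooled and reorg'ed branches end up with the same grid."""
    if size % 32 != 0:
        return False
    _, pooled, reorged = last_stage_shapes(size)
    return pooled == reorged

for size in (608, 640, 576, 512):
    print(size, last_stage_shapes(size), size_is_compatible(size))
```

Under this reading, the three suggested sizes work because each is a multiple of 64, while 608 is only a multiple of 32.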

lbrito55 commented 5 years ago

@AlexeyAB using this configuration I'm around 2k epochs and the loss value is fluctuating between about 70 and 150... is this normal?

How is this new .cfg file better? I'm just curious

Thanks

AlexeyAB commented 5 years ago

What mAP do you get with this cfg-file?

lbrito55 commented 5 years ago

It trained the whole night, and then tried to sort 30k boxes in this section of the code (the non-maximum suppression step): if (nms) do_nms...

This is for a dataset with 2k images, so I couldn't get the mAP calculated. Am I doing something wrong?

AlexeyAB commented 5 years ago

Do you mean that the command ./darknet detector map .. took the whole night for the 2000 images in your validation dataset?

lbrito55 commented 5 years ago

Yes!

AlexeyAB commented 5 years ago

It is strange. Can you share your cfg-file?
Can you detect any object by using your trained model?

lbrito55 commented 5 years ago

Here's the cfg-file I'm using: mycfg.cfg.txt The model doesn't detect anything meaningful so far. I keep training, and the loss seems to be increasing.

AlexeyAB commented 5 years ago

What command do you use for training?
What command do you use for detection?

You should be able to detect something after 1000 iterations and get normal accuracy after 4000 iterations. The loss value doesn't matter.

So you can try to train this model: yolo_v3_tiny_pan5_matrix_gaussian_GIoU_aa_ae_mixup.cfg.txt

or this one: https://github.com/AlexeyAB/darknet/files/3580764/yolo_v3_tiny_pan3_aa_ae_mixup_scale_giou.cfg.txt

lbrito55 commented 5 years ago

I was somehow using the weights from around iteration 500. I tried again with weights from 2000+ iterations and everything seems to work just fine at the moment. I'll leave it training for more iterations to compare results. Sorry for the inconvenience, I'll let you know if anything comes up. Thanks.
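To avoid picking up an early checkpoint by accident, something like the sketch below could select the highest-iteration weights file. It assumes checkpoints are named `<name>_<iteration>.weights`; the file names in the example are hypothetical.

```python
import re

def latest_numbered_weights(filenames):
    """Return the file name with the largest iteration number, or None."""
    pattern = re.compile(r"_(\d+)\.weights$")
    numbered = []
    for name in filenames:
        match = pattern.search(name)
        if match:
            numbered.append((int(match.group(1)), name))
    if not numbered:
        return None
    return max(numbered)[1]

# Hypothetical contents of a backup directory.
files = ["mycfg_500.weights", "mycfg_1000.weights",
         "mycfg_2000.weights", "mycfg_last.weights"]
print(latest_numbered_weights(files))
```

Files without an iteration number in the name, such as the `_last` one in this example, are ignored by the pattern.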

AlexeyAB commented 5 years ago

> I was using weights from the ~500 iteration somehow.

https://github.com/AlexeyAB/darknet/issues/4325#issuecomment-556017979

> And train at least 7000 iterations.

https://github.com/AlexeyAB/darknet#when-should-i-stop-training

> Usually sufficient 2000 iterations for each class(object), but not less than 4000 iterations in total.
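The rule of thumb quoted above can be written out as a one-line calculation. This is just a sketch of the arithmetic; the function and parameter names are made up for illustration.

```python
def minimum_iterations(classes, per_class=2000, floor=4000):
    """2000 iterations per class, but not less than 4000 in total."""
    return max(classes * per_class, floor)

print(minimum_iterations(6))  # 12000 for the 6 classes in this thread
print(minimum_iterations(1))  # 4000, the floor applies
```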

haicheviet commented 5 years ago

@lbrito55 Can I ask how you use TensorBoard with the darknet repo? I trained on AWS and the web loss chart does not give a lot of information. Thanks in advance

AlexeyAB / darknet

Ideal dataset size for yolov3? #4325