sounansu opened this issue 4 years ago
@sounansu Hi,
What dataset do you use?
@AlexeyAB Sorry, I forgot.
I followed your "Training and Evaluation of speed and accuracy on MS COCO" training environment.
I already had the MS COCO dataset, but I downloaded it again with your script.
I modified your coco.data as below.
$ cat cfg/coco.data
classes= 80
train = /image_data/coco/trainvalno5k.txt
valid = /image_data/coco/testdev2017.txt
#valid = data/coco_val_5k.list
names = data/coco.names
backup = ./backup
eval=coco
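As a quick sanity check (my sketch, not from this thread), one can verify that the list files referenced in coco.data actually exist and count how many images each names:

```python
from pathlib import Path

# Paths are the ones from coco.data above; adjust to your setup.
def check_list(path):
    p = Path(path)
    if p.is_file():
        n = sum(1 for _ in p.open())
        return f"{path}: {n} images"
    return f"MISSING: {path}"

for f in ["/image_data/coco/trainvalno5k.txt",
          "/image_data/coco/testdev2017.txt"]:
    print(check_list(f))
```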
avg loss for datasets like MS COCO can be very high.
What GPU do you use?
Do you use batch=64 subdivisions=32?
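For context on why this setting matters: darknet processes mini_batch = batch / subdivisions images per GPU forward pass, accumulating gradients until a full batch has been seen, so larger subdivisions trade training speed for lower GPU memory use. A minimal sketch of the relationship:

```python
# mini_batch = batch / subdivisions is what actually sits on the GPU
# at once; the effective optimization batch stays at 64 either way.
def mini_batch(batch, subdivisions):
    assert batch % subdivisions == 0
    return batch // subdivisions

for s in (8, 16, 32):
    print(f"subdivisions={s}: mini_batch={mini_batch(64, s)}")
```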
Hi @AlexeyAB ! Here is my yolov4.cfg:
$ head -15 anaconda3/darknet/cfg/yolov4.cfg
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=32
width=608
height=608
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
and the output of nvidia-smi -L:
$ nvidia-smi -L
GPU 0: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
I followed your "Training and Evaluation of speed and accuracy on MS COCO" training environment:
https://github.com/AlexeyAB/darknet/wiki/Train-Detector-on-MS-COCO-(trainvalno5k-2014)-dataset
Required files: yolov4.cfg - https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg
(change width=512 height=512 in cfg-file)
With 608x608 batch=64 subdivisions=32,
you should multiply all anchors in all 3 [yolo] layers by 1.1875 (= 608/512); training will be longer, but you will get higher AP.
With 512x512 batch=64 subdivisions=8,
training will be faster, you shouldn't change the anchors, and you will get the AP from the paper.
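The anchor scaling step can be sketched as follows (the anchor list is the default one from yolov4.cfg; rounding to integers is my choice, not specified in the thread):

```python
# Default yolov4.cfg anchors: 9 (w, h) pairs shared by the 3 [yolo] layers.
anchors = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55,
           72, 146, 142, 110, 192, 243, 459, 401]

# Moving from 512x512 to 608x608 input scales every anchor
# dimension by 608/512 = 1.1875.
scale = 608 / 512
scaled = [round(a * scale) for a in anchors]
print(scaled)
```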
@AlexeyAB ! Thank you for your advice.
I retried with 512x512 batch=64 subdivisions=32 (subdivisions=8 caused out of memory!).
The avg loss decreased until iteration 6991.
But at iteration 6992, the avg loss showed a negative value.
I show the log below:
$ grep avg train.log |egrep "^ 698|^ 699"
698: 14.148299, 19.743196 avg loss, 0.000620 rate, 3.589678 seconds, 44672 images, 944.396128 hours left
699: 22.595047, 20.028381 avg loss, 0.000623 rate, 3.776269 seconds, 44736 images, 939.935900 hours left
6980: 12.387778, 15.608065 avg loss, 0.002610 rate, 6.991823 seconds, 446720 images, 941.920228 hours left
6981: 15.596233, 15.606881 avg loss, 0.002610 rate, 8.542204 seconds, 446784 images, 942.086081 hours left
6982: 18.623497, 15.908543 avg loss, 0.002610 rate, 8.704581 seconds, 446848 images, 944.375660 hours left
6983: 17.010046, 16.018692 avg loss, 0.002610 rate, 8.752662 seconds, 446912 images, 946.864913 hours left
6984: 13.561173, 15.772940 avg loss, 0.002610 rate, 8.409437 seconds, 446976 images, 949.395163 hours left
6985: 18.364819, 16.032127 avg loss, 0.002610 rate, 8.654661 seconds, 447040 images, 951.429565 hours left
6986: 16.681313, 16.097046 avg loss, 0.002610 rate, 8.626774 seconds, 447104 images, 953.779786 hours left
6987: 13.151031, 15.802444 avg loss, 0.002610 rate, 8.386545 seconds, 447168 images, 956.068238 hours left
6988: 18.078333, 16.030033 avg loss, 0.002610 rate, 8.676926 seconds, 447232 images, 958.004476 hours left
6989: 10.161444, 15.443174 avg loss, 0.002610 rate, 8.228171 seconds, 447296 images, 960.319389 hours left
6990: 15.453902, 15.444247 avg loss, 0.002610 rate, 8.526408 seconds, 447360 images, 961.995948 hours left
6991: 20.952736, 15.995096 avg loss, 0.002610 rate, 5.238075 seconds, 447424 images, 964.064559 hours left
6992: -3374.463867, -323.050812 avg loss, 0.002610 rate, 5.110243 seconds, 447488 images, 961.635880 hours left
6993: 1348.962524, 1348.962524 avg loss, 0.002610 rate, 5.209830 seconds, 447552 images, 959.024992 hours left
6994: 919.789978, 1306.045288 avg loss, 0.002610 rate, 5.249565 seconds, 447616 images, 956.576715 hours left
6995: -2189.557617, 956.484985 avg loss, 0.002610 rate, 5.153039 seconds
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: -0.170633), Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 9, class_loss = 8.999999, iou_loss = 0.000009, total_loss = 9.000008
6996: 1344.219727, 995.258484 avg loss, 0.002610 rate, 5.148190 seconds, 447744 images, 951.729398 hours left
6997: 3496.794434, 1245.412109 avg loss, 0.002610 rate, 5.301418 seconds, 447808 images, 949.269531 hours left
6998: 44.709747, 1125.341919 avg loss, 0.002610 rate, 5.254768 seconds, 447872 images, 947.044303 hours left
6999: -3274.613770, 685.346375 avg loss, 0.002610 rate, 5.075725 seconds, 447936 images, 944.777371 hours left
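A quick way to pinpoint where a run like this diverged is to scan the log for the first negative or NaN avg loss (my sketch, not part of darknet):

```python
import re

def first_bad_iteration(lines):
    """Return (iteration, avg_loss) for the first log line whose
    avg loss is negative or NaN, else None."""
    pat = re.compile(r"^\s*(\d+):\s*(-?[\d.]+|-?nan),\s*(-?[\d.]+|-?nan) avg loss")
    for line in lines:
        m = pat.match(line)
        if m:
            avg = float(m.group(3))
            if avg != avg or avg < 0:  # NaN compares unequal to itself
                return int(m.group(1)), avg
    return None

sample = [
    " 6991: 20.952736, 15.995096 avg loss, 0.002610 rate",
    " 6992: -3374.463867, -323.050812 avg loss, 0.002610 rate",
]
print(first_bad_iteration(sample))  # (6992, -323.050812)
```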
I am very surprised! What happened in my training?

I retried with 512x512 batch=64 subdivisions=32. Subdivisions=8 caused out of memory!

Train with 512x512 batch=64 subdivisions=16.
Do you train by using only 1 GPU? What command do you use for training? Attach your cfg-file.
@AlexeyAB ! I use only a single GTX 1080 Ti.
Training with 512x512 batch=64 subdivisions=16 caused a core dump, too.
$ head cfg/yolov4.cfg
[net]
batch=64
subdivisions=16
# Training
width=512
height=512
#width=608
#height=608
channels=3
momentum=0.949
$
$ ./darknet detector train cfg/coco.data cfg/yolov4.cfg csdarknet53-omega.conv.105 -dont_show
CUDA-version: 10000 (10020), cuDNN: 7.6.5, GPU count: 1
OpenCV version: 4.1.1
yolov4
compute_capability = 610, cudnn_half = 0
net.optimized_memory = 0
mini_batch = 4, batch = 64, time_steps = 1, train = 1
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 512 x 512 x 3 -> 512 x 512 x 32 0.453 BF
1 conv 64 3 x 3/ 2 512 x 512 x 32 -> 256 x 256 x 64 2.416 BF
2 conv 64 1 x 1/ 1 256 x 256 x 64 -> 256 x 256 x 64 0.537 BF
3 route 1 -> 256 x 256 x 64
4 conv 64 1 x 1/ 1 256 x 256 x 64 -> 256 x 256 x 64 0.537 BF
5 conv 32 1 x 1/ 1 256 x 256 x 64 -> 256 x 256 x 32 0.268 BF
6 conv 64 3 x 3/ 1 256 x 256 x 32 -> 256 x 256 x 64 2.416 BF
7 Shortcut Layer: 4, wt = 0, wn = 0, outputs: 256 x 256 x 64 0.004 BF
8 conv 64 1 x 1/ 1 256 x 256 x 64 -> 256 x 256 x 64 0.537 BF
9 route 8 2 -> 256 x 256 x 128
10 conv 64 1 x 1/ 1 256 x 256 x 128 -> 256 x 256 x 64 1.074 BF
11 conv 128 3 x 3/ 2 256 x 256 x 64 -> 128 x 128 x 128 2.416 BF
12 conv 64 1 x 1/ 1 128 x 128 x 128 -> 128 x 128 x 64 0.268 BF
13 route 11 -> 128 x 128 x 128
14 conv 64 1 x 1/ 1 128 x 128 x 128 -> 128 x 128 x 64 0.268 BF
15 conv 64 1 x 1/ 1 128 x 128 x 64 -> 128 x 128 x 64 0.134 BF
16 conv 64 3 x 3/ 1 128 x 128 x 64 -> 128 x 128 x 64 1.208 BF
17 Shortcut Layer: 14, wt = 0, wn = 0, outputs: 128 x 128 x 64 0.001 BF
18 conv 64 1 x 1/ 1 128 x 128 x 64 -> 128 x 128 x 64 0.134 BF
19 conv 64 3 x 3/ 1 128 x 128 x 64 -> 128 x 128 x 64 1.208 BF
20 Shortcut Layer: 17, wt = 0, wn = 0, outputs: 128 x 128 x 64 0.001 BF
21 conv 64 1 x 1/ 1 128 x 128 x 64 -> 128 x 128 x 64 0.134 BF
22 route 21 12 -> 128 x 128 x 128
23 conv 128 1 x 1/ 1 128 x 128 x 128 -> 128 x 128 x 128 0.537 BF
24 conv 256 3 x 3/ 2 128 x 128 x 128 -> 64 x 64 x 256 2.416 BF
25 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
26 route 24 -> 64 x 64 x 256
27 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
28 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
29 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
30 Shortcut Layer: 27, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
31 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
32 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
33 Shortcut Layer: 30, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
34 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
35 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
36 Shortcut Layer: 33, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
37 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
38 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
39 Shortcut Layer: 36, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
40 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
41 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
42 Shortcut Layer: 39, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
43 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
44 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
45 Shortcut Layer: 42, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
46 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
47 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
48 Shortcut Layer: 45, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
49 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
50 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
51 Shortcut Layer: 48, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
52 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
53 route 52 25 -> 64 x 64 x 256
54 conv 256 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 256 0.537 BF
55 conv 512 3 x 3/ 2 64 x 64 x 256 -> 32 x 32 x 512 2.416 BF
56 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
57 route 55 -> 32 x 32 x 512
58 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
59 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
60 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
61 Shortcut Layer: 58, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
62 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
63 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
64 Shortcut Layer: 61, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
65 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
66 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
67 Shortcut Layer: 64, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
68 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
69 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
70 Shortcut Layer: 67, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
71 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
72 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
73 Shortcut Layer: 70, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
74 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
75 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
76 Shortcut Layer: 73, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
77 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
78 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
79 Shortcut Layer: 76, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
80 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
81 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
82 Shortcut Layer: 79, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
83 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
84 route 83 56 -> 32 x 32 x 512
85 conv 512 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 512 0.537 BF
86 conv 1024 3 x 3/ 2 32 x 32 x 512 -> 16 x 16 x1024 2.416 BF
87 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
88 route 86 -> 16 x 16 x1024
89 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
90 conv 512 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.134 BF
91 conv 512 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x 512 1.208 BF
92 Shortcut Layer: 89, wt = 0, wn = 0, outputs: 16 x 16 x 512 0.000 BF
93 conv 512 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.134 BF
94 conv 512 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x 512 1.208 BF
95 Shortcut Layer: 92, wt = 0, wn = 0, outputs: 16 x 16 x 512 0.000 BF
96 conv 512 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.134 BF
97 conv 512 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x 512 1.208 BF
98 Shortcut Layer: 95, wt = 0, wn = 0, outputs: 16 x 16 x 512 0.000 BF
99 conv 512 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.134 BF
100 conv 512 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x 512 1.208 BF
101 Shortcut Layer: 98, wt = 0, wn = 0, outputs: 16 x 16 x 512 0.000 BF
102 conv 512 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.134 BF
103 route 102 87 -> 16 x 16 x1024
104 conv 1024 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x1024 0.537 BF
105 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
106 conv 1024 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x1024 2.416 BF
107 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
108 max 5x 5/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.003 BF
109 route 107 -> 16 x 16 x 512
110 max 9x 9/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.011 BF
111 route 107 -> 16 x 16 x 512
112 max 13x13/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.022 BF
113 route 112 110 108 107 -> 16 x 16 x2048
114 conv 512 1 x 1/ 1 16 x 16 x2048 -> 16 x 16 x 512 0.537 BF
115 conv 1024 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x1024 2.416 BF
116 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
117 conv 256 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 256 0.067 BF
118 upsample 2x 16 x 16 x 256 -> 32 x 32 x 256
119 route 85 -> 32 x 32 x 512
120 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
121 route 120 118 -> 32 x 32 x 512
122 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
123 conv 512 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 512 2.416 BF
124 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
125 conv 512 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 512 2.416 BF
126 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
127 conv 128 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 128 0.067 BF
128 upsample 2x 32 x 32 x 128 -> 64 x 64 x 128
129 route 54 -> 64 x 64 x 256
130 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
131 route 130 128 -> 64 x 64 x 256
132 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
133 conv 256 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 256 2.416 BF
134 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
135 conv 256 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 256 2.416 BF
136 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
137 conv 256 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 256 2.416 BF
138 conv 255 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 255 0.535 BF
139 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.20
nms_kind: greedynms (1), beta = 0.600000
140 route 136 -> 64 x 64 x 128
141 conv 256 3 x 3/ 2 64 x 64 x 128 -> 32 x 32 x 256 0.604 BF
142 route 141 126 -> 32 x 32 x 512
143 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
144 conv 512 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 512 2.416 BF
145 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
146 conv 512 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 512 2.416 BF
147 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
148 conv 512 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 512 2.416 BF
149 conv 255 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 255 0.267 BF
150 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.10
nms_kind: greedynms (1), beta = 0.600000
151 route 147 -> 32 x 32 x 256
152 conv 512 3 x 3/ 2 32 x 32 x 256 -> 16 x 16 x 512 0.604 BF
153 route 152 116 -> 16 x 16 x1024
154 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
155 conv 1024 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x1024 2.416 BF
156 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
157 conv 1024 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x1024 2.416 BF
158 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
159 conv 1024 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x1024 2.416 BF
160 conv 255 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 255 0.134 BF
161 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 91.095
avg_outputs = 757643
Allocate additional workspace_size = 52.43 MB
Loading weights from csdarknet53-omega.conv.105...
seen 64, trained: 0 K-images (0 Kilo-batches_64)
Done! Loaded 105 layers from weights-file
Learning Rate: 0.00261, Momentum: 0.949, Decay: 0.0005
Resizing, random_coef = 1.40
736 x 736
Create 12 permanent cpu-threads
Try to set subdivisions=64 in your cfg-file.
CUDA status Error: file: ./src/dark_cuda.c : () : line: 373 : build time: Apr 24 2020 - 13:58:10
CUDA Error: out of memory
CUDA Error: out of memory: File exists
darknet: ./src/utils.c:325: error: Assertion `0' failed.
Aborted (core dumped)
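One plausible reading of this OOM (an inference from the log above, not a guarantee about the exact implementation): the log prints "Resizing, random_coef = 1.40" and then "736 x 736", i.e. with random resizing enabled darknet periodically rescales the network input up to roughly init_size * 1.4, rounded up to a multiple of 32. Activation memory grows with the squared side length, so a configuration that fits at 512x512 can still run out of memory at 736x736:

```python
# Estimate the largest resized input and its memory multiplier
# relative to the configured 512x512.
def max_random_size(init_size, coef=1.4, stride=32):
    raw = init_size * coef
    return -(-int(raw) // stride) * stride  # round up to a multiple of 32

size = max_random_size(512)
print(size, round((size / 512) ** 2, 2))  # 736, ~2.07x activation memory
```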
And my yolov4.cfg diff is (subdivisions=32):
$ git diff cfg/yolov4.cfg
diff --git a/cfg/yolov4.cfg b/cfg/yolov4.cfg
index 47b9db6..9e1609b 100644
--- a/cfg/yolov4.cfg
+++ b/cfg/yolov4.cfg
@@ -1,11 +1,11 @@
[net]
batch=64
-subdivisions=8
+subdivisions=32
# Training
-#width=512
-#height=512
-width=608
-height=608
+width=512
+height=512
+#width=608
+#height=608
channels=3
momentum=0.949
decay=0.0005
$
$ cat cfg/yolov4.cfg
[net]
batch=64
subdivisions=16
# Training
width=512
height=512
#width=608
#height=608
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.00261
burn_in=1000
max_batches = 500500
policy=steps
steps=400000,450000
scales=.1,.1
#cutmix=1
mosaic=1
#:104x104 54:52x52 85:26x26 104:13x13 for 416
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=mish
# Downsample
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[route]
layers = -2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[route]
layers = -1,-7
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
# Downsample
[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[route]
layers = -2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[route]
layers = -1,-10
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
# Downsample
[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[route]
layers = -2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[route]
layers = -1,-28
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
# Downsample
[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[route]
layers = -2
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[route]
layers = -1,-28
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
# Downsample
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[route]
layers = -2
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[route]
layers = -1,-16
[convolutional]
batch_normalize=1
filters=1024
size=1
stride=1
pad=1
activation=mish
##########################
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
### SPP ###
[maxpool]
stride=1
size=5
[route]
layers=-2
[maxpool]
stride=1
size=9
[route]
layers=-4
[maxpool]
stride=1
size=13
[route]
layers=-1,-3,-5,-6
### End SPP ###
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = 85
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[route]
layers = -1, -3
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = 54
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[route]
layers = -1, -3
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
##########################
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 0,1,2
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
scale_x_y = 1.2
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
[route]
layers = -4
[convolutional]
batch_normalize=1
size=3
stride=2
pad=1
filters=256
activation=leaky
[route]
layers = -1, -16
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 3,4,5
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
scale_x_y = 1.1
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
[route]
layers = -4
[convolutional]
batch_normalize=1
size=3
stride=2
pad=1
filters=512
activation=leaky
[route]
layers = -1, -37
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
scale_x_y = 1.05
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
Hi @AlexeyAB ! Long time no see.
Now I am trying your YOLOv4, but my training logs are not good.
I show the beginning and the middle of the log.
I am training right now.
An avg loss of 22.057234 is not good, I think. I start with this command.
And I built darknet with this Makefile.
Please give me suggestions for my training!
Thank you!