sounansu opened this issue 4 years ago
@sounansu Hi,
What dataset do you use?
@AlexeyAB Sorry, I forgot.
I followed your "Training and Evaluation of speed and accuracy on MS COCO" training environment.
I already had the MS COCO dataset, but I downloaded it again with your script.
I modified your coco.data as below.
$ cat cfg/coco.data
classes= 80
train = /image_data/coco/trainvalno5k.txt
valid = /image_data/coco/testdev2017.txt
#valid = data/coco_val_5k.list
names = data/coco.names
backup = ./backup
eval=coco
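As a quick sanity check (my sketch, not from this thread), one can verify that the list files referenced in coco.data actually exist and count how many images each names:

```python
from pathlib import Path

# Paths are the ones from coco.data above; adjust to your setup.
def check_list(path):
    p = Path(path)
    if p.is_file():
        n = sum(1 for _ in p.open())
        return f"{path}: {n} images"
    return f"MISSING: {path}"

for f in ["/image_data/coco/trainvalno5k.txt",
          "/image_data/coco/testdev2017.txt"]:
    print(check_list(f))
```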
avg loss for datasets like MS COCO can be very high.
What GPU do you use?
Do you use batch=64 subdivisions=32?
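For context on why this setting matters: darknet processes mini_batch = batch / subdivisions images per GPU forward pass, accumulating gradients until a full batch has been seen, so larger subdivisions trade training speed for lower GPU memory use. A minimal sketch of the relationship:

```python
# mini_batch = batch / subdivisions is what actually sits on the GPU
# at once; the effective optimization batch stays at 64 either way.
def mini_batch(batch, subdivisions):
    assert batch % subdivisions == 0
    return batch // subdivisions

for s in (8, 16, 32):
    print(f"subdivisions={s}: mini_batch={mini_batch(64, s)}")
```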
Hi @AlexeyAB ! Here is my yolov4.cfg:
$ head -15 anaconda3/darknet/cfg/yolov4.cfg
[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=32
width=608
height=608
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
and the output of nvidia-smi -L:
$ nvidia-smi -L
GPU 0: GeForce GTX 1080 Ti (UUID: GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)
I followed your "Training and Evaluation of speed and accuracy on MS COCO" training environment:
https://github.com/AlexeyAB/darknet/wiki/Train-Detector-on-MS-COCO-(trainvalno5k-2014)-dataset
Required files: yolov4.cfg - https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg
(change width=512 height=512 in cfg-file)
With 608x608 batch=64 subdivisions=32,
you should multiply all anchors in all 3 [yolo] layers by 1.1875 (= 608/512); training will be longer, but you will get higher AP.
With 512x512 batch=64 subdivisions=8,
training will be faster, you shouldn't change the anchors, and you will get the AP from the paper.
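The anchor scaling step can be sketched as follows (the anchor list is the default one from yolov4.cfg; rounding to integers is my choice, not specified in the thread):

```python
# Default yolov4.cfg anchors: 9 (w, h) pairs shared by the 3 [yolo] layers.
anchors = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55,
           72, 146, 142, 110, 192, 243, 459, 401]

# Moving from 512x512 to 608x608 input scales every anchor
# dimension by 608/512 = 1.1875.
scale = 608 / 512
scaled = [round(a * scale) for a in anchors]
print(scaled)
```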
@AlexeyAB ! Thank you for your advice.
I retried with 512x512 batch=64 subdivisions=32 (subdivisions=8 caused out of memory!).
The avg loss decreased until iteration 6991.
But at iteration 6992, the avg loss showed a negative value.
I show the log below:
$ grep avg train.log |egrep "^ 698|^ 699"
698: 14.148299, 19.743196 avg loss, 0.000620 rate, 3.589678 seconds, 44672 images, 944.396128 hours left
699: 22.595047, 20.028381 avg loss, 0.000623 rate, 3.776269 seconds, 44736 images, 939.935900 hours left
6980: 12.387778, 15.608065 avg loss, 0.002610 rate, 6.991823 seconds, 446720 images, 941.920228 hours left
6981: 15.596233, 15.606881 avg loss, 0.002610 rate, 8.542204 seconds, 446784 images, 942.086081 hours left
6982: 18.623497, 15.908543 avg loss, 0.002610 rate, 8.704581 seconds, 446848 images, 944.375660 hours left
6983: 17.010046, 16.018692 avg loss, 0.002610 rate, 8.752662 seconds, 446912 images, 946.864913 hours left
6984: 13.561173, 15.772940 avg loss, 0.002610 rate, 8.409437 seconds, 446976 images, 949.395163 hours left
6985: 18.364819, 16.032127 avg loss, 0.002610 rate, 8.654661 seconds, 447040 images, 951.429565 hours left
6986: 16.681313, 16.097046 avg loss, 0.002610 rate, 8.626774 seconds, 447104 images, 953.779786 hours left
6987: 13.151031, 15.802444 avg loss, 0.002610 rate, 8.386545 seconds, 447168 images, 956.068238 hours left
6988: 18.078333, 16.030033 avg loss, 0.002610 rate, 8.676926 seconds, 447232 images, 958.004476 hours left
6989: 10.161444, 15.443174 avg loss, 0.002610 rate, 8.228171 seconds, 447296 images, 960.319389 hours left
6990: 15.453902, 15.444247 avg loss, 0.002610 rate, 8.526408 seconds, 447360 images, 961.995948 hours left
6991: 20.952736, 15.995096 avg loss, 0.002610 rate, 5.238075 seconds, 447424 images, 964.064559 hours left
6992: -3374.463867, -323.050812 avg loss, 0.002610 rate, 5.110243 seconds, 447488 images, 961.635880 hours left
6993: 1348.962524, 1348.962524 avg loss, 0.002610 rate, 5.209830 seconds, 447552 images, 959.024992 hours left
6994: 919.789978, 1306.045288 avg loss, 0.002610 rate, 5.249565 seconds, 447616 images, 956.576715 hours left
6995: -2189.557617, 956.484985 avg loss, 0.002610 rate, 5.153039 seconds
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: -0.170633), Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 9, class_loss = 8.999999, iou_loss = 0.000009, total_loss = 9.000008
6996: 1344.219727, 995.258484 avg loss, 0.002610 rate, 5.148190 seconds, 447744 images, 951.729398 hours left
6997: 3496.794434, 1245.412109 avg loss, 0.002610 rate, 5.301418 seconds, 447808 images, 949.269531 hours left
6998: 44.709747, 1125.341919 avg loss, 0.002610 rate, 5.254768 seconds, 447872 images, 947.044303 hours left
6999: -3274.613770, 685.346375 avg loss, 0.002610 rate, 5.075725 seconds, 447936 images, 944.777371 hours left
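A quick way to pinpoint where a run like this diverged is to scan the log for the first negative or NaN avg loss (my sketch, not part of darknet):

```python
import re

def first_bad_iteration(lines):
    """Return (iteration, avg_loss) for the first log line whose
    avg loss is negative or NaN, else None."""
    pat = re.compile(r"^\s*(\d+):\s*(-?[\d.]+|-?nan),\s*(-?[\d.]+|-?nan) avg loss")
    for line in lines:
        m = pat.match(line)
        if m:
            avg = float(m.group(3))
            if avg != avg or avg < 0:  # NaN compares unequal to itself
                return int(m.group(1)), avg
    return None

sample = [
    " 6991: 20.952736, 15.995096 avg loss, 0.002610 rate",
    " 6992: -3374.463867, -323.050812 avg loss, 0.002610 rate",
]
print(first_bad_iteration(sample))  # (6992, -323.050812)
```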
I am very surprised! What happened in my training?

I retried with 512x512 batch=64 subdivisions=32. Subdivisions=8 caused out of memory!

Train with 512x512 batch=64 subdivisions=16.
Do you train by using only 1 GPU? What command do you use for training? Attach your cfg-file.
@AlexeyAB ! I use only a single GTX 1080 Ti.
Training with 512x512 batch=64 subdivisions=16 caused a core dump, too.
$ head cfg/yolov4.cfg
[net]
batch=64
subdivisions=16
# Training
width=512
height=512
#width=608
#height=608
channels=3
momentum=0.949
$
$ ./darknet detector train cfg/coco.data cfg/yolov4.cfg csdarknet53-omega.conv.105 -dont_show
CUDA-version: 10000 (10020), cuDNN: 7.6.5, GPU count: 1
OpenCV version: 4.1.1
yolov4
compute_capability = 610, cudnn_half = 0
net.optimized_memory = 0
mini_batch = 4, batch = 64, time_steps = 1, train = 1
layer filters size/strd(dil) input output
0 conv 32 3 x 3/ 1 512 x 512 x 3 -> 512 x 512 x 32 0.453 BF
1 conv 64 3 x 3/ 2 512 x 512 x 32 -> 256 x 256 x 64 2.416 BF
2 conv 64 1 x 1/ 1 256 x 256 x 64 -> 256 x 256 x 64 0.537 BF
3 route 1 -> 256 x 256 x 64
4 conv 64 1 x 1/ 1 256 x 256 x 64 -> 256 x 256 x 64 0.537 BF
5 conv 32 1 x 1/ 1 256 x 256 x 64 -> 256 x 256 x 32 0.268 BF
6 conv 64 3 x 3/ 1 256 x 256 x 32 -> 256 x 256 x 64 2.416 BF
7 Shortcut Layer: 4, wt = 0, wn = 0, outputs: 256 x 256 x 64 0.004 BF
8 conv 64 1 x 1/ 1 256 x 256 x 64 -> 256 x 256 x 64 0.537 BF
9 route 8 2 -> 256 x 256 x 128
10 conv 64 1 x 1/ 1 256 x 256 x 128 -> 256 x 256 x 64 1.074 BF
11 conv 128 3 x 3/ 2 256 x 256 x 64 -> 128 x 128 x 128 2.416 BF
12 conv 64 1 x 1/ 1 128 x 128 x 128 -> 128 x 128 x 64 0.268 BF
13 route 11 -> 128 x 128 x 128
14 conv 64 1 x 1/ 1 128 x 128 x 128 -> 128 x 128 x 64 0.268 BF
15 conv 64 1 x 1/ 1 128 x 128 x 64 -> 128 x 128 x 64 0.134 BF
16 conv 64 3 x 3/ 1 128 x 128 x 64 -> 128 x 128 x 64 1.208 BF
17 Shortcut Layer: 14, wt = 0, wn = 0, outputs: 128 x 128 x 64 0.001 BF
18 conv 64 1 x 1/ 1 128 x 128 x 64 -> 128 x 128 x 64 0.134 BF
19 conv 64 3 x 3/ 1 128 x 128 x 64 -> 128 x 128 x 64 1.208 BF
20 Shortcut Layer: 17, wt = 0, wn = 0, outputs: 128 x 128 x 64 0.001 BF
21 conv 64 1 x 1/ 1 128 x 128 x 64 -> 128 x 128 x 64 0.134 BF
22 route 21 12 -> 128 x 128 x 128
23 conv 128 1 x 1/ 1 128 x 128 x 128 -> 128 x 128 x 128 0.537 BF
24 conv 256 3 x 3/ 2 128 x 128 x 128 -> 64 x 64 x 256 2.416 BF
25 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
26 route 24 -> 64 x 64 x 256
27 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
28 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
29 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
30 Shortcut Layer: 27, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
31 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
32 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
33 Shortcut Layer: 30, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
34 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
35 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
36 Shortcut Layer: 33, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
37 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
38 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
39 Shortcut Layer: 36, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
40 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
41 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
42 Shortcut Layer: 39, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
43 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
44 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
45 Shortcut Layer: 42, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
46 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
47 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
48 Shortcut Layer: 45, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
49 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
50 conv 128 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 128 1.208 BF
51 Shortcut Layer: 48, wt = 0, wn = 0, outputs: 64 x 64 x 128 0.001 BF
52 conv 128 1 x 1/ 1 64 x 64 x 128 -> 64 x 64 x 128 0.134 BF
53 route 52 25 -> 64 x 64 x 256
54 conv 256 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 256 0.537 BF
55 conv 512 3 x 3/ 2 64 x 64 x 256 -> 32 x 32 x 512 2.416 BF
56 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
57 route 55 -> 32 x 32 x 512
58 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
59 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
60 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
61 Shortcut Layer: 58, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
62 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
63 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
64 Shortcut Layer: 61, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
65 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
66 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
67 Shortcut Layer: 64, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
68 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
69 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
70 Shortcut Layer: 67, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
71 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
72 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
73 Shortcut Layer: 70, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
74 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
75 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
76 Shortcut Layer: 73, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
77 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
78 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
79 Shortcut Layer: 76, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
80 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
81 conv 256 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 256 1.208 BF
82 Shortcut Layer: 79, wt = 0, wn = 0, outputs: 32 x 32 x 256 0.000 BF
83 conv 256 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 256 0.134 BF
84 route 83 56 -> 32 x 32 x 512
85 conv 512 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 512 0.537 BF
86 conv 1024 3 x 3/ 2 32 x 32 x 512 -> 16 x 16 x1024 2.416 BF
87 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
88 route 86 -> 16 x 16 x1024
89 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
90 conv 512 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.134 BF
91 conv 512 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x 512 1.208 BF
92 Shortcut Layer: 89, wt = 0, wn = 0, outputs: 16 x 16 x 512 0.000 BF
93 conv 512 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.134 BF
94 conv 512 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x 512 1.208 BF
95 Shortcut Layer: 92, wt = 0, wn = 0, outputs: 16 x 16 x 512 0.000 BF
96 conv 512 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.134 BF
97 conv 512 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x 512 1.208 BF
98 Shortcut Layer: 95, wt = 0, wn = 0, outputs: 16 x 16 x 512 0.000 BF
99 conv 512 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.134 BF
100 conv 512 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x 512 1.208 BF
101 Shortcut Layer: 98, wt = 0, wn = 0, outputs: 16 x 16 x 512 0.000 BF
102 conv 512 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.134 BF
103 route 102 87 -> 16 x 16 x1024
104 conv 1024 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x1024 0.537 BF
105 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
106 conv 1024 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x1024 2.416 BF
107 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
108 max 5x 5/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.003 BF
109 route 107 -> 16 x 16 x 512
110 max 9x 9/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.011 BF
111 route 107 -> 16 x 16 x 512
112 max 13x13/ 1 16 x 16 x 512 -> 16 x 16 x 512 0.022 BF
113 route 112 110 108 107 -> 16 x 16 x2048
114 conv 512 1 x 1/ 1 16 x 16 x2048 -> 16 x 16 x 512 0.537 BF
115 conv 1024 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x1024 2.416 BF
116 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
117 conv 256 1 x 1/ 1 16 x 16 x 512 -> 16 x 16 x 256 0.067 BF
118 upsample 2x 16 x 16 x 256 -> 32 x 32 x 256
119 route 85 -> 32 x 32 x 512
120 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
121 route 120 118 -> 32 x 32 x 512
122 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
123 conv 512 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 512 2.416 BF
124 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
125 conv 512 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 512 2.416 BF
126 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
127 conv 128 1 x 1/ 1 32 x 32 x 256 -> 32 x 32 x 128 0.067 BF
128 upsample 2x 32 x 32 x 128 -> 64 x 64 x 128
129 route 54 -> 64 x 64 x 256
130 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
131 route 130 128 -> 64 x 64 x 256
132 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
133 conv 256 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 256 2.416 BF
134 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
135 conv 256 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 256 2.416 BF
136 conv 128 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 128 0.268 BF
137 conv 256 3 x 3/ 1 64 x 64 x 128 -> 64 x 64 x 256 2.416 BF
138 conv 255 1 x 1/ 1 64 x 64 x 256 -> 64 x 64 x 255 0.535 BF
139 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.20
nms_kind: greedynms (1), beta = 0.600000
140 route 136 -> 64 x 64 x 128
141 conv 256 3 x 3/ 2 64 x 64 x 128 -> 32 x 32 x 256 0.604 BF
142 route 141 126 -> 32 x 32 x 512
143 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
144 conv 512 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 512 2.416 BF
145 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
146 conv 512 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 512 2.416 BF
147 conv 256 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 256 0.268 BF
148 conv 512 3 x 3/ 1 32 x 32 x 256 -> 32 x 32 x 512 2.416 BF
149 conv 255 1 x 1/ 1 32 x 32 x 512 -> 32 x 32 x 255 0.267 BF
150 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.10
nms_kind: greedynms (1), beta = 0.600000
151 route 147 -> 32 x 32 x 256
152 conv 512 3 x 3/ 2 32 x 32 x 256 -> 16 x 16 x 512 0.604 BF
153 route 152 116 -> 16 x 16 x1024
154 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
155 conv 1024 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x1024 2.416 BF
156 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
157 conv 1024 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x1024 2.416 BF
158 conv 512 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 512 0.268 BF
159 conv 1024 3 x 3/ 1 16 x 16 x 512 -> 16 x 16 x1024 2.416 BF
160 conv 255 1 x 1/ 1 16 x 16 x1024 -> 16 x 16 x 255 0.134 BF
161 yolo
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 91.095
avg_outputs = 757643
Allocate additional workspace_size = 52.43 MB
Loading weights from csdarknet53-omega.conv.105...
seen 64, trained: 0 K-images (0 Kilo-batches_64)
Done! Loaded 105 layers from weights-file
Learning Rate: 0.00261, Momentum: 0.949, Decay: 0.0005
Resizing, random_coef = 1.40
736 x 736
Create 12 permanent cpu-threads
Try to set subdivisions=64 in your cfg-file.
CUDA status Error: file: ./src/dark_cuda.c : () : line: 373 : build time: Apr 24 2020 - 13:58:10
CUDA Error: out of memory
CUDA Error: out of memory: File exists
darknet: ./src/utils.c:325: error: Assertion `0' failed.
Aborted (core dumped)
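One plausible reading of this OOM (an inference from the log above, not a guarantee about the exact implementation): the log prints "Resizing, random_coef = 1.40" and then "736 x 736", i.e. with random resizing enabled darknet periodically rescales the network input up to roughly init_size * 1.4, rounded up to a multiple of 32. Activation memory grows with the squared side length, so a configuration that fits at 512x512 can still run out of memory at 736x736:

```python
# Estimate the largest resized input and its memory multiplier
# relative to the configured 512x512.
def max_random_size(init_size, coef=1.4, stride=32):
    raw = init_size * coef
    return -(-int(raw) // stride) * stride  # round up to a multiple of 32

size = max_random_size(512)
print(size, round((size / 512) ** 2, 2))  # 736, ~2.07x activation memory
```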
And my yolov4.cfg diff is (subdivisions=32):
$ git diff cfg/yolov4.cfg
diff --git a/cfg/yolov4.cfg b/cfg/yolov4.cfg
index 47b9db6..9e1609b 100644
--- a/cfg/yolov4.cfg
+++ b/cfg/yolov4.cfg
@@ -1,11 +1,11 @@
[net]
batch=64
-subdivisions=8
+subdivisions=32
# Training
-#width=512
-#height=512
-width=608
-height=608
+width=512
+height=512
+#width=608
+#height=608
channels=3
momentum=0.949
decay=0.0005
$
$ cat cfg/yolov4.cfg
[net]
batch=64
subdivisions=16
# Training
width=512
height=512
#width=608
#height=608
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.00261
burn_in=1000
max_batches = 500500
policy=steps
steps=400000,450000
scales=.1,.1
#cutmix=1
mosaic=1
#:104x104 54:52x52 85:26x26 104:13x13 for 416
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=mish
# Downsample
[convolutional]
batch_normalize=1
filters=64
size=3
stride=2
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[route]
layers = -2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[route]
layers = -1,-7
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
# Downsample
[convolutional]
batch_normalize=1
filters=128
size=3
stride=2
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[route]
layers = -2
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=mish
[route]
layers = -1,-10
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
# Downsample
[convolutional]
batch_normalize=1
filters=256
size=3
stride=2
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[route]
layers = -2
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=mish
[route]
layers = -1,-28
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
# Downsample
[convolutional]
batch_normalize=1
filters=512
size=3
stride=2
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[route]
layers = -2
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=mish
[route]
layers = -1,-28
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
# Downsample
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=2
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[route]
layers = -2
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=mish
[shortcut]
from=-3
activation=linear
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=mish
[route]
layers = -1,-16
[convolutional]
batch_normalize=1
filters=1024
size=1
stride=1
pad=1
activation=mish
##########################
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
### SPP ###
[maxpool]
stride=1
size=5
[route]
layers=-2
[maxpool]
stride=1
size=9
[route]
layers=-4
[maxpool]
stride=1
size=13
[route]
layers=-1,-3,-5,-6
### End SPP ###
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = 85
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[route]
layers = -1, -3
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = 54
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[route]
layers = -1, -3
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
##########################
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 0,1,2
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
scale_x_y = 1.2
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
[route]
layers = -4
[convolutional]
batch_normalize=1
size=3
stride=2
pad=1
filters=256
activation=leaky
[route]
layers = -1, -16
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 3,4,5
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
scale_x_y = 1.1
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
[route]
layers = -4
[convolutional]
batch_normalize=1
size=3
stride=2
pad=1
filters=512
activation=leaky
[route]
layers = -1, -37
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=1024
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=255
activation=linear
[yolo]
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
scale_x_y = 1.05
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07
iou_loss=ciou
nms_kind=greedynms
beta_nms=0.6
Hi @AlexeyAB ! Long time no see.
Now I am trying your YOLOv4, but my training logs are not good.
I show the beginning and the middle of the log.
I am training right now.
An avg loss of 22.057234 is not good, I think. I start with this command.
And I built darknet with this Makefile.
Please give me suggestions for my training!
Thank you!