Open toplinuxsir opened 4 years ago
csresnext50-panet-spp-optimal has higher AP.
@AlexeyAB Thanks! How do I train with csresnext50-panet-spp-optimal? Are the steps the same as for yolov3? Where can I download the conv file and cfg file?
How do I train with csresnext50-panet-spp-optimal? Are the steps the same as for yolov3?
Yes.
cfg file is in the cfg-directory: https://github.com/AlexeyAB/darknet/blob/master/cfg/csresnext50-panet-spp-original-optimal.cfg
Read: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
Download pre-trained weights for the convolutional layers and put them in the directory build\darknet\x64:
- for csresnext50-panet-spp.cfg (133 MB): csresnext50-panet-spp.conv.112
- for yolov3.cfg, yolov3-spp.cfg (154 MB): darknet53.conv.74
- for yolov3-tiny-prn.cfg, yolov3-tiny.cfg (6 MB): yolov3-tiny.conv.11
- for enet-coco.cfg (EfficientNetB0-Yolov3) (14 MB): enetb0-coco.conv.132
csresnext50-panet-spp-optimal.cfg just has better-tuned hyperparameters than csresnext50-panet-spp.
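For reference, the cfg-to-weights pairing listed above can be written down as a small lookup table. This is only an illustrative sketch: the file names come from the list above and from the answer later in this thread (csresnext50-panet-spp-original-optimal.cfg uses csresnext50-panet-spp.conv.112), while the helper name `pretrained_weights_for` is hypothetical and not part of Darknet.

```python
import re

# cfg file name -> pre-trained convolutional weights, as listed in this thread.
PRETRAINED_WEIGHTS = {
    "csresnext50-panet-spp.cfg": "csresnext50-panet-spp.conv.112",
    "csresnext50-panet-spp-original-optimal.cfg": "csresnext50-panet-spp.conv.112",
    "yolov3.cfg": "darknet53.conv.74",
    "yolov3-spp.cfg": "darknet53.conv.74",
    "yolov3-tiny-prn.cfg": "yolov3-tiny.conv.11",
    "yolov3-tiny.cfg": "yolov3-tiny.conv.11",
    "enet-coco.cfg": "enetb0-coco.conv.132",
}


def pretrained_weights_for(cfg_path):
    """Return the weights file name for a cfg path, or None if unknown.

    Hypothetical helper: it only looks at the file name, so both
    forward-slash and backslash paths are accepted.
    """
    file_name = re.split(r"[\\/]", cfg_path)[-1]
    return PRETRAINED_WEIGHTS.get(file_name)


if __name__ == "__main__":
    print(pretrained_weights_for("cfg/csresnext50-panet-spp-original-optimal.cfg"))
```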
@AlexeyAB I got an error:
./darknet detector train file/object.data file/csresnext50-panet-spp-original-optimal.cfg
CUDA-version: 10010 (10010), cuDNN: 7.5.0, CUDNN_HALF=1, GPU count: 1
OpenCV isn't used
csresnext50-panet-spp-original-optimal
compute_capability = 700, cudnn_half = 1
net.optimized_memory = 0
batch = 4, time_steps = 1, train = 1
layer filters size/strd(dil) input output
0 conv 64 7 x 7/ 2 736 x 736 x 3 -> 368 x 368 x 64 2.548 BF
1 max 2x 2/ 2 368 x 368 x 64 -> 184 x 184 x 64 0.009 BF
2 conv 128 1 x 1/ 1 184 x 184 x 64 -> 184 x 184 x 128 0.555 BF
3 route 1 -> 184 x 184 x 64
4 conv 64 1 x 1/ 1 184 x 184 x 64 -> 184 x 184 x 64 0.277 BF
5 conv 128 1 x 1/ 1 184 x 184 x 64 -> 184 x 184 x 128 0.555 BF
6 conv 128/ 32 3 x 3/ 1 184 x 184 x 128 -> 184 x 184 x 128 0.312 BF
7 conv 128 1 x 1/ 1 184 x 184 x 128 -> 184 x 184 x 128 1.109 BF
8 Shortcut Layer: 4, wt = 0, outputs: 184 x 184 x 128 0.004 BF
( 184 x 184 x 128) + ( 184 x 184 x 64)
9 conv 128 1 x 1/ 1 184 x 184 x 128 -> 184 x 184 x 128 1.109 BF
10 conv 128/ 32 3 x 3/ 1 184 x 184 x 128 -> 184 x 184 x 128 0.312 BF
11 conv 128 1 x 1/ 1 184 x 184 x 128 -> 184 x 184 x 128 1.109 BF
12 Shortcut Layer: 8, wt = 0, outputs: 184 x 184 x 128 0.004 BF
13 conv 128 1 x 1/ 1 184 x 184 x 128 -> 184 x 184 x 128 1.109 BF
14 conv 128/ 32 3 x 3/ 1 184 x 184 x 128 -> 184 x 184 x 128 0.312 BF
15 conv 128 1 x 1/ 1 184 x 184 x 128 -> 184 x 184 x 128 1.109 BF
16 Shortcut Layer: 12, wt = 0, outputs: 184 x 184 x 128 0.004 BF
17 conv 128 1 x 1/ 1 184 x 184 x 128 -> 184 x 184 x 128 1.109 BF
18 route 17 2 -> 184 x 184 x 256
19 conv 256 1 x 1/ 1 184 x 184 x 256 -> 184 x 184 x 256 4.438 BF
20 conv 256/ 32 3 x 3/ 2 184 x 184 x 256 -> 92 x 92 x 256 0.312 BF
21 conv 256 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 256 1.109 BF
22 route 20 -> 92 x 92 x 256
23 conv 256 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 256 1.109 BF
24 conv 256 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 256 1.109 BF
25 conv 256/ 32 3 x 3/ 1 92 x 92 x 256 -> 92 x 92 x 256 0.312 BF
26 conv 256 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 256 1.109 BF
27 Shortcut Layer: 23, wt = 0, outputs: 92 x 92 x 256 0.002 BF
28 conv 256 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 256 1.109 BF
29 conv 256/ 32 3 x 3/ 1 92 x 92 x 256 -> 92 x 92 x 256 0.312 BF
30 conv 256 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 256 1.109 BF
31 Shortcut Layer: 27, wt = 0, outputs: 92 x 92 x 256 0.002 BF
32 conv 256 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 256 1.109 BF
33 conv 256/ 32 3 x 3/ 1 92 x 92 x 256 -> 92 x 92 x 256 0.312 BF
34 conv 256 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 256 1.109 BF
35 Shortcut Layer: 31, wt = 0, outputs: 92 x 92 x 256 0.002 BF
36 conv 256 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 256 1.109 BF
37 route 36 21 -> 92 x 92 x 512
38 conv 512 1 x 1/ 1 92 x 92 x 512 -> 92 x 92 x 512 4.438 BF
39 conv 512/ 32 3 x 3/ 2 92 x 92 x 512 -> 46 x 46 x 512 0.312 BF
40 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
41 route 39 -> 46 x 46 x 512
42 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
43 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
44 conv 512/ 32 3 x 3/ 1 46 x 46 x 512 -> 46 x 46 x 512 0.312 BF
45 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
46 Shortcut Layer: 42, wt = 0, outputs: 46 x 46 x 512 0.001 BF
47 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
48 conv 512/ 32 3 x 3/ 1 46 x 46 x 512 -> 46 x 46 x 512 0.312 BF
49 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
50 Shortcut Layer: 46, wt = 0, outputs: 46 x 46 x 512 0.001 BF
51 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
52 conv 512/ 32 3 x 3/ 1 46 x 46 x 512 -> 46 x 46 x 512 0.312 BF
53 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
54 Shortcut Layer: 50, wt = 0, outputs: 46 x 46 x 512 0.001 BF
55 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
56 conv 512/ 32 3 x 3/ 1 46 x 46 x 512 -> 46 x 46 x 512 0.312 BF
57 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
58 Shortcut Layer: 54, wt = 0, outputs: 46 x 46 x 512 0.001 BF
59 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
60 conv 512/ 32 3 x 3/ 1 46 x 46 x 512 -> 46 x 46 x 512 0.312 BF
61 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
62 Shortcut Layer: 58, wt = 0, outputs: 46 x 46 x 512 0.001 BF
63 conv 512 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 512 1.109 BF
64 route 63 40 -> 46 x 46 x1024
65 conv 1024 1 x 1/ 1 46 x 46 x1024 -> 46 x 46 x1024 4.438 BF
66 conv 1024/ 32 3 x 3/ 2 46 x 46 x1024 -> 23 x 23 x1024 0.312 BF
67 conv 1024 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x1024 1.109 BF
68 route 66 -> 23 x 23 x1024
69 conv 1024 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x1024 1.109 BF
70 conv 1024 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x1024 1.109 BF
71 conv 1024/ 32 3 x 3/ 1 23 x 23 x1024 -> 23 x 23 x1024 0.312 BF
72 conv 1024 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x1024 1.109 BF
73 Shortcut Layer: 69, wt = 0, outputs: 23 x 23 x1024 0.001 BF
74 conv 1024 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x1024 1.109 BF
75 conv 1024/ 32 3 x 3/ 1 23 x 23 x1024 -> 23 x 23 x1024 0.312 BF
76 conv 1024 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x1024 1.109 BF
77 Shortcut Layer: 73, wt = 0, outputs: 23 x 23 x1024 0.001 BF
78 conv 1024 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x1024 1.109 BF
79 route 78 67 -> 23 x 23 x2048
80 conv 2048 1 x 1/ 1 23 x 23 x2048 -> 23 x 23 x2048 4.438 BF
81 conv 512 1 x 1/ 1 23 x 23 x2048 -> 23 x 23 x 512 1.109 BF
82 conv 1024 3 x 3/ 1 23 x 23 x 512 -> 23 x 23 x1024 4.992 BF
83 conv 512 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x 512 0.555 BF
84 max 5x 5/ 1 23 x 23 x 512 -> 23 x 23 x 512 0.007 BF
85 route 83 -> 23 x 23 x 512
86 max 9x 9/ 1 23 x 23 x 512 -> 23 x 23 x 512 0.022 BF
87 route 83 -> 23 x 23 x 512
88 max 13x13/ 1 23 x 23 x 512 -> 23 x 23 x 512 0.046 BF
89 route 88 86 84 83 -> 23 x 23 x2048
90 conv 512 1 x 1/ 1 23 x 23 x2048 -> 23 x 23 x 512 1.109 BF
91 conv 1024 3 x 3/ 1 23 x 23 x 512 -> 23 x 23 x1024 4.992 BF
92 conv 512 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x 512 0.555 BF
93 conv 256 1 x 1/ 1 23 x 23 x 512 -> 23 x 23 x 256 0.139 BF
94 upsample 2x 23 x 23 x 256 -> 46 x 46 x 256
95 route 65 -> 46 x 46 x1024
96 conv 256 1 x 1/ 1 46 x 46 x1024 -> 46 x 46 x 256 1.109 BF
97 route 96 94 -> 46 x 46 x 512
98 conv 256 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 256 0.555 BF
99 conv 512 3 x 3/ 1 46 x 46 x 256 -> 46 x 46 x 512 4.992 BF
100 conv 256 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 256 0.555 BF
101 conv 512 3 x 3/ 1 46 x 46 x 256 -> 46 x 46 x 512 4.992 BF
102 conv 256 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 256 0.555 BF
103 conv 128 1 x 1/ 1 46 x 46 x 256 -> 46 x 46 x 128 0.139 BF
104 upsample 2x 46 x 46 x 128 -> 92 x 92 x 128
105 route 38 -> 92 x 92 x 512
106 conv 128 1 x 1/ 1 92 x 92 x 512 -> 92 x 92 x 128 1.109 BF
107 route 106 104 -> 92 x 92 x 256
108 conv 128 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 128 0.555 BF
109 conv 256 3 x 3/ 1 92 x 92 x 128 -> 92 x 92 x 256 4.992 BF
110 conv 128 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 128 0.555 BF
111 conv 256 3 x 3/ 1 92 x 92 x 128 -> 92 x 92 x 256 4.992 BF
112 conv 128 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 128 0.555 BF
113 conv 256 3 x 3/ 1 92 x 92 x 128 -> 92 x 92 x 256 4.992 BF
114 conv 36 1 x 1/ 1 92 x 92 x 256 -> 92 x 92 x 36 0.156 BF
115 nms_kind: greedynms (1), beta = 0.600000
[Gaussian_yolo] iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale: 1.20, point: 1
Unused field: 'beta1 = 0.6'
116 route 112 -> 92 x 92 x 128
117 conv 256 3 x 3/ 2 92 x 92 x 128 -> 46 x 46 x 256 1.248 BF
118 route 117 102 -> 46 x 46 x 512
119 conv 256 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 256 0.555 BF
120 conv 512 3 x 3/ 1 46 x 46 x 256 -> 46 x 46 x 512 4.992 BF
121 conv 256 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 256 0.555 BF
122 conv 512 3 x 3/ 1 46 x 46 x 256 -> 46 x 46 x 512 4.992 BF
123 conv 256 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 256 0.555 BF
124 conv 512 3 x 3/ 1 46 x 46 x 256 -> 46 x 46 x 512 4.992 BF
125 conv 36 1 x 1/ 1 46 x 46 x 512 -> 46 x 46 x 36 0.078 BF
126 nms_kind: greedynms (1), beta = 0.600000
[Gaussian_yolo] iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale: 1.10, point: 1
Unused field: 'beta1 = 0.6'
127 route 123 -> 46 x 46 x 256
128 conv 512 3 x 3/ 2 46 x 46 x 256 -> 23 x 23 x 512 1.248 BF
129 route 128 92 -> 23 x 23 x1024
130 conv 512 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x 512 0.555 BF
131 conv 1024 3 x 3/ 1 23 x 23 x 512 -> 23 x 23 x1024 4.992 BF
132 conv 512 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x 512 0.555 BF
133 conv 1024 3 x 3/ 1 23 x 23 x 512 -> 23 x 23 x1024 4.992 BF
134 conv 512 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x 512 0.555 BF
135 conv 1024 3 x 3/ 1 23 x 23 x 512 -> 23 x 23 x1024 4.992 BF
136 conv 36 1 x 1/ 1 23 x 23 x1024 -> 23 x 23 x 36 0.039 BF
137 nms_kind: greedynms (1), beta = 0.600000
[Gaussian_yolo] iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale: 1.05, point: 1
Unused field: 'beta1 = 0.6'
Total BFLOPS 145.765
Allocate additional workspace_size = 52.43 MB
Learning Rate: 0.00261, Momentum: 0.949, Decay: 0.0005
Resizing
1056 x 1056
darknet: ./src/data.c:1208: load_data_detection: Assertion `use_mixup < 2' failed.
darknet: ./src/data.c:1208: load_data_detection: Assertion `use_mixup < 2' failed.
Aborted (core dumped)
I use csresnext50-panet-spp-optimal.cfg + Gaussian.
Currently mosaic=1 is supported only when Darknet is compiled with OPENCV=1.
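A quick way to see whether a cfg is affected is to look for mosaic=1 in its [net] section before training. The sketch below is a minimal, assumed parser for the key=value cfg layout and is not Darknet's own; the function names are hypothetical. If it reports True and the build log says "OpenCV isn't used", the options are presumably to rebuild with OPENCV=1 or to set mosaic=0.

```python
def read_net_options(cfg_text):
    """Collect key/value pairs from the [net] section of a cfg text."""
    options = {}
    section = None
    for raw_line in cfg_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "net" and "=" in line:
            key, value = line.split("=", 1)
            options[key.strip()] = value.strip()
    return options


def needs_opencv(cfg_text):
    """True when the cfg enables mosaic augmentation (mosaic=1)."""
    return read_net_options(cfg_text).get("mosaic", "0") == "1"


if __name__ == "__main__":
    sample = "[net]\nbatch=64\nsubdivisions=16\nmosaic=1\n\n[convolutional]\nfilters=64\n"
    print(needs_opencv(sample))
```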
@AlexeyAB Hi,
I trained a model starting from the pre-trained weights "csresnext50-panet-spp.conv.112", and I want to start another training run from my own trained weights. I used this command to extract the pre-trained layers:
darknet.exe partial csresnext50-panet-spp-original-optimal.cfg my-csresnext50-panet-spp-original-optimal.weights my-csresnext50.112 112
When I start training from my own pre-trained weights, I get avg nan after 500 iterations. What is the problem?
Thanks
@zpmmehrdad Try to set max_delta=10 in the 3 [yolo] layers.
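Editing three sections by hand is easy to get wrong, so here is a rough sketch of applying the suggestion programmatically. It assumes the cfg is plain text made of "[section]" headers followed by key=value lines; `set_in_sections` is a hypothetical helper, not a Darknet tool. If the cfg uses [Gaussian_yolo] instead of [yolo], the section name would have to be passed explicitly, and whether max_delta is honored there is not confirmed in this thread.

```python
def set_in_sections(cfg_text, key, value, sections=("yolo",)):
    """Set key=value in every section whose name is listed in `sections`."""
    wanted = {name.lower() for name in sections}

    # Split the cfg into blocks, each starting at a "[section]" header.
    blocks, current = [], []
    for line in cfg_text.splitlines():
        if line.strip().startswith("[") and current:
            blocks.append(current)
            current = []
        current.append(line)
    if current:
        blocks.append(current)

    result = []
    for block in blocks:
        header = block[0].strip()
        is_header = header.startswith("[") and header.endswith("]")
        name = header[1:-1].strip().lower() if is_header else ""
        if name in wanted:
            # Remove any existing assignment of the key.
            body = [l for l in block[1:] if l.split("=")[0].strip() != key]
            # Keep trailing blank lines after the new entry.
            trailing = []
            while body and body[-1].strip() == "":
                trailing.append(body.pop())
            block = [block[0]] + body + [f"{key}={value}"] + trailing
        result.extend(block)
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    sample = "[net]\nbatch=64\n\n[yolo]\nclasses=3\n\n[yolo]\nclasses=3\nmax_delta=5\n"
    print(set_in_sections(sample, "max_delta", 10))
```

Writing the result to a new file rather than overwriting the original cfg keeps the change easy to revert.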
@AlexeyAB Hi,
I got this error after about 5k iterations:
CUDA status Error: file: ....\src\dark_cuda.c : cuda_push_array() : line: 457 : build time: Feb 17 2020 - 17:10:53
CUDA Error: unspecified launch failure
CUDA Error Prev: unspecified launch failure
I have updated to the latest version that you released yesterday.
What GPU, CUDA, cuDNN, OpenCV versions do you use? What command do you use? What params do you use in the Make/Cmake?
Hi @AlexeyAB ,
I didn't have any problems before. GPU: 2x RTX 2080 Ti, CUDA: 10.2, cuDNN: 7.6, OpenCV: 4.2
Command: darknet.exe detector train a.obj csresnext50-panet-spp-original-optimal.cfg my-csresnext50-panet-spp-original-optimal.weights -map -dont_show -gpus 0,1
@zpmmehrdad
I didn't have any problems before.
Which commit did you use previously?
@zpmmehrdad
- Or try to increase the TDR (Timeout Detection and Recovery) timeout
How can I do that?
Which commit did you use previously?
I don't remember, but I think it was 2 or 3 commits ago.
I stopped the training and restarted it with 1 GPU, and there is no problem now.
@AlexeyAB Hi,
When I use two GPUs, this error happens:
cuDNN status Error in: file: ./src/convolutional_kernels.cu : backward_convolutional_layer_gpu() : line: 829 : build time: Feb 17 2020 - 17:10:25
cuDNN Error: CUDNN_STATUS_INTERNAL_ERROR
CUDA status Error: file: ....\src\dark_cuda.c : cuda_push_array() : line: 457 : build time: Feb 17 2020 - 17:10:53
CUDA Error: unspecified launch failure
What subdivisions do you use in the cfg file?
Do you get this error if you set subdivisions=64 instead of 32 in the cfg file?
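Some context on why subdivisions is worth trying: as far as Darknet's cfg handling is understood here, each batch is split into batch / subdivisions images per forward/backward pass, so a larger subdivisions value lowers the GPU memory needed per pass. The arithmetic is sketched below; batch=64 is an assumed value for illustration (the poster's actual batch setting is not shown in this thread), and the integer division is an assumption about how the value is computed.

```python
def mini_batch(batch: int, subdivisions: int) -> int:
    """Images processed per forward/backward pass (assumed integer division)."""
    return batch // subdivisions


BATCH = 64  # assumed value; check batch= in your own cfg

if __name__ == "__main__":
    for subdivisions in (32, 64):
        images = mini_batch(BATCH, subdivisions)
        print(f"subdivisions={subdivisions}: {images} image(s) per pass")
```

Under these assumptions, going from 32 to 64 halves the number of images held on the GPU per pass, which is why it is a cheap way to rule out memory pressure.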
@AlexeyAB Not yet, let me check it, but I have enough memory.
@AlexeyAB Hi,
I got the error again with one GPU and I'm training with subdivisions=64 and two GPUs
@AlexeyAB
the error appears with subdivisions=64
CUDA Error Prev: unspecified launch failure
CUDA Error Prev: unspecified launch failure
CUDA Error Prev: unspecified launch failure: No error
Assertion failed: 0, file ....\src\utils.c, line 325
@zpmmehrdad
I got the error again with one GPU and I'm training with subdivisions=64 and two GPUs
Do you get this error even if you use 1 GPU and subdivisions=64?
Can you show full screenshot of the error?
Do you get this error if you use
darknet.exe detector train a.obj csresnext50-panet-spp-original-optimal.cfg my-csresnext50-panet-spp-original-optimal.weights -dont_show
instead of
darknet.exe detector train a.obj csresnext50-panet-spp-original-optimal.cfg my-csresnext50-panet-spp-original-optimal.weights -map -dont_show -gpus 0,1
Can you show content of bad.list and bad_label.list files?
After how many iterations do you get this error?
Do you use the latest version of Darknet?
Did you compile Darknet by using Legacy way, Cmake or vcpkg? https://github.com/AlexeyAB/darknet#how-to-compile-on-windows-using-cmake-gui
Can you show output of this cmd file? nvidia-smi.zip
@AlexeyAB Hi
- Do you get this error even if you use 1 GPU and subdivisions=64?
Yes
- Can you show full screenshot of the error?
I have started the training and when I see the error I will share it with you
- Do you get this error if you use
darknet.exe detector train a.obj csresnext50-panet-spp-original-optimal.cfg my-csresnext50-panet-spp-original-optimal.weights -dont_show
instead of
darknet.exe detector train a.obj csresnext50-panet-spp-original-optimal.cfg my-csresnext50-panet-spp-original-optimal.weights -map -dont_show -gpus 0,1
Yes
- Can you show content of bad.list and bad_label.list files?
I don't have any bad.list or bad_label.list
- After how many iterations do you get this error?
After almost 500 iterations, or about 200 after resuming the training.
- Do you use the latest version of Darknet?
Yes I do.
- Did you compile Darknet by using Legacy way, Cmake or vcpkg? https://github.com/AlexeyAB/darknet#how-to-compile-on-windows-using-cmake-gui
I used Cmake.
- Can you show output of this cmd file? nvidia-smi.zip
Yes I can, I'll show that
@zpmmehrdad
Also show screenshots:
- screenshots of Darknet starting
- screenshots from Cmake
@AlexeyAB Hi,
The training with 1 GPU reached 21k iterations without any problems. I stopped it, took a screenshot, and continued the training with 2 GPUs.
Do you get this error even if you use 1 GPU and subdivisions=64?
Yes
The training with 1 GPU reached 21k iter without any problems and I stopped the training and took a screenshot and continued the training.
@zpmmehrdad So you don't get this error with 1 GPU?
@AlexeyAB
@zpmmehrdad So you don't get this error with 1 GPU?
Of course, I got the error when the iteration count was below 2k, but I continued the training. I restarted from 21k with 2 GPUs, and the error appeared again.
@AlexeyAB Hi,
I got the error again.
What about screenshots from Cmake-GUI? https://github.com/AlexeyAB/darknet/issues/4652#issuecomment-589141724
What CUDA compute capability do you use in the Darknet?
@AlexeyAB Hi
What about screenshots from Cmake-GUI? #4652 (comment) (https://github.com/AlexeyAB/darknet/issues/4652#issuecomment-589141724)
I will share it with you.
What CUDA compute capability do you use in the Darknet?
Nothing special, I just use csresnext50-panet-spp-original-optimal.cfg for training.
@AlexeyAB Which conv file should I use if I choose csresnext50-panet-spp-original-optimal.cfg for training? Thanks!
csresnext50-panet-spp.conv.112
@AlexeyAB Thanks very much !
@AlexeyAB
I got the error again.
CUDA status Error: file: ....\src\dark_cuda.c : cuda_push_array() : line: 457 : build time: Feb 17 2020 - 17:10:53
CUDA Error: unspecified launch failure
CUDA Error: unspecified launch failure: No error
Assertion failed: 0, file ....\src\utils.c, line 325
@AlexeyAB Hi, I have recently been working with csresnext-panet-spp, but I can't figure out from the cfg file which part of PANet you used in it. Could you help me with this?
@zpmmehrdad Try to set max_delta=10 in the 3 [yolo] layers.
What role does this parameter play? @AlexeyAB
What's the difference between csresnext50-panet-spp and csresnext50-panet-spp-optimal? Which one would have higher AP?