ceccocats / tkDNN

Deep neural network library and toolkit for high-performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0

How should I add a new layer? #19

Closed: beizhengren closed this issue 4 years ago

beizhengren commented 4 years ago

  1. How should I add a new layer with tkDNN?
  2. How can I manually generate the TensorRT (.rt) model? I use the commands

./darknet export <path-to-cfg-file> <path-to-weights> layers
./darknet export <path-to-cfg-file> <path-to-weights> debug

And

./test_yolo3            # run the yolo test (slow)

which automatically generates yolo3_fp32.rt from the exported .bin files downloaded from the cloud.

ceccocats commented 4 years ago

Hi @beizhengren,

  1. What do you mean? Adding a new kind of layer, or adding a new layer to a model?

  2. For example, if you run test_yolo3, the weights are downloaded from the cloud and placed in the yolo3 directory inside your build folder. If you want to use your own exported weights, overwrite the downloaded ones inside the yolo3 folder.

beizhengren commented 4 years ago

Hi @ceccocats, Thanks for your quick reply.

  1. A new kind of layer. Concretely, I have a project that loads a TensorRT model or darknet files (cfg, weights) for inference. There are some new layers that TensorRT cannot parse, so I want to use tkDNN to solve that, for example for YOLOv4. That is, I want to embed tkDNN in my own project (see the sketch at the end of this comment).

  2. Some "Wrongs" are reported. My workflow is as follows: I run the command

    ./darknet export cfg/yolov4.cfg yolov4.weights layers debug 

    and then check the files in the debug folder (screenshot of the debug folder omitted).

The layers folder is populated as well (screenshot omitted); then I run

./test_yolo4   
```
$ ./test_yolo4
New NETWORK (tkDNN v0.4, CUDNN v7.5)
Reading weights: I=3 O=32 KERNEL=3x3x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=64 O=128 KERNEL=3x3x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=1024 O=1024 KERNEL=1x1x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=2048 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=255 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=255 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=255 KERNEL=1x1x1

====================== NETWORK MODEL ======================
N.   Layer type        input (H*W,CH)      output (H*W,CH)
0    Conv2d            416 x 416, 3    ->  416 x 416, 32
1    ActivationMish    416 x 416, 32   ->  416 x 416, 32
2    Conv2d            416 x 416, 32   ->  208 x 208, 64
3    ActivationMish    208 x 208, 64   ->  208 x 208, 64
4    Conv2d            208 x 208, 64   ->  208 x 208, 64
5    ActivationMish    208 x 208, 64   ->  208 x 208, 64
6    Route             208 x 208, 64   ->  208 x 208, 64
7    Conv2d            208 x 208, 64   ->  208 x 208, 64
8    ActivationMish    208 x 208, 64   ->  208 x 208, 64
9    Conv2d            208 x 208, 64   ->  208 x 208, 32
10   ActivationMish    208 x 208, 32   ->  208 x 208, 32
11   Conv2d            208 x 208, 32   ->  208 x 208, 64
12   ActivationMish    208 x 208, 64   ->  208 x 208, 64
13   Shortcut          208 x 208, 64   ->  208 x 208, 64
14   Conv2d            208 x 208, 64   ->  208 x 208, 64
15   ActivationMish    208 x 208, 64   ->  208 x 208, 64
16   Route             208 x 208, 128  ->  208 x 208, 128
17   Conv2d            208 x 208, 128  ->  208 x 208, 64
18   ActivationMish    208 x 208, 64   ->  208 x 208, 64
19   Conv2d            208 x 208, 64   ->  104 x 104, 128
20   ActivationMish    104 x 104, 128  ->  104 x 104, 128
21   Conv2d            104 x 104, 128  ->  104 x 104, 64
22   ActivationMish    104 x 104, 64   ->  104 x 104, 64
23   Route             104 x 104, 128  ->  104 x 104, 128
24   Conv2d            104 x 104, 128  ->  104 x 104, 64
25   ActivationMish    104 x 104, 64   ->  104 x 104, 64
26   Conv2d            104 x 104, 64   ->  104 x 104, 64
27   ActivationMish    104 x 104, 64   ->  104 x 104, 64
28   Conv2d            104 x 104, 64   ->  104 x 104, 64
29   ActivationMish    104 x 104, 64   ->  104 x 104, 64
30   Shortcut          104 x 104, 64   ->  104 x 104, 64
31   Conv2d            104 x 104, 64   ->  104 x 104, 64
32   ActivationMish    104 x 104, 64   ->  104 x 104, 64
33   Conv2d            104 x 104, 64   ->  104 x 104, 64
34   ActivationMish    104 x 104, 64   ->  104 x 104, 64
35   Shortcut          104 x 104, 64   ->  104 x 104, 64
36   Conv2d            104 x 104, 64   ->  104 x 104, 64
37   ActivationMish    104 x 104, 64   ->  104 x 104, 64
38   Route             104 x 104, 128  ->  104 x 104, 128
39   Conv2d            104 x 104, 128  ->  104 x 104, 128
40   ActivationMish    104 x 104, 128  ->  104 x 104, 128
41   Conv2d            104 x 104, 128  ->  52 x 52, 256
42   ActivationMish    52 x 52, 256    ->  52 x 52, 256
43   Conv2d            52 x 52, 256    ->  52 x 52, 128
44   ActivationMish    52 x 52, 128    ->  52 x 52, 128
45   Route             52 x 52, 256    ->  52 x 52, 256
46   Conv2d            52 x 52, 256    ->  52 x 52, 128
47   ActivationMish    52 x 52, 128    ->  52 x 52, 128
48   Conv2d            52 x 52, 128    ->  52 x 52, 128
49   ActivationMish    52 x 52, 128    ->  52 x 52, 128
50   Conv2d            52 x 52, 128    ->  52 x 52, 128
51   ActivationMish    52 x 52, 128    ->  52 x 52, 128
52   Shortcut          52 x 52, 128    ->  52 x 52, 128
53   Conv2d            52 x 52, 128    ->  52 x 52, 128
54   ActivationMish    52 x 52, 128    ->  52 x 52, 128
55   Conv2d            52 x 52, 128    ->  52 x 52, 128
56   ActivationMish    52 x 52, 128    ->  52 x 52, 128
57   Shortcut          52 x 52, 128    ->  52 x 52, 128
58   Conv2d            52 x 52, 128    ->  52 x 52, 128
59   ActivationMish    52 x 52, 128    ->  52 x 52, 128
60   Conv2d            52 x 52, 128    ->  52 x 52, 128
61   ActivationMish    52 x 52, 128    ->  52 x 52, 128
62   Shortcut          52 x 52, 128    ->  52 x 52, 128
63   Conv2d            52 x 52, 128    ->  52 x 52, 128
64   ActivationMish    52 x 52, 128    ->  52 x 52, 128
65   Conv2d            52 x 52, 128    ->  52 x 52, 128
66   ActivationMish    52 x 52, 128    ->  52 x 52, 128
67   Shortcut          52 x 52, 128    ->  52 x 52, 128
68   Conv2d            52 x 52, 128    ->  52 x 52, 128
69   ActivationMish    52 x 52, 128    ->  52 x 52, 128
70   Conv2d            52 x 52, 128    ->  52 x 52, 128
71   ActivationMish    52 x 52, 128    ->  52 x 52, 128
72   Shortcut          52 x 52, 128    ->  52 x 52, 128
73   Conv2d            52 x 52, 128    ->  52 x 52, 128
74   ActivationMish    52 x 52, 128    ->  52 x 52, 128
75   Conv2d            52 x 52, 128    ->  52 x 52, 128
76   ActivationMish    52 x 52, 128    ->  52 x 52, 128
77   Shortcut          52 x 52, 128    ->  52 x 52, 128
78   Conv2d            52 x 52, 128    ->  52 x 52, 128
79   ActivationMish    52 x 52, 128    ->  52 x 52, 128
80   Conv2d            52 x 52, 128    ->  52 x 52, 128
81   ActivationMish    52 x 52, 128    ->  52 x 52, 128
82   Shortcut          52 x 52, 128    ->  52 x 52, 128
83   Conv2d            52 x 52, 128    ->  52 x 52, 128
84   ActivationMish    52 x 52, 128    ->  52 x 52, 128
85   Conv2d            52 x 52, 128    ->  52 x 52, 128
86   ActivationMish    52 x 52, 128    ->  52 x 52, 128
87   Shortcut          52 x 52, 128    ->  52 x 52, 128
88   Conv2d            52 x 52, 128    ->  52 x 52, 128
89   ActivationMish    52 x 52, 128    ->  52 x 52, 128
90   Route             52 x 52, 256    ->  52 x 52, 256
91   Conv2d            52 x 52, 256    ->  52 x 52, 256
92   ActivationMish    52 x 52, 256    ->  52 x 52, 256
93   Conv2d            52 x 52, 256    ->  26 x 26, 512
94   ActivationMish    26 x 26, 512    ->  26 x 26, 512
95   Conv2d            26 x 26, 512    ->  26 x 26, 256
96   ActivationMish    26 x 26, 256    ->  26 x 26, 256
97   Route             26 x 26, 512    ->  26 x 26, 512
98   Conv2d            26 x 26, 512    ->  26 x 26, 256
99   ActivationMish    26 x 26, 256    ->  26 x 26, 256
100  Conv2d            26 x 26, 256    ->  26 x 26, 256
101  ActivationMish    26 x 26, 256    ->  26 x 26, 256
102  Conv2d            26 x 26, 256    ->  26 x 26, 256
103  ActivationMish    26 x 26, 256    ->  26 x 26, 256
104  Shortcut          26 x 26, 256    ->  26 x 26, 256
105  Conv2d            26 x 26, 256    ->  26 x 26, 256
106  ActivationMish    26 x 26, 256    ->  26 x 26, 256
107  Conv2d            26 x 26, 256    ->  26 x 26, 256
108  ActivationMish    26 x 26, 256    ->  26 x 26, 256
109  Shortcut          26 x 26, 256    ->  26 x 26, 256
110  Conv2d            26 x 26, 256    ->  26 x 26, 256
111  ActivationMish    26 x 26, 256    ->  26 x 26, 256
112  Conv2d            26 x 26, 256    ->  26 x 26, 256
113  ActivationMish    26 x 26, 256    ->  26 x 26, 256
114  Shortcut          26 x 26, 256    ->  26 x 26, 256
115  Conv2d            26 x 26, 256    ->  26 x 26, 256
116  ActivationMish    26 x 26, 256    ->  26 x 26, 256
117  Conv2d            26 x 26, 256    ->  26 x 26, 256
118  ActivationMish    26 x 26, 256    ->  26 x 26, 256
119  Shortcut          26 x 26, 256    ->  26 x 26, 256
120  Conv2d            26 x 26, 256    ->  26 x 26, 256
121  ActivationMish    26 x 26, 256    ->  26 x 26, 256
122  Conv2d            26 x 26, 256    ->  26 x 26, 256
123  ActivationMish    26 x 26, 256    ->  26 x 26, 256
124  Shortcut          26 x 26, 256    ->  26 x 26, 256
125  Conv2d            26 x 26, 256    ->  26 x 26, 256
126  ActivationMish    26 x 26, 256    ->  26 x 26, 256
127  Conv2d            26 x 26, 256    ->  26 x 26, 256
128  ActivationMish    26 x 26, 256    ->  26 x 26, 256
129  Shortcut          26 x 26, 256    ->  26 x 26, 256
130  Conv2d            26 x 26, 256    ->  26 x 26, 256
131  ActivationMish    26 x 26, 256    ->  26 x 26, 256
132  Conv2d            26 x 26, 256    ->  26 x 26, 256
133  ActivationMish    26 x 26, 256    ->  26 x 26, 256
134  Shortcut          26 x 26, 256    ->  26 x 26, 256
135  Conv2d            26 x 26, 256    ->  26 x 26, 256
136  ActivationMish    26 x 26, 256    ->  26 x 26, 256
137  Conv2d            26 x 26, 256    ->  26 x 26, 256
138  ActivationMish    26 x 26, 256    ->  26 x 26, 256
139  Shortcut          26 x 26, 256    ->  26 x 26, 256
140  Conv2d            26 x 26, 256    ->  26 x 26, 256
141  ActivationMish    26 x 26, 256    ->  26 x 26, 256
142  Route             26 x 26, 512    ->  26 x 26, 512
143  Conv2d            26 x 26, 512    ->  26 x 26, 512
144  ActivationMish    26 x 26, 512    ->  26 x 26, 512
145  Conv2d            26 x 26, 512    ->  13 x 13, 1024
146  ActivationMish    13 x 13, 1024   ->  13 x 13, 1024
147  Conv2d            13 x 13, 1024   ->  13 x 13, 512
148  ActivationMish    13 x 13, 512    ->  13 x 13, 512
149  Route             13 x 13, 1024   ->  13 x 13, 1024
150  Conv2d            13 x 13, 1024   ->  13 x 13, 512
151  ActivationMish    13 x 13, 512    ->  13 x 13, 512
152  Conv2d            13 x 13, 512    ->  13 x 13, 512
153  ActivationMish    13 x 13, 512    ->  13 x 13, 512
154  Conv2d            13 x 13, 512    ->  13 x 13, 512
155  ActivationMish    13 x 13, 512    ->  13 x 13, 512
156  Shortcut          13 x 13, 512    ->  13 x 13, 512
157  Conv2d            13 x 13, 512    ->  13 x 13, 512
158  ActivationMish    13 x 13, 512    ->  13 x 13, 512
159  Conv2d            13 x 13, 512    ->  13 x 13, 512
160  ActivationMish    13 x 13, 512    ->  13 x 13, 512
161  Shortcut          13 x 13, 512    ->  13 x 13, 512
162  Conv2d            13 x 13, 512    ->  13 x 13, 512
163  ActivationMish    13 x 13, 512    ->  13 x 13, 512
164  Conv2d            13 x 13, 512    ->  13 x 13, 512
165  ActivationMish    13 x 13, 512    ->  13 x 13, 512
166  Shortcut          13 x 13, 512    ->  13 x 13, 512
167  Conv2d            13 x 13, 512    ->  13 x 13, 512
168  ActivationMish    13 x 13, 512    ->  13 x 13, 512
169  Conv2d            13 x 13, 512    ->  13 x 13, 512
170  ActivationMish    13 x 13, 512    ->  13 x 13, 512
171  Shortcut          13 x 13, 512    ->  13 x 13, 512
172  Conv2d            13 x 13, 512    ->  13 x 13, 512
173  ActivationMish    13 x 13, 512    ->  13 x 13, 512
174  Route             13 x 13, 1024   ->  13 x 13, 1024
175  Conv2d            13 x 13, 1024   ->  13 x 13, 1024
176  ActivationMish    13 x 13, 1024   ->  13 x 13, 1024
177  Conv2d            13 x 13, 1024   ->  13 x 13, 512
178  ActivationLeaky   13 x 13, 512    ->  13 x 13, 512
179  Conv2d            13 x 13, 512    ->  13 x 13, 1024
180  ActivationLeaky   13 x 13, 1024   ->  13 x 13, 1024
181  Conv2d            13 x 13, 1024   ->  13 x 13, 512
182  ActivationLeaky   13 x 13, 512    ->  13 x 13, 512
183  Pooling           13 x 13, 512    ->  13 x 13, 512
184  Route             13 x 13, 512    ->  13 x 13, 512
185  Pooling           13 x 13, 512    ->  13 x 13, 512
186  Route             13 x 13, 512    ->  13 x 13, 512
187  Pooling           13 x 13, 512    ->  13 x 13, 512
188  Route             13 x 13, 2048   ->  13 x 13, 2048
189  Conv2d            13 x 13, 2048   ->  13 x 13, 512
190  ActivationLeaky   13 x 13, 512    ->  13 x 13, 512
191  Conv2d            13 x 13, 512    ->  13 x 13, 1024
192  ActivationLeaky   13 x 13, 1024   ->  13 x 13, 1024
193  Conv2d            13 x 13, 1024   ->  13 x 13, 512
194  ActivationLeaky   13 x 13, 512    ->  13 x 13, 512
195  Conv2d            13 x 13, 512    ->  13 x 13, 256
196  ActivationLeaky   13 x 13, 256    ->  13 x 13, 256
197  Upsample          13 x 13, 256    ->  26 x 26, 256
198  Route             26 x 26, 512    ->  26 x 26, 512
199  Conv2d            26 x 26, 512    ->  26 x 26, 256
200  ActivationLeaky   26 x 26, 256    ->  26 x 26, 256
201  Route             26 x 26, 512    ->  26 x 26, 512
202  Conv2d            26 x 26, 512    ->  26 x 26, 256
203  ActivationLeaky   26 x 26, 256    ->  26 x 26, 256
204  Conv2d            26 x 26, 256    ->  26 x 26, 512
205  ActivationLeaky   26 x 26, 512    ->  26 x 26, 512
206  Conv2d            26 x 26, 512    ->  26 x 26, 256
207  ActivationLeaky   26 x 26, 256    ->  26 x 26, 256
208  Conv2d            26 x 26, 256    ->  26 x 26, 512
209  ActivationLeaky   26 x 26, 512    ->  26 x 26, 512
210  Conv2d            26 x 26, 512    ->  26 x 26, 256
211  ActivationLeaky   26 x 26, 256    ->  26 x 26, 256
212  Conv2d            26 x 26, 256    ->  26 x 26, 128
213  ActivationLeaky   26 x 26, 128    ->  26 x 26, 128
214  Upsample          26 x 26, 128    ->  52 x 52, 128
215  Route             52 x 52, 256    ->  52 x 52, 256
216  Conv2d            52 x 52, 256    ->  52 x 52, 128
217  ActivationLeaky   52 x 52, 128    ->  52 x 52, 128
218  Route             52 x 52, 256    ->  52 x 52, 256
219  Conv2d            52 x 52, 256    ->  52 x 52, 128
220  ActivationLeaky   52 x 52, 128    ->  52 x 52, 128
221  Conv2d            52 x 52, 128    ->  52 x 52, 256
222  ActivationLeaky   52 x 52, 256    ->  52 x 52, 256
223  Conv2d            52 x 52, 256    ->  52 x 52, 128
224  ActivationLeaky   52 x 52, 128    ->  52 x 52, 128
225  Conv2d            52 x 52, 128    ->  52 x 52, 256
226  ActivationLeaky   52 x 52, 256    ->  52 x 52, 256
227  Conv2d            52 x 52, 256    ->  52 x 52, 128
228  ActivationLeaky   52 x 52, 128    ->  52 x 52, 128
229  Conv2d            52 x 52, 128    ->  52 x 52, 256
230  ActivationLeaky   52 x 52, 256    ->  52 x 52, 256
231  Conv2d            52 x 52, 256    ->  52 x 52, 255
232  Yolo              52 x 52, 255    ->  52 x 52, 255
233  Route             52 x 52, 128    ->  52 x 52, 128
234  Conv2d            52 x 52, 128    ->  26 x 26, 256
235  ActivationLeaky   26 x 26, 256    ->  26 x 26, 256
236  Route             26 x 26, 512    ->  26 x 26, 512
237  Conv2d            26 x 26, 512    ->  26 x 26, 256
238  ActivationLeaky   26 x 26, 256    ->  26 x 26, 256
239  Conv2d            26 x 26, 256    ->  26 x 26, 512
240  ActivationLeaky   26 x 26, 512    ->  26 x 26, 512
241  Conv2d            26 x 26, 512    ->  26 x 26, 256
242  ActivationLeaky   26 x 26, 256    ->  26 x 26, 256
243  Conv2d            26 x 26, 256    ->  26 x 26, 512
244  ActivationLeaky   26 x 26, 512    ->  26 x 26, 512
245  Conv2d            26 x 26, 512    ->  26 x 26, 256
246  ActivationLeaky   26 x 26, 256    ->  26 x 26, 256
247  Conv2d            26 x 26, 256    ->  26 x 26, 512
248  ActivationLeaky   26 x 26, 512    ->  26 x 26, 512
249  Conv2d            26 x 26, 512    ->  26 x 26, 255
250  Yolo              26 x 26, 255    ->  26 x 26, 255
251  Route             26 x 26, 256    ->  26 x 26, 256
252  Conv2d            26 x 26, 256    ->  13 x 13, 512
253  ActivationLeaky   13 x 13, 512    ->  13 x 13, 512
254  Route             13 x 13, 1024   ->  13 x 13, 1024
255  Conv2d            13 x 13, 1024   ->  13 x 13, 512
256  ActivationLeaky   13 x 13, 512    ->  13 x 13, 512
257  Conv2d            13 x 13, 512    ->  13 x 13, 1024
258  ActivationLeaky   13 x 13, 1024   ->  13 x 13, 1024
259  Conv2d            13 x 13, 1024   ->  13 x 13, 512
260  ActivationLeaky   13 x 13, 512    ->  13 x 13, 512
261  Conv2d            13 x 13, 512    ->  13 x 13, 1024
262  ActivationLeaky   13 x 13, 1024   ->  13 x 13, 1024
263  Conv2d            13 x 13, 1024   ->  13 x 13, 512
264  ActivationLeaky   13 x 13, 512    ->  13 x 13, 512
265  Conv2d            13 x 13, 512    ->  13 x 13, 1024
266  ActivationLeaky   13 x 13, 1024   ->  13 x 13, 1024
267  Conv2d            13 x 13, 1024   ->  13 x 13, 255
268  Yolo              13 x 13, 255    ->  13 x 13, 255
===========================================================

New NetworkRT (TensorRT v5.02)
Float16 support: 0
Int8 support: 1
DLAs: 0
create execution context
Input/outputs numbers: 4
input idex = 0 -> output index = 3
Data dim: 1 3 416 416 1
Data dim: 1 255 13 13 1
RtBuffer 0 dim: Data dim: 1 3 416 416 1
RtBuffer 1 dim: Data dim: 1 255 52 52 1
RtBuffer 2 dim: Data dim: 1 255 26 26 1
RtBuffer 3 dim: Data dim: 1 255 13 13 1

====== CUDNN inference ======
Data dim: 1 3 416 416 1
Time: 39.4766 ms
Data dim: 1 255 13 13 1

===== compute detections ====
Time: 1.18542 ms

===== TENSORRT inference ====
Data dim: 1 3 416 416 1
Time: 25.2099 ms
Data dim: 1 255 13 13 1

==== YOLO 0 CHECK RESULTS ===
CUDNN vs correct
| [ 0 ]: 0.555169 0.852
| [ 1 ]: 0.451936 0.540374
| [ 2 ]: 0.416771 0.441668
| [ 3 ]: 0.360798 0.587053
| [ 4 ]: 0.496808 0.460219
| [ 6 ]: 0.437813 0.471066
| [ 8 ]: 0.433962 0.560362
| [ 9 ]: 0.46351 0.509252
| [ 10 ]: 0.452068 0.515173
| Wrongs: 166431 ~0.02
TRT vs correct
| [ 0 ]: 0.555169 0.852
| [ 1 ]: 0.451937 0.540374
| [ 2 ]: 0.416771 0.441668
| [ 3 ]: 0.360798 0.587053
| [ 4 ]: 0.496809 0.460219
| [ 6 ]: 0.437812 0.471066
| [ 8 ]: 0.433962 0.560362
| [ 9 ]: 0.46351 0.509252
| [ 10 ]: 0.452068 0.515173
| Wrongs: 166431 ~0.02
CUDNN vs TRT | OK ~0.02

==== YOLO 1 CHECK RESULTS ===
CUDNN vs correct
| [ 0 ]: 0.819532 0.913855
| [ 1 ]: 0.492696 0.17925
| [ 2 ]: 0.383019 0.208085
| [ 3 ]: 0.315248 0.490259
| [ 4 ]: 0.345272 0.554413
| [ 5 ]: 0.413513 0.164243
| [ 7 ]: 0.525169 0.726588
| [ 8 ]: 0.537227 0.485469
| [ 9 ]: 0.540595 0.696012
| Wrongs: 38345 ~0.02
TRT vs correct
| [ 0 ]: 0.819532 0.913855
| [ 1 ]: 0.492696 0.17925
| [ 2 ]: 0.383019 0.208085
| [ 3 ]: 0.315248 0.490259
| [ 4 ]: 0.345271 0.554413
| [ 5 ]: 0.413513 0.164243
| [ 7 ]: 0.525168 0.726588
| [ 8 ]: 0.537227 0.485469
| [ 9 ]: 0.540595 0.696012
| Wrongs: 38345 ~0.02
CUDNN vs TRT | OK ~0.02

==== YOLO 2 CHECK RESULTS ===
CUDNN vs correct
| [ 0 ]: 0.883185 0.662702
| [ 1 ]: 0.537623 0.44748
| [ 2 ]: 0.419132 0.250894
| [ 3 ]: 0.512994 0.843529
| [ 4 ]: 0.394617 0.248704
| [ 5 ]: 0.543686 0.20504
| [ 6 ]: 0.56079 0.85909
| [ 7 ]: 0.362244 0.754667
| [ 8 ]: 0.568176 0.775216
| Wrongs: 9748 ~0.02
TRT vs correct
| [ 0 ]: 0.883186 0.662702
| [ 1 ]: 0.537623 0.44748
| [ 2 ]: 0.419132 0.250894
| [ 3 ]: 0.512994 0.843529
| [ 4 ]: 0.394617 0.248704
| [ 5 ]: 0.543685 0.20504
| [ 6 ]: 0.560791 0.85909
| [ 7 ]: 0.362245 0.754667
| [ 8 ]: 0.568176 0.775216
| Wrongs: 9748 ~0.02
```
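For point 1, what I have in mind is roughly the following. This is a minimal sketch only: it assumes an engine file already serialized by tkDNN, and the NetworkRT constructor and infer() signatures are assumptions based on tkDNN v0.4 that should be verified against NetworkRT.h.

```cpp
// Sketch: using a serialized tkDNN/TensorRT engine from an external project.
// All names below are assumptions from tkDNN v0.4 -- verify against the headers.
#include "tkdnn.h"

int main() {
    // passing a null Network* makes NetworkRT load an existing engine file
    tk::dnn::NetworkRT netRT(NULL, "yolo4_fp32.rt");

    tk::dnn::dataDim_t dim(1, 3, 416, 416, 1);       // input shape: N, C, H, W, 1
    dnnType *input_d;
    checkCuda( cudaMalloc(&input_d, dim.tot() * sizeof(dnnType)) );
    // ... copy a preprocessed image into input_d here ...

    dnnType *output_d = netRT.infer(dim, input_d);   // dim is updated to the output shape
    // ... run YOLO post-processing on output_d ...
    return 0;
}
```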
ceccocats commented 4 years ago

1) To add a new layer you need to: (see the sketch below this list for the general shape)

2) Is it a regular yolo4 with different weights? Can you share the cfg? If the cfg differs from the one in the test folder, you need to change the code.
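To illustrate point 1: a rough sketch of the general shape of a custom tkDNN layer. The class and member names (Layer, Network, dataDim_t, dnnType, input_dim, output_dim, dstData, checkCuda) are assumptions based on tkDNN v0.4's Layer.h and may differ in your checkout; the new layer also needs a matching case in NetworkRT so it can be converted to TensorRT, typically through a plugin (IPlugin or IPluginV2 depending on the TensorRT version).

```cpp
// Hypothetical sketch of a new tkDNN layer -- verify every name against
// the real Layer.h before relying on it.
#include "tkdnn.h"

namespace tk { namespace dnn {

class MyNewLayer : public Layer {
public:
    MyNewLayer(Network *net) : Layer(net) {
        output_dim = input_dim;   // this example keeps the same shape in and out
        checkCuda( cudaMalloc(&dstData, output_dim.tot() * sizeof(dnnType)) );
    }
    virtual ~MyNewLayer() { checkCuda( cudaFree(dstData) ); }

    // forward pass used by the plain cuDNN/CUDA inference path
    virtual dnnType* infer(dataDim_t &dim, dnnType *srcData) {
        // launch a CUDA kernel here that reads srcData and writes dstData
        dim = output_dim;
        return dstData;
    }
};

}}
```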

beizhengren commented 4 years ago

@ceccocats

  1. Thanks for your detailed answer. I will try it.
  2. The cfg and weights are downloaded from https://github.com/AlexeyAB/darknet

    If the cfg differs from the one in the test folder, you need to change the code.

Does that mean I should put my cfg, which corresponds to my own weights, into the tkDNN-master/tests/yolo4 folder and replace the existing yolov4.cfg?

Here is my yolov4 cfg

```
[net]
batch=64
subdivisions=8
# Training
#width=512
#height=512
width=608
height=608
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.00261
burn_in=1000
max_batches = 500500
policy=steps
steps=400000,450000
scales=.1,.1
#cutmix=1
mosaic=1

#:104x104 54:52x52 85:26x26 104:13x13 for 416

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=mish

# Downsample
[convolutional] batch_normalize=1 filters=64 size=3 stride=2 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=32 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish
[route] layers = -1,-7
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish

# Downsample
[convolutional] batch_normalize=1 filters=128 size=3 stride=2 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=64 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=64 size=1 stride=1 pad=1 activation=mish
[route] layers = -1,-10
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish

# Downsample
[convolutional] batch_normalize=1 filters=256 size=3 stride=2 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=128 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=mish
[route] layers = -1,-28
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish

# Downsample
[convolutional] batch_normalize=1 filters=512 size=3 stride=2 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=256 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=mish
[route] layers = -1,-28
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish

# Downsample
[convolutional] batch_normalize=1 filters=1024 size=3 stride=2 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=512 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=mish
[route] layers = -1,-16
[convolutional] batch_normalize=1 filters=1024 size=1 stride=1 pad=1 activation=mish

##########################

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky

### SPP ###
[maxpool] stride=1 size=5
[route] layers=-2
[maxpool] stride=1 size=9
[route] layers=-4
[maxpool] stride=1 size=13
[route] layers=-1,-3,-5,-6
### End SPP ###

[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[upsample] stride=2
[route] layers = 85
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[route] layers = -1, -3
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[upsample] stride=2
[route] layers = 54
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[route] layers = -1, -3
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky
[convolutional] batch_normalize=1 filters=128 size=1 stride=1 pad=1 activation=leaky

##########################

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=256 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear
[yolo] mask = 0,1,2 anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 classes=80 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 scale_x_y = 1.2 iou_thresh=0.213 cls_normalizer=1.0 iou_normalizer=0.07 iou_loss=ciou nms_kind=greedynms beta_nms=0.6

[route] layers = -4
[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=256 activation=leaky
[route] layers = -1, -16
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] batch_normalize=1 filters=256 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=512 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear
[yolo] mask = 3,4,5 anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 classes=80 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 scale_x_y = 1.1 iou_thresh=0.213 cls_normalizer=1.0 iou_normalizer=0.07 iou_loss=ciou nms_kind=greedynms beta_nms=0.6

[route] layers = -4
[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=512 activation=leaky
[route] layers = -1, -37
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] batch_normalize=1 filters=512 size=1 stride=1 pad=1 activation=leaky
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1024 activation=leaky
[convolutional] size=1 stride=1 pad=1 filters=255 activation=linear
[yolo] mask = 6,7,8 anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 classes=80 num=9 jitter=.3 ignore_thresh = .7 truth_thresh = 1 random=1 scale_x_y = 1.05 iou_thresh=0.213 cls_normalizer=1.0 iou_normalizer=0.07 iou_loss=ciou nms_kind=greedynms beta_nms=0.6
```
ceccocats commented 4 years ago

Hi, our test input size is 416x416; yours appears to be 608x608. You need to change the input dimensions of the network in yolo4.cpp (line 9), as sketched below. Also check issue #10: you will probably get NaNs, but that is not a problem.
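A sketch of the change being suggested, assuming the test hard-codes the input as a tk::dnn::dataDim_t near the top of yolo4.cpp (check the actual file; the exact line may differ):

```cpp
// yolo4.cpp -- input dimensions of the network (batch, channels, H, W, 1).
// The test ships with 416x416; for a 608x608 cfg this becomes:
tk::dnn::dataDim_t dim(1, 3, 608, 608, 1);   // was (1, 3, 416, 416, 1)
```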

beizhengren commented 4 years ago

Hi @ceccocats, sorry I didn't reply until now; I have been very busy recently.

  1. When I use the command:

    ./darknet export cfg/yolov4.cfg yolov4.weights layers debug

    there are a lot of files in layers:

    c0.bin    c117.bin  c132.bin  c146.bin  c159.bin  c27.bin  c41.bin  c56.bin  c6.bin   c86.bin  g139.bin
    c100.bin  c11.bin   c133.bin  c147.bin  c15.bin   c28.bin  c43.bin  c58.bin  c71.bin  c87.bin  g150.bin
    c102.bin  c120.bin  c134.bin  c148.bin  c160.bin  c29.bin  c44.bin  c59.bin  c72.bin  c89.bin  g161.bin
    c104.bin  c122.bin  c135.bin  c149.bin  c16.bin   c2.bin   c46.bin  c5.bin   c74.bin  c8.bin   input.bin
    c105.bin  c123.bin  c136.bin  c14.bin   c18.bin   c31.bin  c47.bin  c60.bin  c75.bin  c90.bin  output.bin
    c106.bin  c124.bin  c137.bin  c152.bin  c19.bin   c32.bin  c49.bin  c62.bin  c77.bin  c91.bin
    c107.bin  c125.bin  c138.bin  c154.bin  c1.bin    c34.bin  c4.bin   c63.bin  c78.bin  c93.bin
    c10.bin   c126.bin  c141.bin  c155.bin  c21.bin   c35.bin  c50.bin  c65.bin  c80.bin  c94.bin
    c114.bin  c127.bin  c143.bin  c156.bin  c23.bin   c37.bin  c52.bin  c66.bin  c81.bin  c96.bin
    c115.bin  c12.bin   c144.bin  c157.bin  c24.bin   c38.bin  c54.bin  c68.bin  c83.bin  c97.bin
    c116.bin  c130.bin  c145.bin  c158.bin  c25.bin   c40.bin  c55.bin  c69.bin  c85.bin  c99.bin

    but there are none in debug.

  2. How can I turn these binary files into a .rt (TensorRT engine) file? My cfg and weights are from https://github.com/AlexeyAB/darknet

Thanks!

ceccocats commented 4 years ago

  1. To get the debug files you must compile the darknet fork without GPU support.
  2. Create a new test starting from our yolo4 test (see the sketch below).
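A rough skeleton of that suggestion, modeled on the v0.4 yolo4 test. The constructor signatures here are assumptions and should be checked against tests/yolo4/yolo4.cpp; copying the real test and editing it is the safer route. A new test builds the network layer by layer from the exported .bin files, and constructing a NetworkRT from it serializes the TensorRT engine:

```cpp
// Hypothetical skeleton of a custom tkDNN test -- all constructor
// signatures are assumptions; copy the real yolo4.cpp and edit it.
#include "tkdnn.h"

int main() {
    tk::dnn::dataDim_t dim(1, 3, 608, 608, 1);          // input: N, C, H, W, 1
    tk::dnn::Network net(dim);

    // each layer registers itself with `net` and loads its weights from a
    // .bin file produced by `./darknet export`
    tk::dnn::Conv2d     c0(&net, 32, 3, 3, 1, 1, 1, 1, "layers/c0.bin", true);
    tk::dnn::Activation a0(&net, tk::dnn::ACTIVATION_MISH);
    // ... mirror the rest of the cfg here ...

    // constructing NetworkRT builds the TensorRT engine from `net` and
    // serializes it to the given file if it does not already exist
    tk::dnn::NetworkRT netRT(&net, "yolo4_fp32.rt");
    return 0;
}
```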
beizhengren commented 4 years ago

@ceccocats I’ll try it. Thank you!

mive93 commented 4 years ago

Closing for now, feel free to reopen.