ceccocats / tkDNN

Deep neural network library and toolkit to do high-performance inference on NVIDIA Jetson platforms
GNU General Public License v2.0

Error reading file yolo4/layers/c138.bin with n of float: 65280 seek: 0 size: 261120 #99

Closed: zjZSTU closed this issue 4 years ago

zjZSTU commented 4 years ago

Hi tkDNN, I ran into a problem when exporting a Darknet model.

Steps to reproduce:

  1. Following the tutorial, download Darknet and build it:
git clone https://git.hipert.unimore.it/fgatti/darknet.git
cd darknet
make
mkdir layers debug
./darknet export <path-to-cfg-file> <path-to-weights> layers
  2. Then export the weights:
$ ./darknet export ~/wk/14_drone/pytorch-YOLOv4/yolov4-obj.cfg ~/wk/14_drone/pytorch-YOLOv4/yolov4-obj_last.weights layers
 GPU isn't used 
 OpenCV isn't used - data augmentation will be slow 
mini_batch = 1, batch = 64, time_steps = 1, train = 1 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
...
...
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
Total BFLOPS 127.310 
avg_outputs = 1047617 
Loading weights from /home/user/wk/14_drone/pytorch-YOLOv4/yolov4-obj_last.weights...
 seen 64, trained: 2035 K-images (31 Kilo-batches_64) 
Done! Loaded 162 layers from weights-file 
n: 0, type 0
Convolutional
weights: 864, biases: 32, batch_normalize: 1, groups: 1
write binary layers/c0.bin

n: 1, type 0
Convolutional
weights: 18432, biases: 64, batch_normalize: 1, groups: 1
write binary layers/c1.bin
...
...
anchor 243.000000
anchor 459.000000
anchor 401.000000
write binary layers/g161.bin

network input size: 1108992
Predicted in 26.121703 seconds.

networks output size: 11913
  3. Move debug/ and layers/ into tkDNN/build/yolo4.

  4. Finally, run the test_yolo4 command:

$ ./test_yolo4
Not supported field: batch=1
Not supported field: subdivisions=1
Not supported field: momentum=0.949
Not supported field: decay=0.0005
Not supported field: angle=0
Not supported field: saturation = 1.5
Not supported field: exposure = 1.5
Not supported field: hue=.1
Not supported field: learning_rate=0.00261
Not supported field: burn_in=1000
Not supported field: max_batches = 500500
Not supported field: policy=steps
Not supported field: steps=400000,450000
Not supported field: scales=.1,.1
Not supported field: mosaic=1
New NETWORK (tkDNN v0.5, CUDNN v8)
Reading weights: I=3 O=32 KERNEL=3x3x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=64 O=128 KERNEL=3x3x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=1024 O=1024 KERNEL=1x1x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=2048 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=255 KERNEL=1x1x1
Error reading file yolo4/layers/c138.bin with n of float: 65280 seek: 0 size: 261120

/home/user/software/tkDNN/src/utils.cpp:58
Aborting...

What's wrong here? Please help.

piepieninja commented 4 years ago

I'm getting the same issue with yolov4-tiny. Same steps, except that I get: Error reading file yolo4/layers/g30.bin with n of float 570 seek: 0 size:20280

ceccocats commented 4 years ago

Are you using a cfg different from yolo4's? test_yolo4 must load your yolov4-obj.cfg.

piepieninja commented 4 years ago

In terms of training my own thing, I actually just read some more issues and this worked: https://github.com/ceccocats/tkDNN/issues/52#issuecomment-662473806 (not sure if that's what OP was asking about).

zjZSTU commented 4 years ago


Hi @ceccocats @piepieninja, I solved my problem. Thanks for your replies.

I trained on my own dataset with six classes, so ./test_yolo4, which loads the stock yolo4.cfg, failed when creating the .rt file. Here is what to do instead:

  1. cd into tests/darknet (inside the tkDNN repo) and copy yolo4.cpp to yolo4_custom.cpp.
  2. Open yolo4_custom.cpp and modify the following lines:
    std::string cfg_path  = std::string(TKDNN_PATH) + "/tests/darknet/cfg/yolo4.cfg";
    std::string name_path = std::string(TKDNN_PATH) + "/tests/darknet/names/coco.names";

so that they point to your own .cfg and .names files.

  3. Rebuild the project:
$ rm -rf build
$ mkdir build
$ cd build
$ cmake ..
$ make

The executable yolo4_custom is now in build/.

  4. Create ./build/yolo4/, move layers/ and debug/ into it, and run:
./yolo4_custom
...
...
268 Yolo              19 x   19,   33  ->   19 x   19,   33
===========================================================

GPU free memory: 2933.93 mb.
New NetworkRT (TensorRT v7.1)
Float16 support: 1
Int8 support: 1
DLAs: 2
Selected maxBatchSize: 4
GPU free memory: 2547.96 mb.
Building tensorRT cuda engine...
serialize net
create execution context
Input/outputs numbers: 4
input idex = 0 -> output index = 3
Data dim: 1 3 608 608 1
Data dim: 1 33 19 19 1
RtBuffer 0   dim: Data dim: 1 3 608 608 1
RtBuffer 1   dim: Data dim: 1 33 76 76 1
RtBuffer 2   dim: Data dim: 1 33 38 38 1
RtBuffer 3   dim: Data dim: 1 33 19 19 1

====== CUDNN inference ======
Data dim: 1 3 608 608 1
Data dim: 1 33 19 19 1

===== TENSORRT inference ====
Data dim: 1 3 608 608 1
Data dim: 1 33 19 19 1

=== OUTPUT 0 CHECK RESULTS ==
CUDNN vs correct | OK ~0.02
TRT   vs correct
 | [ 1396 ]: 0.458866 0.48307
 | [ 1472 ]: 0.57735 0.603257
 | [ 1620 ]: 0.509987 0.535873
 | [ 3125 ]: 0.527447 0.507159
 | [ 4148 ]: 0.541243 0.519599
 | [ 4305 ]: 0.728546 0.707675
 | [ 4314 ]: 0.406434 0.433637
 | [ 4381 ]: 0.560149 0.534244
 | [ 4547 ]: 0.400655 0.421259
 | Wrongs: 1376 ~0.02
CUDNN vs TRT    
 | [ 1396 ]: 0.483033 0.458866
 | [ 1472 ]: 0.603223 0.57735
 | [ 1620 ]: 0.535915 0.509987
 | [ 3125 ]: 0.507174 0.527447
 | [ 4148 ]: 0.519642 0.541243
 | [ 4305 ]: 0.70762 0.728546
 | [ 4314 ]: 0.433672 0.406434
 | [ 4381 ]: 0.534179 0.560149
 | [ 4547 ]: 0.42129 0.400655
 | Wrongs: 1372 ~0.02

=== OUTPUT 1 CHECK RESULTS ==
CUDNN vs correct | OK ~0.02
TRT   vs correct
 | [ 54 ]: 0.565153 0.537672
 | [ 55 ]: 0.456518 0.431752
 | [ 357 ]: 0.294531 0.320589
 | [ 394 ]: 0.57262 0.595539
 | [ 1537 ]: 0.460857 0.483783
 | [ 1538 ]: 0.53794 0.561384
 | [ 1798 ]: 0.626915 0.647547
 | [ 2576 ]: 0.513811 0.53931
 | [ 2894 ]: 0.501953 0.522789
 | Wrongs: 397 ~0.02
CUDNN vs TRT    
 | [ 54 ]: 0.537726 0.565153
 | [ 55 ]: 0.431785 0.456518
 | [ 357 ]: 0.320566 0.294531
 | [ 394 ]: 0.595514 0.57262
 | [ 1537 ]: 0.483744 0.460857
 | [ 1538 ]: 0.561364 0.53794
 | [ 1798 ]: 0.64758 0.626915
 | [ 2576 ]: 0.539305 0.513811
 | [ 2894 ]: 0.52281 0.501953
 | Wrongs: 397 ~0.02

=== OUTPUT 2 CHECK RESULTS ==
CUDNN vs correct | OK ~0.02
TRT   vs correct
 | [ 744 ]: -0.882812 -0.860113
 | [ 1845 ]: 0.488558 0.468554
 | [ 1888 ]: 0.574395 0.544778
 | [ 2018 ]: 0.621771 0.642833
 | [ 2120 ]: 0.51687 0.537387
 | [ 2121 ]: 0.39946 0.423939
 | [ 2122 ]: 0.353875 0.376532
 | [ 2286 ]: 0.576125 0.602082
 | [ 2917 ]: 0.356783 0.335187
 | Wrongs: 60 ~0.02
CUDNN vs TRT    
 | [ 744 ]: -0.860134 -0.882812
 | [ 1888 ]: 0.544778 0.574395
 | [ 2018 ]: 0.642876 0.621771
 | [ 2120 ]: 0.537414 0.51687
 | [ 2121 ]: 0.423974 0.39946
 | [ 2122 ]: 0.376583 0.353875
 | [ 2286 ]: 0.602043 0.576125
 | [ 2917 ]: 0.335207 0.356783
 | [ 3332 ]: 0.389784 0.366363
 | Wrongs: 59 ~0.02

done

Sudhakar17 commented 3 years ago

@zjZSTU @mive93 I followed the same steps as in your comment, but I get the following error:

=== OUTPUT 0 CHECK RESULTS ==
Error opening file yolo3tiny_custom/debug/layer16_out.bin
/home/nvidia/Development/tkDNN/src/utils.cpp:45
Aborting...

There is no layer16_out.bin inside the debug folder.

ChanJoon commented 10 months ago

I'm getting this same issue with yolov4-tiny, same steps only I get Error reading file yolo4/layers/g30.bin with n of float 570 seek: 0 size:20280

Hi. I ran into the same error: "Error reading file layers/g30.bin with n of float: 6591 seek: 0 size: 26364". I know a lot of time has passed, but I think this problem is different from the c*.bin or input.bin errors, so I need your help. How did you solve it?