Error reading file yolo4/layers/c138.bin with n of float: 65280 seek: 0 size: 261120

zjZSTU commented 4 years ago

Hi tkDNN, I ran into a problem when exporting a darknet model.

Steps to reproduce

Following the tutorial, download darknet and build it:

git clone https://git.hipert.unimore.it/fgatti/darknet.git
cd darknet
make
mkdir layers debug
./darknet export <path-to-cfg-file> <path-to-weights> layers

Then export the weights:

$ ./darknet export ~/wk/14_drone/pytorch-YOLOv4/yolov4-obj.cfg ~/wk/14_drone/pytorch-YOLOv4/yolov4-obj_last.weights layers
 GPU isn't used 
 OpenCV isn't used - data augmentation will be slow 
mini_batch = 1, batch = 64, time_steps = 1, train = 1 
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
...
...
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
Total BFLOPS 127.310 
avg_outputs = 1047617 
Loading weights from /home/user/wk/14_drone/pytorch-YOLOv4/yolov4-obj_last.weights...
 seen 64, trained: 2035 K-images (31 Kilo-batches_64) 
Done! Loaded 162 layers from weights-file 
n: 0, type 0
Convolutional
weights: 864, biases: 32, batch_normalize: 1, groups: 1
write binary layers/c0.bin

n: 1, type 0
Convolutional
weights: 18432, biases: 64, batch_normalize: 1, groups: 1
write binary layers/c1.bin
...
...
anchor 243.000000
anchor 459.000000
anchor 401.000000
write binary layers/g161.bin

network input size: 1108992
Predicted in 26.121703 seconds.

networks output size: 11913

Move debug/ and layers/ to tkDNN/build/yolo4.
Finally, run the test_yolo4 command:

$ ./test_yolo4
Not supported field: batch=1
Not supported field: subdivisions=1
Not supported field: momentum=0.949
Not supported field: decay=0.0005
Not supported field: angle=0
Not supported field: saturation = 1.5
Not supported field: exposure = 1.5
Not supported field: hue=.1
Not supported field: learning_rate=0.00261
Not supported field: burn_in=1000
Not supported field: max_batches = 500500
Not supported field: policy=steps
Not supported field: steps=400000,450000
Not supported field: scales=.1,.1
Not supported field: mosaic=1
New NETWORK (tkDNN v0.5, CUDNN v8)
Reading weights: I=3 O=32 KERNEL=3x3x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=32 KERNEL=1x1x1
Reading weights: I=32 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=64 O=128 KERNEL=3x3x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=128 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=64 O=64 KERNEL=3x3x1
Reading weights: I=64 O=64 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=128 O=128 KERNEL=3x3x1
Reading weights: I=128 O=128 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=256 O=256 KERNEL=3x3x1
Reading weights: I=256 O=256 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=512 O=512 KERNEL=3x3x1
Reading weights: I=512 O=512 KERNEL=1x1x1
Reading weights: I=1024 O=1024 KERNEL=1x1x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=2048 O=512 KERNEL=1x1x1
Reading weights: I=512 O=1024 KERNEL=3x3x1
Reading weights: I=1024 O=512 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=512 KERNEL=3x3x1
Reading weights: I=512 O=256 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=128 KERNEL=1x1x1
Reading weights: I=128 O=256 KERNEL=3x3x1
Reading weights: I=256 O=255 KERNEL=1x1x1
Error reading file yolo4/layers/c138.bin with n of float: 65280 seek: 0 size: 261120

/home/user/software/tkDNN/src/utils.cpp:58
Aborting...

What is wrong here? Please help.

piepieninja commented 4 years ago

I'm getting this same issue with yolov4-tiny, same steps only I get Error reading file yolo4/layers/g30.bin with n of float 570 seek: 0 size:20280

ceccocats commented 4 years ago

Are you using a different cfg from yolo4? test_yolo4 loads the stock yolo4.cfg, so it must be changed to load your yolov4-obj.cfg.
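
A quick arithmetic check supports this. The last line of the log before the abort is I=256 O=255 KERNEL=1x1x1, and 256 x 255 = 65280 is exactly the float count in the error. 255 is what the usual darknet rule filters = (classes + 5) x 3 gives for the 80 COCO classes, so the loader appears to be following the stock cfg rather than the custom one. The sketch below assumes that rule and assumes c138.bin holds the weights of that 1x1 convolution; the function names are made up for illustration.

```shell
# Filters of the conv layer feeding a [yolo] head, assuming the usual darknet
# rule (classes + 5) * masks, with 3 masks per head unless told otherwise.
yolo_head_filters() {
    classes=$1
    masks=${2:-3}
    echo $(( (classes + 5) * masks ))
}

# Number of weights in a square-kernel conv layer:
# in_channels * out_channels * kernel * kernel
conv_weight_count() {
    in_ch=$1
    out_ch=$2
    kernel=$3
    echo $(( in_ch * out_ch * kernel * kernel ))
}
```

For example, `yolo_head_filters 80` gives 255 and `conv_weight_count 256 255 1` gives 65280, the number the loader asked for. For a six-class model, `yolo_head_filters 6` gives 33, so under the same assumptions the exported layer would hold only 256 x 33 = 8448 weights, fewer than the loader tries to read.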

piepieninja commented 4 years ago

In terms of training my own thing, I actually just read some more issues and this worked: https://github.com/ceccocats/tkDNN/issues/52#issuecomment-662473806 Not sure if that's what OP was asking about

zjZSTU commented 4 years ago

> In terms of training my own thing, I actually just read some more issues and this worked: #52 (comment) Not sure if that's what OP was asking about

Hi @ceccocats @piepieninja, I solved my problem. Thanks for your replies.

I trained on my own dataset with six classes, so running ./test_yolo4, which loads the stock yolo4.cfg, failed when creating the .rt file. This is what needs to be done:

cd tests/darknet (inside the tkDNN source tree) and copy yolo4.cpp to yolo4_custom.cpp.
Open yolo4_custom.cpp and modify the following code

    std::string cfg_path  = std::string(TKDNN_PATH) + "/tests/darknet/cfg/yolo4.cfg";
    std::string name_path = std::string(TKDNN_PATH) + "/tests/darknet/names/coco.names";

so that it points to your own .cfg and .names files.

Rebuild the project:
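
The copy-and-edit step can also be scripted. The following is only a sketch: it assumes the two path strings appear in yolo4.cpp exactly as quoted above, and that the custom files were placed in tests/darknet/cfg and tests/darknet/names. The function name and the example file names are hypothetical.

```shell
# Hypothetical helper: write a copy of yolo4.cpp whose cfg/names paths point
# at custom files. Arguments: source file, destination file, cfg file name,
# names file name (file names must not contain '|' or '&').
make_custom_test() {
    src=$1
    dst=$2
    cfg=$3
    names=$4
    sed -e "s|/tests/darknet/cfg/yolo4\.cfg|/tests/darknet/cfg/$cfg|" \
        -e "s|/tests/darknet/names/coco\.names|/tests/darknet/names/$names|" \
        "$src" > "$dst" || return 1
    # Succeed only if both new paths actually ended up in the copy.
    grep -qF "/tests/darknet/cfg/$cfg" "$dst" &&
        grep -qF "/tests/darknet/names/$names" "$dst"
}
```

Run from tests/darknet, a call such as `make_custom_test yolo4.cpp yolo4_custom.cpp yolov4-obj.cfg obj.names` would produce the modified copy; a non-zero exit status means at least one of the two paths was not found and the file should be edited by hand.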

$ rm -rf build
$ mkdir build
$ cd build
$ cmake ..
$ make

In build/ you now get the executable yolo4_custom.

mkdir ./build/yolo4/, mv layers/ and debug/ into it, and run:

./yolo4_custom
...
...
268 Yolo              19 x   19,   33  ->   19 x   19,   33
===========================================================

GPU free memory: 2933.93 mb.
New NetworkRT (TensorRT v7.1)
Float16 support: 1
Int8 support: 1
DLAs: 2
Selected maxBatchSize: 4
GPU free memory: 2547.96 mb.
Building tensorRT cuda engine...
serialize net
create execution context
Input/outputs numbers: 4
input idex = 0 -> output index = 3
Data dim: 1 3 608 608 1
Data dim: 1 33 19 19 1
RtBuffer 0   dim: Data dim: 1 3 608 608 1
RtBuffer 1   dim: Data dim: 1 33 76 76 1
RtBuffer 2   dim: Data dim: 1 33 38 38 1
RtBuffer 3   dim: Data dim: 1 33 19 19 1

====== CUDNN inference ======
Data dim: 1 3 608 608 1
Data dim: 1 33 19 19 1

===== TENSORRT inference ====
Data dim: 1 3 608 608 1
Data dim: 1 33 19 19 1

=== OUTPUT 0 CHECK RESULTS ==
CUDNN vs correct | OK ~0.02
TRT   vs correct
 | [ 1396 ]: 0.458866 0.48307
 | [ 1472 ]: 0.57735 0.603257
 | [ 1620 ]: 0.509987 0.535873
 | [ 3125 ]: 0.527447 0.507159
 | [ 4148 ]: 0.541243 0.519599
 | [ 4305 ]: 0.728546 0.707675
 | [ 4314 ]: 0.406434 0.433637
 | [ 4381 ]: 0.560149 0.534244
 | [ 4547 ]: 0.400655 0.421259
 | Wrongs: 1376 ~0.02
CUDNN vs TRT    
 | [ 1396 ]: 0.483033 0.458866
 | [ 1472 ]: 0.603223 0.57735
 | [ 1620 ]: 0.535915 0.509987
 | [ 3125 ]: 0.507174 0.527447
 | [ 4148 ]: 0.519642 0.541243
 | [ 4305 ]: 0.70762 0.728546
 | [ 4314 ]: 0.433672 0.406434
 | [ 4381 ]: 0.534179 0.560149
 | [ 4547 ]: 0.42129 0.400655
 | Wrongs: 1372 ~0.02

=== OUTPUT 1 CHECK RESULTS ==
CUDNN vs correct | OK ~0.02
TRT   vs correct
 | [ 54 ]: 0.565153 0.537672
 | [ 55 ]: 0.456518 0.431752
 | [ 357 ]: 0.294531 0.320589
 | [ 394 ]: 0.57262 0.595539
 | [ 1537 ]: 0.460857 0.483783
 | [ 1538 ]: 0.53794 0.561384
 | [ 1798 ]: 0.626915 0.647547
 | [ 2576 ]: 0.513811 0.53931
 | [ 2894 ]: 0.501953 0.522789
 | Wrongs: 397 ~0.02
CUDNN vs TRT    
 | [ 54 ]: 0.537726 0.565153
 | [ 55 ]: 0.431785 0.456518
 | [ 357 ]: 0.320566 0.294531
 | [ 394 ]: 0.595514 0.57262
 | [ 1537 ]: 0.483744 0.460857
 | [ 1538 ]: 0.561364 0.53794
 | [ 1798 ]: 0.64758 0.626915
 | [ 2576 ]: 0.539305 0.513811
 | [ 2894 ]: 0.52281 0.501953
 | Wrongs: 397 ~0.02

=== OUTPUT 2 CHECK RESULTS ==
CUDNN vs correct | OK ~0.02
TRT   vs correct
 | [ 744 ]: -0.882812 -0.860113
 | [ 1845 ]: 0.488558 0.468554
 | [ 1888 ]: 0.574395 0.544778
 | [ 2018 ]: 0.621771 0.642833
 | [ 2120 ]: 0.51687 0.537387
 | [ 2121 ]: 0.39946 0.423939
 | [ 2122 ]: 0.353875 0.376532
 | [ 2286 ]: 0.576125 0.602082
 | [ 2917 ]: 0.356783 0.335187
 | Wrongs: 60 ~0.02
CUDNN vs TRT    
 | [ 744 ]: -0.860134 -0.882812
 | [ 1888 ]: 0.544778 0.574395
 | [ 2018 ]: 0.642876 0.621771
 | [ 2120 ]: 0.537414 0.51687
 | [ 2121 ]: 0.423974 0.39946
 | [ 2122 ]: 0.376583 0.353875
 | [ 2286 ]: 0.602043 0.576125
 | [ 2917 ]: 0.335207 0.356783
 | [ 3332 ]: 0.389784 0.366363
 | Wrongs: 59 ~0.02

done

Sudhakar17 commented 3 years ago

@zjZSTU @mive93 I followed the same steps as in your comment, but I am getting the following error:

=== OUTPUT 0 CHECK RESULTS ==
Error opening file yolo3tiny_custom/debug/layer16_out.bin

/home/nvidia/Development/tkDNN/src/utils.cpp:45
Aborting...

There is no layer16_out.bin inside the debug folder.
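
Before running the test, it can help to confirm which of the expected layerN_out.bin files are actually present in the debug folder. The helper below is a hypothetical sketch, not part of tkDNN; the index 16 is taken from the error message above, and any other indices depend on where the output layers sit in the cfg.

```shell
# Hypothetical helper: report which layer<N>_out.bin files are missing from a
# debug directory. Arguments: debug directory, then one or more layer indices.
# Returns non-zero if at least one file is missing.
missing_debug_outputs() {
    dir=$1
    shift
    rc=0
    for n in "$@"; do
        f="$dir/layer${n}_out.bin"
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            rc=1
        fi
    done
    return $rc
}
```

For instance, `missing_debug_outputs yolo3tiny_custom/debug 16` prints nothing and succeeds when the file exists. If it is reported missing, one possible explanation (an assumption, not confirmed in this thread) is that the exported network has its output layers at different indices than the test expects.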

ChanJoon commented 10 months ago

> I'm getting this same issue with yolov4-tiny, same steps only I get Error reading file yolo4/layers/g30.bin with n of float 570 seek: 0 size:20280

Hi. I hit the same error: "Error reading file layers/g30.bin with n of float: 6591 seek: 0 size: 26364". I know a lot of time has passed, but I think this problem is different from the c*.bin or input.bin errors, so I need your help. How did you solve your problem?
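
In the two fully quoted error messages in this thread, "size" is four times "n of float" (65280 x 4 = 261120 and 6591 x 4 = 26364). That is consistent with "n of float" being the number of 4-byte floats the loader asked for and "size" being the same request in bytes, although this reading is an assumption. Under that assumption, a first diagnostic step is to compare the request with the real size of the file on disk. The helper name below is hypothetical.

```shell
# Hypothetical helper: compare the size of an exported layer file with the
# number of 4-byte floats reported in the "n of float" field of the error.
# Arguments: file path, float count. Assumes seek is 0, as in the errors here.
check_layer_size() {
    file=$1
    n_floats=$2
    expected=$(( n_floats * 4 ))
    actual=$(wc -c < "$file" | tr -d ' ')
    if [ "$actual" -lt "$expected" ]; then
        echo "too small: $file has $actual bytes, loader wants $expected"
        return 1
    fi
    echo "ok: $file has $actual bytes, loader wants $expected"
}
```

Calling `check_layer_size layers/g30.bin 6591` shows whether the file is shorter than the request. A file that is too small suggests the test program and the exported weights were built from different cfg files, which is the cause identified earlier in this thread for the c138.bin case.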

ceccocats / tkDNN

Error reading file yolo4/layers/c138.bin with n of float: 65280 seek: 0 size: 261120 #99
