david8862 / keras-YOLOv3-model-set

end-to-end YOLOv4/v3/v2 object detection pipeline, implemented on tf.keras with different technologies
MIT License

TypeError: buffer is too small for requested array #42

Closed farhodbekshamsiyev closed 4 years ago

farhodbekshamsiyev commented 4 years ago

Hi there, I need some help. Before explaining the problem, here is some information about my environment:

OS: Linux (Manjaro)
CUDA: 10.2
Python: 3.7
TensorFlow: 2.1.0
Keras: 2.3.1
RAM: 16 GB
GPU: GTX 1060 6 GB
CPU: i7-8750H

I am building a 4-class custom object detector with YOLOv4, but I am stuck at the stage of converting the weights to .h5. I am trying to convert the YOLOv4 weights but I get an error that I couldn't fix by myself, so I need some help. I typed this:

python tools/convert.py --yolo4_reorder cfg/yolov4.cfg weights/yolov4.conv.137.weights weights/yolov4.h5

and got this error:

2020-05-15 01:05:09.672004: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-05-15 01:05:09.672058: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-05-15 01:05:09.672066: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Loading weights.
Weights Header: 0 2 5 [0]
Parsing Darknet config.
Creating Keras model.
Parsing section net_0
Parsing section convolutional_0
conv2d bn mish (3, 3, 3, 32)
2020-05-15 01:05:10.169154: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
...
Concatenating route layers: [<tf.Tensor 'leaky_re_lu_14/Identity:0' shape=(None, None, None, 128) dtype=float32>, <tf.Tensor 'up_sampling2d_1/Identity:0' shape=(None, None, None, 128) dtype=float32>]
Parsing section convolutional_87
conv2d bn leaky (1, 1, 256, 128)
Parsing section convolutional_88
conv2d bn leaky (3, 3, 128, 256)
Parsing section convolutional_89
conv2d bn leaky (1, 1, 256, 128)
Parsing section convolutional_90
conv2d bn leaky (3, 3, 128, 256)
Parsing section convolutional_91
conv2d bn leaky (1, 1, 256, 128)
Parsing section convolutional_92
conv2d bn leaky (3, 3, 128, 256)
Traceback (most recent call last):
  File "tools/convert.py", line 305, in <module>
    _main(parser.parse_args())
  File "tools/convert.py", line 133, in _main
    buffer=weights_file.read(filters * 4))
TypeError: buffer is too small for requested array

I do not know what to do! If someone has a converted yolo4.h5 file, please share it.

david8862 commented 4 years ago

Hi @farhodbekshamsiyev, you should use "yolov4.weights" for model conversion. "yolov4.conv.137.weights" does not match "yolov4.cfg", which is the config of the whole YOLOv4 model.
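The error message itself comes from the converter asking for more bytes than the weights file still holds: the traceback shows `weights_file.read(filters * 4)` being passed as a `buffer`, and a short read leaves that buffer too small. The sketch below is not the code in tools/convert.py. It is a minimal, hypothetical helper (`read_float32` is an invented name) that assumes little-endian float32 values and shows how a length check can turn the cryptic TypeError into a message that points at a cfg/weights mismatch.

```python
import io
import struct


def read_float32(stream, count, what="weights"):
    """Read `count` float32 values, failing loudly on a short read.

    Hypothetical helper for illustration; assumes little-endian float32 data.
    """
    needed = count * 4  # 4 bytes per float32 value
    data = stream.read(needed)
    if len(data) < needed:
        raise ValueError(
            f"{what}: expected {needed} bytes but only {len(data)} remain; "
            "the weights file probably does not match the cfg"
        )
    return struct.unpack(f"<{count}f", data)


# Simulate a file that holds 2 values while the layer expects 4.
truncated = io.BytesIO(struct.pack("<2f", 1.0, 2.0))
try:
    read_float32(truncated, 4, what="conv bias")
except ValueError as err:
    message = str(err)
    print(message)
```

A check like this does not fix the mismatch; it only reports it earlier and more clearly than the buffer error does.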

farhodbekshamsiyev commented 4 years ago

Hi, the conversion works now and I got the .h5 file, but another error occurred. When I type this:

python train.py --model_type=yolo4_mobilenet --weights_path=weights/yolov4.h5 --annotation_file=voc_train.txt --classes_path=configs/voc_classes.txt --anchors_path=configs/yolo4_anchors.txt --save_eval_checkpoint --eval_online

I got this result:

ValueError: Layer #119 (named "batch_normalization_35"), weight <tf.Variable 'batch_normalization_35/gamma:0' shape=(512,) dtype=float32, numpy=array([1., 1., 1., ..., 1., 1., 1.], dtype=float32)> has shape (512,), but the saved weight has shape (128,).

Also, can you suggest the most suitable training command for 4-class object detection on the VOC dataset (car, bus, bicycle, person)? How can I use pre-trained weights to get the best result for 4-class object detection? Can I use this:

python train.py --model_type=yolo4_mobilenet --weights_path=weights/yolov4.h5 --annotation_file=voc_train.txt --classes_path=configs/voc_classes.txt --anchors_path=configs/yolo4_anchors.txt --save_eval_checkpoint --batch_size=16 --eval_online --eval_epoch_interval=3 --transfer_epoch=5 --freeze_level=1 --total_epoch=20

david8862 commented 4 years ago

For yolo4_mobilenet model type training, you don't need to load the official YOLOv4 model weights with "--weights_path=weights/yolov4.h5". The model construction code will automatically download MobileNet ImageNet-pretrained weights as the backbone for transfer training. You also need to create a new class definition file with the 4 classes (car, bus, bicycle, person) for dataset conversion and training. As for the training command, you could try the following one, which is what I always use:

# python train.py --model_type=yolo4_mobilenet --anchors_path=configs/yolo4_anchors.txt --annotation_file=voc_train.txt --classes_path=configs/my_classes.txt --model_image_size=416x416 --multiscale --rescale_interval=50 --learning_rate=0.001 --transfer_epoch=10 --init_epoch=0 --total_epoch=100 --eval_online --eval_epoch_interval=5 --save_eval_checkpoint
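
The class definition file mentioned above can be produced with a few lines of Python. This is a minimal sketch that assumes the file uses the same layout as configs/voc_classes.txt, taken here to be one class name per line; the helper names are hypothetical and the example writes to a temporary directory rather than to configs/my_classes.txt.

```python
import os
import tempfile


def write_classes_file(path, class_names):
    """Write one class name per line (assumed format of the classes file)."""
    with open(path, "w") as f:
        f.write("\n".join(class_names) + "\n")


def read_classes_file(path):
    """Read class names back, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


classes = ["car", "bus", "bicycle", "person"]
out_path = os.path.join(tempfile.mkdtemp(), "my_classes.txt")
write_classes_file(out_path, classes)
loaded = read_classes_file(out_path)
print(loaded)
```

Under that assumption the line order defines the class indices, so the same file should be used both when converting the dataset annotations and when training.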

farhodbekshamsiyev commented 4 years ago

Hi, everything works fine but I have one more question! I used the command you gave me and got this result: 1/10 worked fine, but at 11/100:

248/441 [===============>..............] - ETA: 18:02 - loss: 26.6065 - location
249/441 [===============>..............] - ETA: 17:56 - loss: 26.6288 - location_loss: 5.6755 - confidence_loss: 12.4886 - class_loss: 2.3435Killed

How can I get through all the epochs without the process being killed? To find the sweet spot, do I have to try different training arguments at random, or are there specific steps I should pay attention to? I also edited yolov4.cfg and changed some parts: filters=(classes+5)*3, max_batches=8000 (classes*2000), policy=steps, steps=6400,7200 (80% and 90% of max_batches), and classes=4 (changed from classes=80 to my number of objects in each of the 3 [yolo] layers). That's all.
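
The cfg arithmetic described above can be checked with a small script. This is only a sketch of the rules as stated in this comment ((classes+5)*3 filters, classes*2000 batches, steps at 80% and 90%); the function name and the `anchors_per_scale` parameter are hypothetical.

```python
def darknet_cfg_values(num_classes, anchors_per_scale=3):
    """Compute the Darknet cfg values quoted in the comment above."""
    # Each anchor predicts 4 box coordinates, 1 objectness score and the class scores.
    filters = (num_classes + 5) * anchors_per_scale
    max_batches = num_classes * 2000
    # Integer arithmetic keeps the 80% and 90% marks exact.
    steps = (max_batches * 8 // 10, max_batches * 9 // 10)
    return {
        "classes": num_classes,
        "filters": filters,
        "max_batches": max_batches,
        "steps": steps,
    }


values = darknet_cfg_values(4)
print(values)
```

For 4 classes this gives 27 filters in the convolutional layer before each [yolo] layer, alongside the max_batches=8000 and steps=6400,7200 values listed above.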

Before this I trained YOLOv3 and it was also killed at epoch 40/41, but the saved model works fine! Thanks for your reply!!!

david8862 commented 4 years ago

@farhodbekshamsiyev, I'm not quite sure about your training environment, but it seems to be caused by system memory or GPU memory being exhausted. Maybe you can remove the "--multiscale --rescale_interval=50" options to disable multiscale training, which should keep memory usage stable.
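
One way to check whether host memory is the limit is to log the peak resident memory of the training process from time to time. The sketch below is an assumption-laden illustration, not part of train.py: it uses Python's standard `resource` module (available on Unix-like systems, not on Windows), the helper name is hypothetical, and it measures system RAM only, not GPU memory.

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports ru_maxrss in kilobytes, macOS reports it in bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mb()
batch = [bytearray(1024 * 1024) for _ in range(64)]  # allocate roughly 64 MB
after = peak_rss_mb()
print(f"peak RSS before: {before:.1f} MB, after: {after:.1f} MB")
```

If the logged peak keeps climbing toward the 16 GB of installed RAM before the process is killed, that would be consistent with the memory exhaustion suggested above.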

farhodbekshamsiyev commented 4 years ago

Thank you very much. I am training on a Lenovo Legion Y7000P laptop with 16 GB RAM and 6 GB GPU memory.

gan3sh500 commented 4 years ago

@david8862 ImageNet weights are missing for cspdarknet53. I was able to create them with your convert.py, using the cfg and weights files mentioned here: https://github.com/AlexeyAB/darknet/wiki/Train-and-Evaluate-Detector-on-Pascal-VOC-(VOCtrainval-2007-2012)-dataset. Please update your link to cspdarknet53*.h5 in yolov4_darknet.py.

david8862 commented 4 years ago

@gan3sh500 many thanks for the info! I've updated the code and docs to support the CSPDarknet53 backbone weights.

gan3sh500 commented 4 years ago

Could you add a working config for DIoU or CIoU loss for YOLOv4? I tried adding warmup and switching to SGD, but I keep getting NaN.

david8862 commented 4 years ago

YOLOv4 just shares the loss function implementation of YOLOv3, so you can enable DIoU loss from here

gan3sh500 commented 4 years ago

I was trying out DIoU by changing that kwarg but kept getting NaN no matter what learning rate I used. For now I am just using the YOLOv3 loss, but CIoU is supposed to give a big jump in mAP according to the paper.
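
For reference, DIoU as defined in the paper is the IoU minus the squared distance between the box centers divided by the squared diagonal of the smallest enclosing box. The sketch below is a plain-Python illustration of that definition, not the loss code in this repository. The `eps` guard is an assumption added here, since an unguarded division by a zero union or zero diagonal is one possible (unconfirmed) way to end up with NaN.

```python
import math


def diou(box_a, box_b, eps=1e-7):
    """DIoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Intersection over union.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    iou = inter / (union + eps)

    # Squared distance between the two box centers.
    center_dist2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 \
        + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2

    # Squared diagonal of the smallest box enclosing both.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    diag2 = enclose_w ** 2 + enclose_h ** 2

    return iou - center_dist2 / (diag2 + eps)


same = diou((0, 0, 2, 2), (0, 0, 2, 2))
apart = diou((0, 0, 1, 1), (2, 2, 3, 3))
degenerate = diou((0, 0, 0, 0), (0, 0, 0, 0))
print(same, apart, degenerate)
```

The corresponding loss is 1 minus this value. Degenerate zero-size boxes stay finite here only because of the `eps` terms.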

david8862 commented 4 years ago

Have you tried a longer warmup stage to make convergence more stable?

gan3sh500 commented 4 years ago

It was going NaN for me within 10 iterations, even with a warmup of 5000 steps to 1e-4 lr. This was on VOC data with only the dog class. I will try again and report back later. For now I have to train some models soon, so I am using the YOLOv3 loss.
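
To put the numbers above in perspective, here is a minimal sketch of a warmup schedule. It assumes a linear ramp, which the thread does not specify, and the function name is hypothetical; only the 5000 steps and the 1e-4 target come from the comment.

```python
def warmup_lr(step, target_lr=1e-4, warmup_steps=5000):
    """Linearly ramp the learning rate up to target_lr over warmup_steps."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * (step + 1) / warmup_steps


first_ten = [warmup_lr(step) for step in range(10)]
print(first_ten[0], first_ten[-1], warmup_lr(4999), warmup_lr(10000))
```

Under this assumed schedule the learning rate stays at or below 2e-7 during the first 10 iterations, which suggests, without proving it, that NaN appearing that early comes from the loss computation rather than from the step size.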