PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

Export yolov3 model with shape of weights error #46

Closed lyw615 closed 4 years ago

lyw615 commented 4 years ago

command : python tools/export_model.py -c configs/yolov3_darknet.yml --output_dir=./inference_model -o weights=../model/car_p/vehicle_yolov3_darknet YoloTestFeed.image_shape=[3,608,608]

logs:

2019-11-27 15:09:45,628-INFO: Loading parameters from ../model/car_p/vehicle_yolov3_darknet...
Traceback (most recent call last):
  File "tools/export_model.py", line 120, in <module>
    main()
  File "tools/export_model.py", line 107, in main
    checkpoint.load_params(exe, infer_prog, cfg.weights)
  File "./ppdet/utils/checkpoint.py", line 118, in load_params
    fluid.io.load_vars(exe, path, prog, predicate=_if_exist)
  File "/software/conda/envs/super_mask/lib/python3.6/site-packages/paddle/fluid/io.py", line 682, in load_vars
    filename=filename)
  File "/software/conda/envs/super_mask/lib/python3.6/site-packages/paddle/fluid/io.py", line 741, in load_vars
    format(orig_shape, each_var.name, new_shape))

RuntimeError: Shape not matching: the Program requires a parameter with a shape of ((255, 1024, 1, 1)), while the loaded parameter (namely [ yolo_output.0.conv.weights ]) has a shape of ((33, 1024, 1, 1))

The weights can be used to run inference on images correctly, but model export fails with this error. Is the code or config file out of date?

Reference from: https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.1/docs/EXPORT_MODEL.md

qingqing01 commented 4 years ago

@lyw615

The shape error in yolo_output.0.conv.weights is caused by the class number: the config file doesn't match your model. The doc is only a reference example. To export a model for vehicle detection, you should use contrib/VehicleDetection/vehicle_yolov3_darknet.yml.
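The mismatch above can be reproduced arithmetically. Each YOLOv3 output layer predicts, per anchor, 4 box coordinates + 1 objectness score + one score per class, with 3 anchors per layer, so the output conv's channel count depends directly on the class number (a quick sketch, not PaddleDetection code):

```python
# Why the checkpoint and config disagree on yolo_output.0.conv.weights:
# output channels = anchors_per_layer * (4 box coords + 1 objectness + num_classes)
def yolo_output_channels(num_classes, anchors_per_layer=3):
    return anchors_per_layer * (5 + num_classes)

# The default COCO config (80 classes) expects 255 channels,
# i.e. the (255, 1024, 1, 1) shape in the error message.
print(yolo_output_channels(80))  # 255

# The vehicle checkpoint has (33, 1024, 1, 1), i.e. a 6-class model.
print(yolo_output_channels(6))   # 33
```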

qingqing01 commented 4 years ago

@lyw615 Is this issue solved?

lyw615 commented 4 years ago

Yes, it was as you said. After exporting the model, prediction works with the code you provided, and it satisfies my requirement of cropping a large array into multiple small arrays for inference. What confuses me is the accuracy loss: correct results can be inferred from the large-size input, but not from the small-size arrays for car plates.

lyw615 commented 4 years ago

The model was exported for image shape (3, 608, 608), and the input image shape is (2560, 1920, 3). I tried block sizes from (1024, 1024) down to (512, 512), all with half the block size as the offset, but the results show no difference. The missed bboxes get scores between 0 and 0.1. Do you know the reason for this?
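The tiling scheme described here (fixed-size blocks, stride of half the block size) can be sketched as follows; this is a hypothetical helper for illustration, not a PaddleDetection API:

```python
# Cover a (H, W) image with fixed-size blocks whose stride is half the
# block size, so neighbouring crops overlap by 50%.
def tile_starts(length, block, stride):
    starts = list(range(0, max(length - block, 0) + 1, stride))
    if starts[-1] + block < length:  # add a final tile flush with the edge
        starts.append(length - block)
    return starts

def crop_windows(height, width, block_h=1024, block_w=1024):
    sh, sw = block_h // 2, block_w // 2  # half-block offset, as above
    return [(y, x, y + block_h, x + block_w)
            for y in tile_starts(height, block_h, sh)
            for x in tile_starts(width, block_w, sw)]

# For the 2560x1920 input mentioned above with 1024x1024 blocks:
windows = crop_windows(1920, 2560)
print(len(windows))  # 12 overlapping crops
```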

heavengate commented 4 years ago

Are the missed bboxes split across two images by your crop operation?

lyw615 commented 4 years ago

No, the score is just too low, below 0.1.

heavengate commented 4 years ago

No, the score is just too low, below 0.1.

Sorry, I meant: is the missed object cut by your crop operation and split across multiple images? Have you tried cropping just around the missed object as input? Also, is the image preprocessing pipeline the same as for the large image? In YOLO, the input image should be reshaped to [608, 608] per your config.

lyw615 commented 4 years ago

The object is not split across two images; it is complete within the cropped array, yet the model outputs a low score for it.

lyw615 commented 4 years ago

Yes, the image shape in the config is (608, 608). But an input array with shape (2560, 1024) gets better performance than (1024, 1024).

lyw615 commented 4 years ago

Have you tried manually cropping a small image from the large one and comparing the performance? I have tried it and found the result is the same as above.

heavengate commented 4 years ago

Yes, the image shape in the config is (608, 608). But an input array with shape (2560, 1024) gets better performance than (1024, 1024).

YOLOv3 is trained with random shapes from 320 to 608, so theoretically it is better to use an input size in the range [320, 608] for inference.
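Since DarkNet downsamples by a factor of 32, a practical way to follow this advice is to clamp the desired input size into [320, 608] and round down to a multiple of 32 (an illustrative helper, not part of PaddleDetection):

```python
# Pick an inference resolution inside the training range [320, 608],
# rounded down to a multiple of 32 (DarkNet's total downsampling stride).
def pick_infer_size(desired, lo=320, hi=608, divisor=32):
    size = max(lo, min(hi, desired))
    return (size // divisor) * divisor

print(pick_infer_size(1024))  # 608 - clamped to the training maximum
print(pick_infer_size(500))   # 480 - rounded down to a multiple of 32
```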

heavengate commented 4 years ago

Have you tried manually cropping a small image from the large one and comparing the performance? I have tried it and found the result is the same as above.

YOLOv3 has 3 output layers to detect small, medium, and large objects, and the same object at different image scales may be handled by different layers. For example, a 100*200 object in a 320*320 image may get its best result from the large-object layer, while in a 1024*1024 image the best result may come from the small-object layer. Depending on your dataset's object size distribution, detection performance can differ across layers, so the result above may be a logical outcome in my opinion.
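The scale argument can be made concrete with a little arithmetic: after the crop is resized to the 608x608 network input, the same object occupies very different pixel sizes depending on the crop size, which is why it may land on a different detection head (illustrative numbers, not measured results):

```python
# Size of an object in network-input pixels after resizing a crop to 608x608.
def object_size_on_input(obj_wh, crop_wh, input_size=608):
    ow, oh = obj_wh
    cw, ch = crop_wh
    return (ow * input_size / cw, oh * input_size / ch)

# A 100x200 object inside a 320x320 crop vs. a 1024x1024 crop:
print(object_size_on_input((100, 200), (320, 320)))    # (190.0, 380.0) - looks "large"
print(object_size_on_input((100, 200), (1024, 1024)))  # (59.375, 118.75) - looks "small"
```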

lyw615 commented 4 years ago

So I can just change image_shape when exporting the model to fit the characteristics of my dataset? I tried exporting the model with image_shape (320, 320) instead of (608, 608) and got better results with the same operations as above.

heavengate commented 4 years ago

So I can just change image_shape when exporting the model to fit the characteristics of my dataset? I tried exporting the model with image_shape (320, 320) instead of (608, 608) and got better results with the same operations as above.

Model export is shape-invariant: setting image_shape at export time has no influence on the model weights, and the same weights can logically handle any input size because we use random shapes in training. The default random shape range in training is [320, 608], see https://github.com/PaddlePaddle/PaddleDetection/blob/a66c504ca6ed67478e4f1797b11de012eb82d3a1/docs/config_example/yolov3_darknet.yml#L232. It is recommended to use an input size in the range [320, 608] for inference.

lyw615 commented 4 years ago

Thank you very much

lyw615 commented 4 years ago

What is the most common scale in the original input images?

lyw615 commented 4 years ago

What is the most common scale in the original input images for car license training?

heavengate commented 4 years ago

What is the most common scale in the original input images for car license training?

Could you please elaborate on what you mean by "the most common scale"?

lyw615 commented 4 years ago

The proportion of image sizes used for training. For example, 30% of images with shape 1024*1024 and 50% with shape 2560*3640. That way I can adjust my input image shape for car plate detection.

heavengate commented 4 years ago

The proportion of image sizes used for training. For example, 30% of images with shape 1024*1024 and 50% with shape 2560*3640. That way I can adjust my input image shape for car plate detection.

The input images' height and width are reshaped to 320~608, so the original image shape is not really relevant to training performance. If you want to improve performance, you could try generating anchors for your own dataset by clustering the gt boxes with the k-means algorithm.
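The anchor-clustering suggestion can be sketched as the usual YOLO recipe: k-means over ground-truth (w, h) pairs with 1 - IoU as the distance, where boxes are compared as if they shared the same top-left corner. A minimal pure-Python sketch (illustrative only; real runs should use your dataset's gt boxes):

```python
import random

# IoU of two boxes given only (w, h), assuming a shared top-left corner.
def iou_wh(a, b):
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        # Assign each gt box to the center with the highest IoU.
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centers[i]))
            clusters[best].append(box)
        # Recompute each center as the mean (w, h) of its cluster.
        centers = [
            (sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers, key=lambda wh: wh[0] * wh[1])
```

On a toy set with two distinct box sizes, `kmeans_anchors(boxes, k=2)` recovers both sizes as anchors; on a real dataset you would pass all gt box sizes (scaled to the network input) and k=9 for YOLOv3's three anchors per output layer.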