Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
for i, block in enumerate(blocks):
    if i > 0:
        block = paddle.concat([route, block], axis=1)
    route = self.fpn_stages[i](block)
    fpn_feats.append(route)

    if i < self.num_blocks - 1:
        route = self.fpn_routes[i](route)
        route = F.interpolate(
            route, scale_factor=2., data_format=self.data_format)
An error occurs whenever the input image size is not a multiple of 32. For example, I set the input image size to 640x360, trained a PP-YOLOE model, and inference then fails with:
Traceback (most recent call last):
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\tools\infer.py", line 177, in <module>
main()
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\tools\infer.py", line 173, in main
run(FLAGS, cfg)
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\tools\infer.py", line 130, in run
trainer.predict(
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\engine\trainer.py", line 558, in predict
outs = self.model(data)
File "C:\Users\87162\anaconda3\envs\paddle\lib\site-packages\paddle\fluid\dygraph\layers.py", line 917, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "C:\Users\87162\anaconda3\envs\paddle\lib\site-packages\paddle\fluid\dygraph\layers.py", line 907, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\modeling\architectures\meta_arch.py", line 75, in forward
outs.append(self.get_pred())
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 128, in get_pred
return self._forward()
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 80, in _forward
neck_feats = self.neck(body_feats, self.for_mot)
File "C:\Users\87162\anaconda3\envs\paddle\lib\site-packages\paddle\fluid\dygraph\layers.py", line 917, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "C:\Users\87162\anaconda3\envs\paddle\lib\site-packages\paddle\fluid\dygraph\layers.py", line 907, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\modeling\necks\custom_pan.py", line 199, in forward
block = paddle.concat([route, block], axis=1)
File "C:\Users\87162\anaconda3\envs\paddle\lib\site-packages\paddle\tensor\manipulation.py", line 345, in concat
return paddle.fluid.layers.concat(input=x, axis=axis, name=name)
File "C:\Users\87162\anaconda3\envs\paddle\lib\site-packages\paddle\fluid\layers\tensor.py", line 327, in concat
return _C_ops.concat(input, 'axis', axis)
ValueError: (InvalidArgument) The 3-th dimension of input[0] and input[1] is expected to be equal.But received input[0]'s shape = [1, 192, 40, 24], input[1]'s shape = [1, 256, 40, 23].
[operator < concat > error]
I then found that this error occurs inside the "def forward(self, blocks, for_mot=False):" function at line 193 of PaddleDetection\ppdet\modeling\necks\custom_pan.py. The original code is pasted near the top of this report:
The culprit is the line route = F.interpolate(route, scale_factor=2., data_format=self.data_format). Here blocks is the 3-tuple of feature maps produced by the backbone (under the YOLOv3 architecture), with shapes [1, 512, 20, 12], [1, 256, 40, 23] and [1, 128, 80, 45]. The 2x upsampling produces a feature map whose spatial size no longer matches the next block. It should instead be: route = F.interpolate(route, size=blocks[i + 1].shape[-2:], data_format=self.data_format)
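To see the mismatch concretely, here is a plain-Python sketch of the shape logic; upsample_nearest is a stand-in I wrote for F.interpolate (it is not PaddleDetection code), resizing to an explicit target size the way the size=blocks[i + 1].shape[-2:] fix does:

```python
def upsample_nearest(x, size):
    # Nearest-neighbor resize of a list-of-lists "feature map" to size=(h, w);
    # a stand-in for F.interpolate(..., size=...) to illustrate the shape logic.
    in_h, in_w = len(x), len(x[0])
    out_h, out_w = size
    return [[x[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

route = [[0.0] * 23 for _ in range(40)]       # route feature map, 40x23
next_block = [[0.0] * 45 for _ in range(80)]  # lateral block, 80x45

# scale_factor=2. would give 80x46, which cannot be concatenated with 80x45
assert (2 * 40, 2 * 23) == (80, 46)

# resizing to the lateral block's exact spatial size always matches
fixed = upsample_nearest(route, (len(next_block), len(next_block[0])))
assert (len(fixed), len(fixed[0])) == (80, 45)
```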
After making this one change, running infer.py with the 640x360 PP-YOLOE model I had just trained to detect a single image still fails:
Traceback (most recent call last):
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\tools\infer.py", line 182, in <module>
main()
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\tools\infer.py", line 173, in main
run(FLAGS, cfg)
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\tools\infer.py", line 130, in run
trainer.predict(
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\engine\trainer.py", line 558, in predict
outs = self.model(data)
File "C:\Users\87162\anaconda3\envs\paddle\lib\site-packages\paddle\fluid\dygraph\layers.py", line 917, in __call__
return self._dygraph_call_func(*inputs, **kwargs)
File "C:\Users\87162\anaconda3\envs\paddle\lib\site-packages\paddle\fluid\dygraph\layers.py", line 907, in _dygraph_call_func
outputs = self.forward(*inputs, **kwargs)
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\modeling\architectures\meta_arch.py", line 75, in forward
outs.append(self.get_pred())
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 128, in get_pred
return self._forward()
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\modeling\architectures\yolo.py", line 117, in _forward
bbox, bbox_num = self.yolo_head.post_process(
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py", line 366, in post_process
pred_bboxes = batch_distance2bbox(anchor_points,
File "C:\Users\87162\Documents\PyCharm\PaddleDetection\ppdet\modeling\bbox_utils.py", line 767, in batch_distance2bbox
x1y1 = -lt + points
File "C:\Users\87162\anaconda3\envs\paddle\lib\site-packages\paddle\fluid\dygraph\math_op_patch.py", line 264, in impl
return math_op(self, other_var, 'axis', axis)
ValueError: (InvalidArgument) Broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 4760, 2] and the shape of Y = [4700, 2]. Received [4760] in X is not equal to [4700] in Y at i:1.
[Hint: Expected x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1 == true, but received x_dims_array[i] == y_dims_array[i] || x_dims_array[i] <= 1 || y_dims_array[i] <= 1:0 != true:1.] (at ..\paddle/fluid/operators/elementwise/elementwise_op_function.h:240)
[operator < elementwise_add > error]
I then found a second bug, in the forward_eval function at line 183 of PaddleDetection\ppdet\modeling\heads\ppyoloe_head.py. The original code is pasted below:
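(The original snippet did not survive in this paste; the following is only a rough reconstruction of the branch described below, with the attribute name eval_size assumed rather than taken from the source.)

```python
# Rough reconstruction (NOT verbatim PaddleDetection source) of the branch
# in forward_eval: when eval_input_size is set in the config, anchors
# precomputed at init time are used instead of being regenerated.
def select_anchors(head, feats):
    if head.eval_size:                        # eval_input_size given in config
        return head.anchor_points, head.stride_tensor   # precomputed at init
    return head._generate_anchors(feats)      # recomputed from actual feats

# minimal dummy head to exercise both branches
class DummyHead:
    eval_size = (640, 360)
    anchor_points, stride_tensor = "precomputed_points", "precomputed_strides"
    def _generate_anchors(self, feats):
        return "generated_points", "generated_strides"

assert select_anchors(DummyHead(), []) == ("precomputed_points",
                                           "precomputed_strides")
```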
With PPYOLOEHead: eval_input_size: [640, 360] specified in the PP-YOLOE config file, the code executes anchor_points, stride_tensor = self.anchor_points, self.stride_tensor, where self.anchor_points and self.stride_tensor were derived at model-initialization time from the fpn_strides variable (also given in the config as fpn_strides: [32, 16, 8]). The problem is that self.anchor_points has length 80x45 + 40x22 + 20x11 = 4700: the code assumes per-level feature sizes of [80, 45], [40, 22] and [20, 11], i.e. the input size divided by fpn_strides and rounded down. Computed from the actual sizes of feats, however, the length should be 80x45 + 40x23 + 20x12 = 4760.
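The counting can be checked in a few lines of plain Python. Treating the whole backbone as a chain of stride-2, kernel-3, padding-1 convolutions is my simplifying assumption here, but it reproduces the sizes quoted above:

```python
# Each stride-2, kernel-3, padding-1 conv outputs (x - 3 + 2) // 2 + 1,
# i.e. ceil(x / 2): odd sizes round UP, not down.
def conv_down(x):
    return (x - 3 + 2) // 2 + 1

h, w = 640, 360
strides = [8, 16, 32]

# anchors precomputed at init assume floor(input / stride) per FPN level
init_count = sum((h // s) * (w // s) for s in strides)

# actual feature sizes after five successive stride-2 convs;
# the last three levels (strides 8, 16, 32) feed the head
sizes = []
fh, fw = h, w
for i in range(5):
    fh, fw = conv_down(fh), conv_down(fw)
    if i >= 2:
        sizes.append((fh, fw))
real_count = sum(a * b for a, b in sizes)

assert init_count == 4700                       # 80*45 + 40*22 + 20*11
assert sizes == [(80, 45), (40, 23), (20, 12)]
assert real_count == 4760                       # 80*45 + 40*23 + 20*12
```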
I therefore commented out the eval_input_size line in the config (.yml) file, and inference then ran normally.
However, reading the PaddleDetection source, I found that this hurts inference speed, because the model now re-executes anchor_points, stride_tensor = self._generate_anchors(feats) on every call, repeating the same computation. So I changed the function to:
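(The modified function did not survive in this paste. As one hypothetical way to avoid the recomputation, not the PaddleDetection API, the anchors could be cached keyed by the feature-map spatial shapes:)

```python
# Hypothetical sketch: cache the result of _generate_anchors keyed by the
# feature maps' spatial shapes, so anchors are computed once per input size
# instead of on every forward pass.
class AnchorCache:
    def __init__(self, generate_fn):
        self._generate = generate_fn    # e.g. the head's _generate_anchors
        self._cache = {}

    def __call__(self, feats):
        # key on (H, W) of every level; a new input size triggers one recompute
        key = tuple(tuple(f.shape[-2:]) for f in feats)
        if key not in self._cache:
            self._cache[key] = self._generate(feats)
        return self._cache[key]

# demo with fake feature tensors that only carry a .shape attribute
class FakeFeat:
    def __init__(self, shape):
        self.shape = shape

calls = []
def fake_generate(feats):
    calls.append(1)
    return "anchor_points", "stride_tensor"

cached = AnchorCache(fake_generate)
feats = [FakeFeat([1, 192, 40, 23]), FakeFeat([1, 384, 20, 12])]
cached(feats)
cached(feats)        # second call with the same shapes hits the cache
assert len(calls) == 1
```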
I also found that the root cause of both latent bugs is the downsampling method used at line 172 of PaddleDetection\ppdet\modeling\backbones\cspresnet.py, in the CSPResNet -> CSPResStage network:
This downsampling maps an input size of w = 360/8 = 45 to an output size of (45 - 3 + 2)/2 + 1 = 23, whereas the "normal" 2x downsampling one would expect simply divides the size by two and rounds down, giving int(45/2) = 22. This discrepancy is the root cause of the inference errors.
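The single-conv claim can be verified by enumerating the window start positions of a 1-D convolution (a plain-Python sketch of the standard output-size formula):

```python
# Output positions of a 1-D conv with kernel k, stride s, padding p:
# window starts at -p, -p + s, ... while the window still fits,
# which is exactly (w - k + 2*p) // s + 1 positions.
def conv_out_size(w, k=3, s=2, p=1):
    return len(range(-p, w + p - k + 1, s))

assert conv_out_size(45) == 23   # the stride-2 3x3 pad-1 conv used here
assert 45 // 2 == 22             # what "halve and round down" would give
```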
My questions are:
1) I made the above changes in the Python source so that inference runs. But in the end I need to export the model as a static graph and call it from C++ through the paddle_inference engine. Will these fixes carry over to C++ inference?
2) Regarding the efficiency loss in post-processing from self._generate_anchors being called repeatedly: I fixed it in my Python code, but can it simply be ignored for the C++ deployment?
3) Also, why does training never report these errors? Only inference seems to fail.
Attached is my model training config file:
use_gpu: true
save_dir: output
log_iter: 100
snapshot_epoch: 3
pretrain_weights: ppyoloe_crn_s_300e_coco.pdparams
depth_mult: 0.33
width_mult: 0.50

epoch: 100

LearningRate:
  base_lr: 0.002
  schedulers:

OptimizerBuilder:
  optimizer:
    type: Momentum
  regularizer:
    factor: 0.0001
    type: L2

worker_num: 4

TrainReader:
  sample_transforms:

EvalReader:
  sample_transforms:
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}

TestReader:
  inputs_def:
    image_shape: [3, 640, 640]
  sample_transforms:
    - Resize: {target_size: [640, 640], keep_ratio: False, interp: 2}

architecture: YOLOv3
norm_type: sync_bn
use_ema: true
ema_decay: 0.9998

YOLOv3:
  backbone: CSPResNet
  neck: CustomCSPPAN
  yolo_head: PPYOLOEHead
  post_process: ~

CSPResNet:
  layers: [3, 6, 6, 3]
  channels: [64, 128, 256, 512, 1024]
  return_idx: [1, 2, 3]
  use_large_stem: True

CustomCSPPAN:
  out_channels: [768, 384, 192]
  stage_num: 1
  block_num: 3
  act: 'swish'
  spp: true

PPYOLOEHead:
  fpn_strides: [32, 16, 8]
  grid_cell_scale: 5.0
  grid_cell_offset: 0.5
  static_assigner_epoch: 100
  use_varifocal_loss: True
  # eval_input_size: [640, 640]
  eval_input_size: [640, 360]
  loss_weight: {class: 1.0, iou: 2.5, dfl: 0.5}
  static_assigner:
    name: ATSSAssigner
    topk: 9
  assigner:
    name: TaskAlignedAssigner
    topk: 13
    alpha: 1.0
    beta: 6.0
  nms:
    name: MultiClassNMS
    nms_top_k: 1000
    keep_top_k: 100
    score_threshold: 0.01
    nms_threshold: 0.6

metric: COCO
num_classes: 1

TrainDataset:
  !COCODataSet
    image_dir: D:\data\INRIAPerson
    anno_path: coco_pos.json
    dataset_dir: D:\data\INRIAPerson\Train
    data_fields: ['image', 'gt_bbox', 'gt_class', 'is_crowd']

EvalDataset:
  !COCODataSet
    image_dir: D:\data\INRIAPerson
    anno_path: coco_pos.json
    dataset_dir: D:\data\INRIAPerson\Test

TestDataset:
  !ImageFolder
    anno_path: coco_pos.json