qpfhuan commented 4 years ago

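I trained an Object365 detection task starting from the cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms pretrained model. The model runs a forward pass fine with tools/infer on the training branch, but after exporting the model, cpp_infer fails.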

cpp_demo.py is set to:

use_python_inference: true # whether to use python inference
mode: fluid # trt_fp32, trt_fp16, trt_int8, fluid
arch: RCNN # YOLO, SSD, RCNN, RetinaNet
min_subgraph_size: 40 # need 3 for YOLO arch

# visualize the predicted image
metric: COCO # COCO, VOC
draw_threshold: 0.5

Preprocess:
- type: Resize
  target_size: 640
  max_size: 640
- type: Normalize
  mean:
  - 0.485
  - 0.456
  - 0.406
  std:
  - 0.229
  - 0.224
  - 0.225
  is_scale: True
- type: Permute
  to_bgr: False
- type: PadStride
  stride: 0 # set 32 on FPN and 128 on RetinaNet
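As a rough aid for reading the Preprocess settings above, the sketch below computes the image shape that Resize and PadStride would produce. It assumes the usual R-CNN-style semantics (scale the short side to target_size, then cap the long side at max_size; pad each side up to a multiple of stride, with 0 meaning no padding). The function names are hypothetical and this is not the code in tools/cpp_infer.py.

```python
def resized_shape(height, width, target_size=640, max_size=640):
    """Scale the short side to target_size, capping the long side at max_size."""
    short_side, long_side = min(height, width), max(height, width)
    scale = target_size / short_side
    if round(scale * long_side) > max_size:
        scale = max_size / long_side
    return round(height * scale), round(width * scale), scale


def padded_shape(height, width, stride=0):
    """Round each side up to a multiple of stride; stride <= 0 means no padding."""
    if stride <= 0:
        return height, width

    def round_up(n):
        return ((n + stride - 1) // stride) * stride

    return round_up(height), round_up(width)


if __name__ == "__main__":
    h, w, scale = resized_shape(300, 400)
    print("resized:", h, w, "scale:", scale)
    print("stride 0:", padded_shape(h, w, stride=0))
    print("stride 32:", padded_shape(h, w, stride=32))
```

With target_size and max_size both at 640, the long side ends up at 640 for any non-square input, so the short side varies with the aspect ratio and only a non-zero stride pads it to a fixed multiple.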

Below is the error from running with use_python_inference: true:

2020-04-21 23:20:07,444-INFO: The architecture is RCNN
2020-04-21 23:20:07,445-INFO: Extra info: im_info, im_shape
W0421 23:20:09.060719 23906 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 9.0
W0421 23:20:09.069774 23906 device_context.cc:245] device: 0, cuDNN Version: 7.0.
W0421 23:20:09.069803 23906 device_context.cc:271] WARNING: device: 0. The installed Paddle is compiled with CUDNN 7.3, but CUDNN version in your machine is 7.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
2020-04-21 23:20:11,176-INFO: warmup...
/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py:789: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "tools/cpp_infer.py", line 319, in infer()
  File "tools/cpp_infer.py", line 250, in infer
    return_numpy=False)
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 790, in run
    six.reraise(*sys.exc_info())
  File "/usr/local/lib/python3.6/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 785, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 838, in _run_impl
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 912, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const, int)
2 paddle::operators::PyFuncOp::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
3 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
4 paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext, paddle::framework::Scope, bool, bool, bool)
5 paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator > const&, bool, bool)

Python Call Stacks (More useful to users):

File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/framework.py", line 2525, in append_op
  attrs=kwargs.get("attrs", None))
File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/layer_helper.py", line 43, in append_op
  return self.main_program.current_block().append_op(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/layers/nn.py", line 12638, in py_func
  'backward_skip_vars': list(backward_skip_vars)
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/ops.py", line 373, in __call__
  func=_soft_nms, x=[bboxes, scores], out=pred_result)
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/roi_heads/cascade_head.py", line 250, in get_prediction_cls_aware
  pred_result = self.nms(bboxes=box_out, scores=sum_cascade_cls_prob)
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/architectures/cascade_rcnn_cls_aware.py", line 170, in build
  self.cascade_decoded_box, self.cascade_bbox_reg_weights)
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/architectures/cascade_rcnn_cls_aware.py", line 217, in test
  return self.build(feed_vars, 'test')
File "tools/export_model.py", line 97, in main
  test_fetches = model.test(feed_vars)
File "tools/export_model.py", line 114, in main()

Error Message Summary:

Error: Invalid python callable id [Hint: Expected i < g_py_callables.size(), but received i:0 >= g_py_callables.size():0.] at (/paddle/paddle/fluid/operators/py_func_op.cc:45) [operator < py_func > error]

Below is the configuration with use_python_inference set to false:

use_python_inference: false # whether to use python inference
mode: fluid # trt_fp32, trt_fp16, trt_int8, fluid
arch: RCNN # YOLO, SSD, RCNN, RetinaNet
min_subgraph_size: 40 # need 3 for YOLO arch

# visualize the predicted image
metric: COCO # COCO, VOC
draw_threshold: 0.5

Preprocess:
- type: Resize
  target_size: 640
  max_size: 640
- type: Normalize
  mean:
  - 0.485
  - 0.456
  - 0.406
  std:
  - 0.229
  - 0.224
  - 0.225
  is_scale: True
- type: Permute
  to_bgr: False
- type: PadStride
  stride: 0 # set 32 on FPN and 128 on RetinaNet

Error output:

2020-04-21 23:28:53,083-INFO: The architecture is RCNN
2020-04-21 23:28:53,083-INFO: Extra info: im_info, im_shape
2020-04-21 23:28:53,088-INFO: min_subgraph_size = 40.
2020-04-21 23:28:53,088-INFO: Run inference by Fluid FP32.
I0421 23:28:54.013773 25361 analysis_predictor.cc:84] Profiler is deactivated, and no profiling report will be generated.
I0421 23:28:54.087523 25361 analysis_predictor.cc:833] MODEL VERSION: 1.7.2
I0421 23:28:54.087569 25361 analysis_predictor.cc:835] PREDICTOR VERSION: 1.7.2
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [is_test_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0421 23:28:57.255019 25361 graph_pattern_detector.cc:101] --- detected 142 subgraphs
--- Running IR pass [conv_eltwiseadd_bn_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass]
--- Running IR pass [fc_fuse_pass]
I0421 23:29:21.477888 25361 graph_pattern_detector.cc:101] --- detected 6 subgraphs
I0421 23:29:21.493244 25361 graph_pattern_detector.cc:101] --- detected 6 subgraphs
--- Running IR pass [fc_elementwise_layernorm_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
I0421 23:29:22.303406 25361 graph_pattern_detector.cc:101] --- detected 77 subgraphs
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
I0421 23:29:22.691764 25361 graph_pattern_detector.cc:101] --- detected 66 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0421 23:29:22.786260 25361 graph_pattern_detector.cc:101] --- detected 85 subgraphs
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running IR pass [runtime_context_cache_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0421 23:29:22.860546 25361 ir_params_sync_among_devices_pass.cc:41] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I0421 23:29:25.252122 25361 analysis_predictor.cc:462] ======= optimize end =======
2020-04-21 23:29:25,259-INFO: warmup...
W0421 23:29:25.829217 25361 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 9.0
W0421 23:29:25.838049 25361 device_context.cc:245] device: 0, cuDNN Version: 7.0.
W0421 23:29:25.838078 25361 device_context.cc:271] WARNING: device: 0. The installed Paddle is compiled with CUDNN 7.3, but CUDNN version in your machine is 7.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.
W0421 23:29:27.375310 25361 naive_executor.cc:45] The NaiveExecutor can not work properly if the cmake flag ON_INFER is not set.
W0421 23:29:27.375352 25361 naive_executor.cc:47] Unlike the training phase, all the scopes and variables will be reused to save the allocation overhead.
W0421 23:29:27.375357 25361 naive_executor.cc:50] Please re-compile the inference library by setting the cmake flag ON_INFER=ON if you are running Paddle Inference
Traceback (most recent call last):
  File "tools/cpp_infer.py", line 319, in infer()
  File "tools/cpp_infer.py", line 252, in infer
    outs = predict.run(inputs)
paddle.fluid.core_avx.EnforceNotMet:

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<char const>(char const&&, char const, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const, int)
2 cudnnActivationStruct paddle::platform::ScopedActivationDescriptor::descriptor(std::string const&, double)
3 paddle::operators::CUDNNConvFusionOpKernel::Compute(paddle::framework::ExecutionContext const&) const
4 std::_Function_handler<void (paddle::framework::ExecutionContext const&), paddle::framework::OpKernelRegistrarFunctor<paddle::platform::CUDAPlace, false, 0ul, paddle::operators::CUDNNConvFusionOpKernel, paddle::operators::CUDNNConvFusionOpKernel >::operator()(char const, char const, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)
5 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext) const
6 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
7 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
8 paddle::framework::NaiveExecutor::Run()
9 paddle::AnalysisPredictor::Run(std::vector<paddle::PaddleTensor, std::allocator > const&, std::vector<paddle::PaddleTensor, std::allocator >*, int)

Python Call Stacks (More useful to users):

File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/framework.py", line 2525, in append_op
  attrs=kwargs.get("attrs", None))
File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/layer_helper.py", line 43, in append_op
  return self.main_program.current_block().append_op(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/layers/nn.py", line 1403, in conv2d
  "data_format": data_format,
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/backbones/resnet.py", line 162, in _conv_norm
  name=_name + '.conv2d.output.1')
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/backbones/resnet.py", line 260, in _shortcut
  return self._conv_norm(input, ch_out, 1, stride, name=name)
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/backbones/resnet.py", line 319, in bottleneck
  name=shortcut_name)
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/backbones/resnet.py", line 398, in layer_warp
  gcb_name=gcb_name)
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/backbones/resnet.py", line 455, in __call__
  res = self.layer_warp(res, i)
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/architectures/cascade_rcnn_cls_aware.py", line 91, in build
  body_feats = self.backbone(im)
File "/home/qipengfei7/qpf/baidu_paddle/PaddleDetection/ppdet/modeling/architectures/cascade_rcnn_cls_aware.py", line 217, in test
  return self.build(feed_vars, 'test')
File "tools/export_model.py", line 97, in main
  test_fetches = model.test(feed_vars)
File "tools/export_model.py", line 114, in main()

Error Message Summary:

Error: An error occurred here. There is no accurate error hint for this error yet. We are continuously in the process of increasing hint for this kind of error check. It would be helpful if you could inform us of how this conversion went by opening a github issue. And we will resolve it with high priority.

New issue link: https://github.com/PaddlePaddle/Paddle/issues/new
Recommended issue content: all error stack information [Hint: CUDNN_STATUS_BAD_PARAM] at (/paddle/paddle/fluid/platform/cudnn_helper.h:463) [operator < conv2d_fusion > error]

MyPandaShaoxiang commented 4 years ago

The installed Paddle is compiled with CUDNN 7.3, but CUDNN version in your machine is 7.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDNN version.

Hi, this looks like it is caused by a cuDNN version mismatch. Could you try cuDNN 7.3 or later?
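For anyone scanning long logs for this condition, here is a small sketch that pulls both cuDNN versions out of the warning text quoted above and reports whether the runtime version is older than the one Paddle was compiled against. It only parses the message as it appears in this thread; the helper names are hypothetical and it does not query Paddle or cuDNN.

```python
import re

# The warning exactly as it appears in the log in this thread.
WARNING = (
    "The installed Paddle is compiled with CUDNN 7.3, "
    "but CUDNN version in your machine is 7.0, "
    "which may cause serious incompatible bug."
)

PATTERN = re.compile(
    r"compiled with CUDNN (\d+)\.(\d+), "
    r"but CUDNN version in your machine is (\d+)\.(\d+)"
)


def cudnn_versions(log_text):
    """Return ((compiled), (runtime)) as (major, minor) tuples, or None."""
    match = PATTERN.search(log_text)
    if match is None:
        return None
    c_major, c_minor, r_major, r_minor = (int(g) for g in match.groups())
    return (c_major, c_minor), (r_major, r_minor)


def runtime_is_older(log_text):
    """True when the log reports a runtime cuDNN older than the compiled one."""
    versions = cudnn_versions(log_text)
    if versions is None:
        return False
    compiled, runtime = versions
    return runtime < compiled


if __name__ == "__main__":
    print(cudnn_versions(WARNING))
    print(runtime_is_older(WARNING))
```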

qpfhuan commented 4 years ago

I switched to 7.6 and still get the same error.

qpfhuan commented 4 years ago

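One more question: during eval, is it not possible to test directly on the Object365 test set? I validated using the COCO format, and the metrics are considerably worse.

CUDA_VISIBLE_DEVICES=3 python -u tools/eval.py -c configs/qpf_configs/object365_cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml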

MyPandaShaoxiang commented 4 years ago

Are you using the model from paddle/model?

qpfhuan commented 4 years ago

https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.2/docs/featured_model/OIDV5_BASELINE_MODEL.md

I downloaded it from the page above.

willthefrog commented 4 years ago

W0421 23:29:27.375310 25361 naive_executor.cc:45] The NaiveExecutor can not work properly if the cmake flag ON_INFER is not set.
W0421 23:29:27.375352 25361 naive_executor.cc:47] Unlike the training phase, all the scopes and variables will be reused to save the allocation overhead.
W0421 23:29:27.375357 25361 naive_executor.cc:50] Please re-compile the inference library by setting the cmake flag ON_INFER=ON if you are running Paddle Inference

The Paddle build is wrong: inference support was not enabled at compile time.

qpfhuan commented 4 years ago

My current version is 1.7.2, with CUDA 9 and Python 3 on Ubuntu. Which version should I update to, or how do I enable inference support?

willthefrog commented 4 years ago

Try adding -DON_INFER=ON at the cmake step?

qpfhuan commented 4 years ago

I haven't reached a cmake step yet. I am only running cpp_infer and have not gotten to the deployment stage.

willthefrog commented 4 years ago

I mean the cmake step when compiling Paddle itself.

qpfhuan commented 4 years ago

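Do you mean building from source? I installed with pip. Is there a parameter I can change in the pip command, or is building from source or in Docker the only option?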

willthefrog commented 4 years ago

Oh, it looks like that is not the problem after all. Your model contains py_func, which inference does not support.

willthefrog commented 4 years ago

Soft-NMS is currently implemented with py_func. For now, you can switch to regular NMS to run inference.
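To show what changes when soft-NMS is swapped for regular NMS, here is a minimal single-class sketch of both in plain Python. It assumes boxes in (x1, y1, x2, y2) form and the Gaussian decay variant of soft-NMS; the function names, default thresholds, and sigma are illustrative choices, and this is not PaddleDetection's _soft_nms or its built-in NMS op.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def hard_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: drop every box that overlaps an already kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian soft-NMS: decay overlapping scores instead of dropping boxes."""
    scores = list(scores)  # work on a copy so the caller's list is untouched
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        if scores[best] < score_threshold:
            break  # everything left scores even lower
        kept.append((best, scores[best]))
        for i in remaining:
            scores[i] *= math.exp(-(iou(boxes[best], boxes[i]) ** 2) / sigma)
    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print("hard NMS keeps:", hard_nms(boxes, scores))
    print("soft-NMS keeps:", soft_nms(boxes, scores))
```

Hard NMS removes the heavily overlapping second box outright, while soft-NMS keeps it with a reduced score, so some difference in detection metrics after the swap would not be surprising.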

qpfhuan commented 4 years ago

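OK, got it. So I still load the model trained on Object365, and I only need to change the NMS method used for inference and export the model again, and then cpp_infer will work, right?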

willthefrog commented 4 years ago

That should work. Give it a try.

qingqing01 commented 4 years ago

@qpfhuan If you run into further problems, please open a new issue. Closing this one for now.

PaddlePaddle / PaddleDetection

cpp_infer failed after exported model #540
