Closed ash12358 closed 4 years ago
Error: Operator py_func has not been registered [Hint: op_info_ptr should not be null.] at (/paddle/paddle/fluid/framework/op_info.h:140)
@ash12358 It looks like the exported model contains a py_func op. Try replacing soft-NMS with the regular NMS and see whether that helps.
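One way to confirm this before starting the server is to scan the exported program's op types for ops the serving binary may not have registered. This is a minimal sketch, assuming the op types have already been collected into a list of strings (for a Paddle 1.x program this would likely be `[op.type for op in program.global_block().ops]`, which is not exercised here). The helper name `find_unsupported_ops` and the example op list are hypothetical.

```python
def find_unsupported_ops(op_types, unsupported=("py_func",)):
    """Return the sorted, de-duplicated op types that appear in `unsupported`."""
    blocked = set(unsupported)
    return sorted({t for t in op_types if t in blocked})


# Hypothetical op list for illustration only, not taken from a real model.
example_ops = ["conv2d", "elementwise_add", "py_func", "multiclass_nms"]
print(find_unsupported_ops(example_ops))
```

An empty result would suggest that py_func is no longer in the graph after switching to the regular NMS.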
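@qingqing01 Thank you very much. After changing MultiClassSoftNMS to the regular NMS, the model runs successfully on Serving. However, when I debugged the client following
https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/faster_rcnn_model
the following error occurred: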
/opt/Anaconda3/envs/paddle/lib/python3.7/site-packages/paddle_serving_server/serving-cpu-avx-openblas-0.2.1/serving -enable_model_toolkit -inferservice_path workdir -inferservice_file infer_service.prototxt -max_concurrency 0 -num_threads 10 -port 9292 -reload_interval_s 10 -resource_path workdir -resource_file resource.prototxt -workflow_path workdir -workflow_file workflow.prototxt -bthread_concurrency 10 -max_body_size 536870912
I0100 00:00:00.000000 14035 op_repository.h:65] RAW: Succ regist op: GeneralTextResponseOp
I0100 00:00:00.000000 14035 op_repository.h:65] RAW: Succ regist op: GeneralTextReaderOp
I0100 00:00:00.000000 14035 op_repository.h:65] RAW: Succ regist op: GeneralInferOp
I0100 00:00:00.000000 14035 op_repository.h:65] RAW: Succ regist op: GeneralDistKVQuantInferOp
I0100 00:00:00.000000 14035 op_repository.h:65] RAW: Succ regist op: GeneralDistKVInferOp
I0100 00:00:00.000000 14035 op_repository.h:65] RAW: Succ regist op: GeneralReaderOp
I0100 00:00:00.000000 14035 op_repository.h:65] RAW: Succ regist op: GeneralCopyOp
I0100 00:00:00.000000 14035 op_repository.h:65] RAW: Succ regist op: GeneralResponseOp
I0100 00:00:00.000000 14035 service_manager.h:61] RAW: Service[LoadGeneralModelService] insert successfully!
I0100 00:00:00.000000 14035 load_general_model_service.pb.h:299] RAW: Success regist service[LoadGeneralModelService][PN5baidu14paddle_serving9predictor26load_general_model_service27LoadGeneralModelServiceImplE]
I0100 00:00:00.000000 14035 service_manager.h:61] RAW: Service[GeneralModelService] insert successfully!
I0100 00:00:00.000000 14035 general_model_service.pb.h:1473] RAW: Success regist service[GeneralModelService][PN5baidu14paddle_serving9predictor13general_model23GeneralModelServiceImplE]
I0100 00:00:00.000000 14035 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_ANALYSIS, base type N5baidu14paddle_serving9predictor11InferEngineE
W0100 00:00:00.000000 14035 fluid_cpu_engine.cpp:25] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine
File "/opt/Anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "/opt/Anaconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/layers/math_op_patch.py", line 243, in impl
attrs={'axis': axis})
File "/data/ssh/PaddleDetection/ppdet/modeling/backbones/fpn.py", line 93, in _add_topdown_lateral
return lateral + topdown
File "/data/ssh/PaddleDetection/ppdet/modeling/backbones/fpn.py", line 144, in get_output
top_output)
File "/data/ssh/PaddleDetection/ppdet/modeling/architectures/cascade_rcnn_cls_aware.py", line 95, in build
body_feats, spatial_scale = self.fpn.get_output(body_feats)
File "/data/ssh/PaddleDetection/ppdet/modeling/architectures/cascade_rcnn_cls_aware.py", line 217, in test
return self.build(feed_vars, 'test')
File "tools/export_serving_model.py", line 198, in main
test_fetches = model.test(feed_vars)
File "tools/export_serving_model.py", line 217, in
Error: ShapeError: broadcast dimension mismatch. Operands could not be broadcast together with the shape of X = [1, 256, 27, 40] and the shape of Y = [1, 256, 28, 40]. Received [27] in X is not equal to [28] in Y at (/paddle/paddle/fluid/operators/elementwise/elementwise_op_function.h:145) [operator < elementwise_add > error]
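@ash12358 For two-stage FPN models, the input image dimensions need to be a multiple of 32 when the model is exported. Also, please check whether the Serving preprocessing applies a padding operation to the input. @wangjiawei04 could you help confirm this?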
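The multiple-of-32 requirement can be illustrated with a small NumPy sketch. It assumes a CHW image array and that each backbone stage rounds its output size up (ceil), which is typical for ResNet-style strided convolutions but is an assumption here. The function names and the 427 x 640 input size are hypothetical, chosen because they reproduce a 27 vs 28 mismatch like the one in the error above.

```python
import math

import numpy as np


def pad_to_stride(im, stride=32):
    """Zero-pad a CHW array on the bottom and right so H and W are multiples of `stride`."""
    c, h, w = im.shape
    padded_h = int(math.ceil(h / stride) * stride)
    padded_w = int(math.ceil(w / stride) * stride)
    padded = np.zeros((c, padded_h, padded_w), dtype=im.dtype)
    padded[:, :h, :w] = im
    return padded


def feature_size(length, stride):
    """Approximate feature-map length at a backbone stride (assumes ceil rounding)."""
    return math.ceil(length / stride)


im = np.ones((3, 427, 640), dtype=np.float32)  # hypothetical input size
# Unpadded: the stride-16 lateral map and the 2x-upsampled stride-32 map disagree.
print(feature_size(427, 16), 2 * feature_size(427, 32))
padded = pad_to_stride(im)
# Padded: both paths produce the same height, so the elementwise add can succeed.
print(padded.shape, feature_size(448, 16), 2 * feature_size(448, 32))
```

With the unpadded height, the lateral feature map has 27 rows while the upsampled top-down map has 28, so `lateral + topdown` cannot broadcast. After padding to 448 both have 28 rows.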
I trained on the Objects365 dataset using the config file cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml, then exported the model with python tools/export_serving_model.py -c configs/obj365/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms.yml --output_dir=serving -o weights=output/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms/1300000. I then started the service with python -m paddle_serving_server.serve --model serving/cascade_rcnn_cls_aware_r200_vd_fpn_dcnv2_nonlocal_softnms/serving_server --thread 10 --port 9292, which produced the error below. However, exporting and running with the officially provided weights and the corresponding config file works without problems. What could be the cause?
/opt/Anaconda3/envs/paddle/lib/python3.7/site-packages/paddle_serving_server/serving-cpu-avx-openblas-0.2.1/serving -enable_model_toolkit -inferservice_path workdir -inferservice_file infer_service.prototxt -max_concurrency 0 -num_threads 10 -port 9292 -reload_interval_s 10 -resource_path workdir -resource_file resource.prototxt -workflow_path workdir -workflow_file workflow.prototxt -bthread_concurrency 10 -max_body_size 536870912 I0100 00:00:00.000000 33607 op_repository.h:65] RAW: Succ regist op: GeneralTextResponseOp I0100 00:00:00.000000 33607 op_repository.h:65] RAW: Succ regist op: GeneralTextReaderOp I0100 00:00:00.000000 33607 op_repository.h:65] RAW: Succ regist op: GeneralInferOp I0100 00:00:00.000000 33607 op_repository.h:65] RAW: Succ regist op: GeneralDistKVQuantInferOp I0100 00:00:00.000000 33607 op_repository.h:65] RAW: Succ regist op: GeneralDistKVInferOp I0100 00:00:00.000000 33607 op_repository.h:65] RAW: Succ regist op: GeneralReaderOp I0100 00:00:00.000000 33607 op_repository.h:65] RAW: Succ regist op: GeneralCopyOp I0100 00:00:00.000000 33607 op_repository.h:65] RAW: Succ regist op: GeneralResponseOp I0100 00:00:00.000000 33607 service_manager.h:61] RAW: Service[LoadGeneralModelService] insert successfully! I0100 00:00:00.000000 33607 load_general_model_service.pb.h:299] RAW: Success regist service[LoadGeneralModelService][PN5baidu14paddle_serving9predictor26load_general_model_service27LoadGeneralModelServiceImplE] I0100 00:00:00.000000 33607 service_manager.h:61] RAW: Service[GeneralModelService] insert successfully! 
I0100 00:00:00.000000 33607 general_model_service.pb.h:1473] RAW: Success regist service[GeneralModelService][PN5baidu14paddle_serving9predictor13general_model23GeneralModelServiceImplE] I0100 00:00:00.000000 33607 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_ANALYSIS, base type N5baidu14paddle_serving9predictor11InferEngineE W0100 00:00:00.000000 33607 fluid_cpu_engine.cpp:25] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_ANALYSIS in macro! I0100 00:00:00.000000 33607 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_ANALYSIS_DIR, base type N5baidu14paddle_serving9predictor11InferEngineE W0100 00:00:00.000000 33607 fluid_cpu_engine.cpp:31] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine< FluidCpuAnalysisDirCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_ANALYSIS_DIR in macro! I0100 00:00:00.000000 33607 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_ANALYSIS_DIR_SIGMOID, base type N5baidu14paddle_serving9predictor11InferEngineE W0100 00:00:00.000000 33607 fluid_cpu_engine.cpp:37] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine< FluidCpuAnalysisDirWithSigmoidCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_ANALYSIS_DIR_SIGMOID in macro! I0100 00:00:00.000000 33607 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_NATIVE, base type N5baidu14paddle_serving9predictor11InferEngineE W0100 00:00:00.000000 33607 fluid_cpu_engine.cpp:42] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_NATIVE in macro! 
I0100 00:00:00.000000 33607 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_NATIVE_DIR, base type N5baidu14paddle_serving9predictor11InferEngineE W0100 00:00:00.000000 33607 fluid_cpu_engine.cpp:47] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_NATIVE_DIR in macro! I0100 00:00:00.000000 33607 factory.h:121] RAW: Succ insert one factory, tag: FLUID_CPU_NATIVE_DIR_SIGMOID, base type N5baidu14paddle_serving9predictor11InferEngineE W0100 00:00:00.000000 33607 fluid_cpu_engine.cpp:53] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine< FluidCpuNativeDirWithSigmoidCore>->::baidu::paddle_serving::predictor::InferEngine, tag: FLUID_CPU_NATIVE_DIR_SIGMOID in macro! --- Running analysis [ir_graph_build_pass] --- Running analysis [ir_graph_clean_pass] --- Running analysis [ir_analysis_pass] --- Running IR pass [simplify_with_basic_ops_pass] --- Running IR pass [attention_lstm_fuse_pass] --- Running IR pass [seqconv_eltadd_relu_fuse_pass] --- Running IR pass [seqpool_cvm_concat_fuse_pass] --- Running IR pass [fc_lstm_fuse_pass] --- Running IR pass [mul_lstm_fuse_pass] --- Running IR pass [fc_gru_fuse_pass] --- Running IR pass [mul_gru_fuse_pass] --- Running IR pass [seq_concat_fc_fuse_pass] --- Running IR pass [fc_fuse_pass] --- Running IR pass [repeated_fc_relu_fuse_pass] --- Running IR pass [squared_mat_sub_fuse_pass] --- Running IR pass [conv_bn_fuse_pass] --- Running IR pass [conv_eltwiseadd_bn_fuse_pass] --- Running IR pass [conv_transpose_bn_fuse_pass] --- Running IR pass [conv_transpose_eltwiseadd_bn_fuse_pass] --- Running IR pass [is_test_pass] --- Running IR pass [runtime_context_cache_pass] --- Running analysis [ir_params_sync_among_devices_pass] --- Running analysis [adjust_cudnn_workspace_size_pass] --- Running analysis [inference_op_replace_pass] --- Running analysis [ir_graph_to_program_pass] terminate called after throwing an 
instance of 'paddle::platform::EnforceNotMet' what():
C++ Call Stacks (More useful to developers):
Error Message Summary:
Error: Operator py_func has not been registered [Hint: op_info_ptr should not be null.] at (/paddle/paddle/fluid/framework/op_info.h:140)
Could you share your client code?
You can refer to this example: https://github.com/PaddlePaddle/Serving/tree/develop/python/examples/cascade_rcnn
wget --no-check-certificate https://paddle-serving.bj.bcebos.com/pddet_demo/cascade_rcnn_r50_fpx_1x_serving.tar.gz
tar xf cascade_rcnn_r50_fpx_1x_serving.tar.gz
python -m paddle_serving_server_gpu.serve --model serving_server --port 9292 --gpu_id 0
# Start the client in another terminal
python test_client.py
@wangjiawei04
Client code:
# Copyright (c) 2020 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from paddle_serving_client import Client
from paddle_serving_app.reader import Sequential
from paddle_serving_app.reader import *
import sys
import numpy as np
preprocess = Sequential([
    File2Image(), BGR2RGB(), Div(255.0),
    Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
    Resize(640, 640), Transpose((2, 0, 1))
])
postprocess = RCNNPostprocess("label_list.txt", "output")
client = Client()
client.load_client_config(
    "cascade_serving/serving_client/serving_client_conf.prototxt")
client.connect(['127.0.0.1:9292'])
im = preprocess(sys.argv[3])
fetch_map = client.predict(
    feed={
        "image": im,
        "im_info": np.array(list(im.shape[1:]) + [1.0]),
        "im_shape": np.array(list(im.shape[1:]) + [1.0])
    },
    fetch=["multiclass_nms_0.tmp_0"])
#fetch_map["image"] = sys.argv[3]
#postprocess(fetch_map)
print(fetch_map)
Command run: python tools/new_test_client.py cascade_serving/serving_server/serving_client_conf.prototxt cascade_serving/infer_cfg.yml obj365_val_000000505576.jpg
preprocess = Sequential([
    File2Image(), BGR2RGB(), Div(255.0),
    Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225], False),
    Resize(640, 640), Transpose((2, 0, 1)), PadStride(32)
])
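Change the preprocessing to this.
By the way, did you install the app package with pip, or are you using an app package built from the Serving source?
After this change, the server no longer reports an error during debugging, but the fetch_map returned to the client is None.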
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0526 14:23:49.728278 12659 config_manager.cpp:217] Not found key in configue: cluster
E0526 14:23:49.728319 12659 config_manager.cpp:234] Not found key in configue: split_tag_name
E0526 14:23:49.728327 12659 config_manager.cpp:235] Not found key in configue: tag_candidates
E0526 14:23:49.728343 12659 config_manager.cpp:202] Not found key in configue: connect_timeout_ms
E0526 14:23:49.728350 12659 config_manager.cpp:203] Not found key in configue: rpc_timeout_ms
E0526 14:23:49.728355 12659 config_manager.cpp:205] Not found key in configue: hedge_request_timeout_ms
E0526 14:23:49.728361 12659 config_manager.cpp:207] Not found key in configue: connect_retry_count
E0526 14:23:49.728368 12659 config_manager.cpp:209] Not found key in configue: hedge_fetch_retry_count
E0526 14:23:49.728374 12659 config_manager.cpp:211] Not found key in configue: max_connection_per_host
E0526 14:23:49.728379 12659 config_manager.cpp:212] Not found key in configue: connection_type
E0526 14:23:49.728385 12659 config_manager.cpp:219] Not found key in configue: load_balance_strategy
E0526 14:23:49.728391 12659 config_manager.cpp:221] Not found key in configue: cluster_filter_strategy
E0526 14:23:49.728397 12659 config_manager.cpp:226] Not found key in configue: protocol
E0526 14:23:49.728404 12659 config_manager.cpp:227] Not found key in configue: compress_type
E0526 14:23:49.728410 12659 config_manager.cpp:228] Not found key in configue: package_size
E0526 14:23:49.728415 12659 config_manager.cpp:230] Not found key in configue: max_channel_per_request
E0526 14:23:49.728421 12659 config_manager.cpp:234] Not found key in configue: split_tag_name
E0526 14:23:49.728427 12659 config_manager.cpp:235] Not found key in configue: tag_candidates
I0526 14:23:49.752717 12659 naming_service_thread.cpp:209] brpc::policy::ListNamingService("127.0.0.1:9292"): added 1
W0526 14:24:09.859925 12659 predictor.hpp:129] inference call failed, message: [E1008]Reached timeout=20000ms @0.0.0.0:0
E0526 14:24:10.665114 12659 general_model.cpp:245] failed call predictor with req: insts { tensor_array { float_data: -0.45680279 float_data: -0.45166537 float_data: -0.4302634 float_data: -0.3980673 float_data: -0.37289152 float_data: -0.35405427 float_data: -0.338642 float_data: -0.30268 float_data: -0.30268 float_data: -0.30268 float_data: -0.30268 float_data: -0.30268 float_data: -0.30268 float_data: -0.27870536 float_data: -0.2684305 float_data: -0.24959326 float_data: -0.23418099 float_data: -0.2256186 float_data: -0.2050689 (a large amount of remaining output omitted)
I installed paddle-serving-app with pip.
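This is a timeout; the model needs a GPU machine to finish within the limit. Do you have a machine with a GPU? If not, go to your Python lib directory (taking Python 2.7 as an example, the file should be $PYTHONROOT/lib/python2.7/site-packages/paddle_serving_client/__init__.py) and increase the timeout value on line 89.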
https://github.com/PaddlePaddle/Serving/blob/develop/python/paddle_serving_client/__init__.py#L89
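Since the path above uses Python 2.7 as an example while the logs in this thread show a Python 3.7 Anaconda environment, it may be easier to ask Python where the installed package lives. A minimal sketch using only the standard library; `package_init_path` is a hypothetical helper name, and the line number holding the timeout may differ between installed versions.

```python
import importlib.util


def package_init_path(package_name):
    """Return the path of a package's __init__.py, or None if it is not installed."""
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return None
    return spec.origin


# Prints None when paddle_serving_client is not installed in this environment.
print(package_init_path("paddle_serving_client"))
```

The printed path is the file to open when adjusting the timeout value mentioned above.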
@ash12358 The latest code now includes Serving deployment support. If you run into further problems, please open a new issue; I am closing this one for now.