PaddlePaddle / Serving

A flexible, high-performance carrier for machine learning models (the PaddlePaddle serving deployment framework)
Apache License 2.0

Deploying a model via the pipeline raises a lod error: _get_bbox_result cannot return bbox_results #1902

Open ClassmateXiaoyu opened 1 year ago

ClassmateXiaoyu commented 1 year ago
Environment
CUDA 11.7
cudnn 8.4.1
GPU: GTX 1070
python 3.8.13
PaddlePaddle 2.4.1.post117
paddle-serving-server-gpu 0.9.0
paddle_serving_app 0.9.0

I trained a PPYOLOv2 model with PaddleX and converted the inference model to a serving model with python -m paddle_serving_client.convert --dirname  --model_filename  --params_filename  --serving_server serving_server --serving_client serving_client.
I then noticed a problem: deploying the same model in different ways produces a lod error. Details below:
1. When I deploy with the pipeline approach, fetch_dict has no fetch_name.lod key. fetch_dict:  {'save_infer_model/scale_0.tmp_1': array([[  0.        ,   0.85202295, 216.68979   ,  64.207535  ,        436.6143    , 332.37054   ]], dtype=float32)}.
That is, the lod information is missing, and the client-server call raises the following error:
Traceback (most recent call last):
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle_serving_server/pipeline/error_catch.py", line 97, in wrapper
    res = func(*args, **kw)
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle_serving_server/pipeline/operator.py", line 1179, in postprocess_help
    postped_data, prod_errcode, prod_errinfo = self.postprocess(
  File "pipeline_web_service_linux.py", line 72, in postprocess
    self.img_postprocess(
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle_serving_app/reader/image_reader.py", line 426, in __call__
    bbox_result = self._get_bbox_result(image_with_bbox, fetch_name,
  File "/root/anaconda3/envs/paddle38/lib/python3.8/site-packages/paddle_serving_app/reader/image_reader.py", line 344, in _get_bbox_result
    lod = [fetch_map[fetch_name + '.lod']]
KeyError: 'save_infer_model/scale_0.tmp_1.lod'
Classname: Op._run_postprocess.<locals>.postprocess_help
FunctionName: postprocess_help

2. When I deploy without the pipeline, fetch_map does have the fetch_name.lod key. fetch_map:{'save_infer_model/scale_0.tmp_1': array([[0.0000000e+00, 6.3646980e-02, 5.2615891e+00, 1.2278875e+02,
        1.6876831e+02, 3.5357916e+02],
       [0.0000000e+00, 4.2369448e-02, 6.6680511e+01, 6.9318405e+01,
        6.0023975e+02, 5.3855756e+02],
       [0.0000000e+00, 1.8086428e-02, 1.2872772e+02, 1.4232706e+02,
        2.9876392e+02, 3.3751181e+02],
       [0.0000000e+00, 1.5854711e-02, 1.8734198e+02, 3.1824486e+01,
        3.5457477e+02, 1.9962274e+02],
       [0.0000000e+00, 1.5454855e-02, 2.1284140e+02, 1.9946268e+02,
        3.9645621e+02, 4.0698849e+02],
       [0.0000000e+00, 1.4058443e-02, 1.5301871e+02, 2.4853967e+02,
        3.2183228e+02, 4.3125073e+02],
       [0.0000000e+00, 1.2545503e-02, 1.1664839e+02, 2.4064153e+02,
        3.0317432e+02, 4.3767188e+02],
       [0.0000000e+00, 1.1161749e-02, 3.8942078e+01, 1.3401808e+02,
        1.8269760e+02, 3.4691406e+02],
       [0.0000000e+00, 1.0988280e-02, 1.4913477e+02, 1.8804048e+02,
        3.2895029e+02, 3.5642706e+02],
       [0.0000000e+00, 1.0884989e-02, 1.5156635e+02, 2.1480481e+02,
        3.2716016e+02, 3.9296497e+02]], dtype=float32), 'save_infer_model/scale_0.tmp_1.lod': array([ 0, 10])}.
In this case the client-server call raises no error and the prediction is returned normally:
{'result': [{'bbox': [5.261589050292969, 122.78874969482422, 164.50672149658203, 231.79041290283203], 'category_id': 0, 'score': 0.06364697962999344}]}
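(For context: the '.lod' entry holds LoD offsets that mark where each input image's boxes begin and end in the flattened detection output, so array([ 0, 10]) above says all 10 rows belong to the single input image. A minimal sketch of that slicing, with made-up variable names purely for illustration:)

import numpy as np

# Illustration only (hypothetical names): LoD offsets slice the flattened
# detection output back into per-image box lists; image i owns rows lod[i]:lod[i+1].
boxes = np.zeros((10, 6), dtype=np.float32)  # rows look like [label, score, x1, y1, x2, y2]
lod = np.array([0, 10])                      # offsets for a single-image batch
per_image = [boxes[lod[i]:lod[i + 1]] for i in range(len(lod) - 1)]
print([b.shape for b in per_image])          # -> [(10, 6)]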

Could the maintainers please explain why the same model loses the lod information under one deployment mode but not the other, and how this should be handled? Thanks!
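
For reference, one untested workaround idea (the helper below is hypothetical, not part of paddle_serving_app) would be to synthesize the missing LoD entry inside the pipeline Op's postprocess before handing fetch_dict to self.img_postprocess, assuming one image per request so the offsets simply span every returned box:

import numpy as np

# Untested sketch: add the '.lod' key that the non-pipeline client returns
# but the pipeline fetch_dict lacks, so that _get_bbox_result can find it.
def patch_missing_lod(fetch_dict, fetch_name='save_infer_model/scale_0.tmp_1'):
    lod_key = fetch_name + '.lod'
    if lod_key not in fetch_dict:
        num_boxes = fetch_dict[fetch_name].shape[0]  # all boxes belong to the single image
        fetch_dict[lod_key] = np.array([0, num_boxes])
    return fetch_dict
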
ClassmateXiaoyu commented 1 year ago

OS: CentOS 7.9

fanruifeng commented 1 year ago

Hi, has this issue been solved? I'm running into the same thing right now.

ClassmateXiaoyu commented 1 year ago

> Hi, has this issue been solved? I'm running into the same thing right now.

Not yet. I haven't figured out a fix, and the maintainers haven't replied to this issue either.

fanruifeng commented 1 year ago

OK. Would you mind adding me on QQ (1125729232) so we can discuss? Right now, deploying without the pipeline, my server starts up fine, but when the client sends a request the API returns an error: {'err_no': 10000, 'err_msg': 'Log_id: 10000 Raise_msg: transpose_0.tmp_0 ClassName: Op._run_postprocess..postprocess_help FunctionName: postprocess_help', 'key': [], 'value': [], 'tensors': []}

wjplove8 commented 1 year ago

https://github.com/PaddlePaddle/Serving/issues/1635#issuecomment-1439528134

> OK. Would you mind adding me on QQ (1125729232) so we can discuss? Right now, deploying without the pipeline, my server starts up fine, but when the client sends a request the API returns an error: {'err_no': 10000, 'err_msg': 'Log_id: 10000 Raise_msg: transpose_0.tmp_0 ClassName: Op._run_postprocess..postprocess_help FunctionName: postprocess_help', 'key': [], 'value': [], 'tensors': []}

Hi, has this issue been solved? I'm running into the same thing right now.

HuiHuiSun commented 1 year ago

> Hi, has this issue been solved? I'm running into the same thing right now.

> Not yet. I haven't figured out a fix, and the maintainers haven't replied to this issue either.

Hi, may I ask whether this issue has been resolved yet?