PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.

[Other General Issues] Error when running yolov3_mobilenet_v1_qat inference with the TensorRT option in PaddleDetection #5726

Open Disciple7 opened 2 years ago

Disciple7 commented 2 years ago

The PaddleDetection team appreciates any suggestions or problems you report~

Checklist:

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.

Describe the bug

Following the official tutorial, I trained a yolov3_mobilenet_v1_270e_voc model on a VOC dataset, using the default yolov3_mobilenet_v1_qat.yml as the quantization config. After exporting it as a paddleserving model, it runs and detects normally with --use_trt disabled, but shows no speedup or compression. With --use_trt enabled, it fails with the error below.

After asking the maintainers in the PaddleServing group, they said this is a model problem and suggested contacting PaddleDetection.

In addition, the benchmark script hits the same TensorRT-related error: trt_fp32, trt_fp16, and trt_int8 cannot be tested, while use_cpu and use_gpu work fine.
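
For context, the serving model was converted from the exported inference model. A minimal sketch of that step, assuming the standard paddle_serving_client converter and illustrative paths:

    # Hypothetical paths; model.pdmodel / model.pdiparams come from tools/export_model.py.
    python -m paddle_serving_client.convert \
        --dirname ./inference_model/yolov3_mobilenet_v1_qat \
        --model_filename model.pdmodel \
        --params_filename model.pdiparams \
        --serving_server serving_server \
        --serving_client serving_client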

Reproduction

  1. What command or script did you run? The paddleserving command:

    cd /home/ubuntu/lxd-storage/xzy/PaddleCV/PaddleDetection/inference_model/yolov3_mobilenet_v1_270e_qat_pdserving/yolov3_mobilenet_v1_qat
    python -m paddle_serving_server.serve --model serving_server --port 9393 --gpu_ids 0 --precision int8 --use_trt

    The benchmark test commands:

    bash deploy/benchmark/benchmark.sh ./inference_model/yolov3_mobilenet_v1_270e_voc_origin model
    bash deploy/benchmark/benchmark_quant.sh ./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat model
  2. Did you make any modifications to the code or config? Did you understand what you modified? Please provide the code that you modified.

  3. What dataset did you use?

A VOC-format dataset.

  4. Please provide the error messages or relevant log information. The paddleserving error message and log:
    
    /home/ubuntu/miniconda3/envs/paddle_env/lib/python3.7/runpy.py:125: RuntimeWarning: 'paddle_serving_server.serve' found in sys.modules after import of package 'paddle_serving_server', but prior to execution of 'paddle_serving_server.serve'; this may result in unpredictable behaviour
    warn(RuntimeWarning(msg))
    Going to Run Comand
    /home/ubuntu/miniconda3/envs/paddle_env/lib/python3.7/site-packages/paddle_serving_server/serving-gpu-101-0.8.3/serving -enable_model_toolkit -inferservice_path workdir_9393 -inferservice_file infer_service.prototxt -max_concurrency 0 -num_threads 4 -port 9393 -precision int8 -use_calib=False -reload_interval_s 10 -resource_path workdir_9393 -resource_file resource.prototxt -workflow_path workdir_9393 -workflow_file workflow.prototxt -bthread_concurrency 4 -max_body_size 536870912
    I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralDistKVInferOp
    I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralDistKVQuantInferOp
    I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralInferOp
    I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralReaderOp
    I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralRecOp
    I0100 00:00:00.000000 11926 op_repository.h:68] RAW: Succ regist op: GeneralResponseOp
    I0100 00:00:00.000000 11926 service_manager.h:79] RAW: Service[LoadGeneralModelService] insert successfully!
    I0100 00:00:00.000000 11926 load_general_model_service.pb.h:333] RAW: Success regist service[LoadGeneralModelService][PN5baidu14paddle_serving9predictor26load_general_model_service27LoadGeneralModelServiceImplE]
    I0100 00:00:00.000000 11926 service_manager.h:79] RAW: Service[GeneralModelService] insert successfully!
    I0100 00:00:00.000000 11926 general_model_service.pb.h:1608] RAW: Success regist service[GeneralModelService][PN5baidu14paddle_serving9predictor13general_model23GeneralModelServiceImplE]
    I0100 00:00:00.000000 11926 factory.h:155] RAW: Succ insert one factory, tag: PADDLE_INFER, base type N5baidu14paddle_serving9predictor11InferEngineE
    W0100 00:00:00.000000 11926 paddle_engine.cpp:34] RAW: Succ regist factory: ::baidu::paddle_serving::predictor::FluidInferEngine<PaddleInferenceEngine>->::baidu::paddle_serving::predictor::InferEngine, tag: PADDLE_INFER in macro!
    I0415 14:52:29.661229 11929 analysis_predictor.cc:576] TensorRT subgraph engine is enabled
    --- Running analysis [ir_graph_build_pass]
    --- Running analysis [ir_graph_clean_pass]
    --- Running analysis [ir_analysis_pass]
    --- Running IR pass [conv_affine_channel_fuse_pass]
    --- Running IR pass [adaptive_pool2d_convert_global_pass]
    --- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
    --- Running IR pass [shuffle_channel_detect_pass]
    --- Running IR pass [quant_conv2d_dequant_fuse_pass]
    --- Running IR pass [delete_quant_dequant_op_pass]
    I0415 14:52:29.881975 11929 fuse_pass_base.cc:57] ---  detected 47 subgraphs
    --- Running IR pass [delete_quant_dequant_filter_op_pass]
    I0415 14:52:29.944126 11929 fuse_pass_base.cc:57] ---  detected 47 subgraphs
    --- Running IR pass [simplify_with_basic_ops_pass]
    --- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
    --- Running IR pass [multihead_matmul_fuse_pass_v2]
    --- Running IR pass [multihead_matmul_fuse_pass_v3]
    --- Running IR pass [skip_layernorm_fuse_pass]
    --- Running IR pass [unsqueeze2_eltwise_fuse_pass]
    --- Running IR pass [squeeze2_matmul_fuse_pass]
    --- Running IR pass [reshape2_matmul_fuse_pass]
    --- Running IR pass [flatten2_matmul_fuse_pass]
    --- Running IR pass [map_matmul_v2_to_mul_pass]
    --- Running IR pass [map_matmul_v2_to_matmul_pass]
    --- Running IR pass [map_matmul_to_mul_pass]
    --- Running IR pass [fc_fuse_pass]
    --- Running IR pass [conv_elementwise_add_fuse_pass]
    --- Running IR pass [tensorrt_subgraph_pass]
    I0415 14:52:30.001792 11929 tensorrt_subgraph_pass.cc:138] ---  detect a sub-graph with 145 nodes
    I0415 14:52:30.034184 11929 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
    terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
    what():

    C++ Traceback (most recent call last):

    Error Message Summary:

    UnimplementedError: no OpConverter for optype [nearest_interp_v2] [Hint: it should not be null.] (at /paddle/paddle/fluid/inference/tensorrt/convert/op_converter.h:142)

    Aborted (core dumped)
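
The failure can be reproduced without Serving by building the TensorRT engine directly through the Paddle Inference Python API. A minimal sketch, assuming the exported files are model.pdmodel / model.pdiparams in the quantized model directory:

    # Minimal repro sketch: build a TRT int8 predictor for the QAT model.
    from paddle.inference import Config, PrecisionType, create_predictor

    model_dir = "./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat"
    config = Config(model_dir + "/model.pdmodel", model_dir + "/model.pdiparams")
    config.enable_use_gpu(200, 0)  # 200 MB initial GPU memory pool on device 0
    config.enable_tensorrt_engine(
        workspace_size=1 << 30,
        max_batch_size=1,
        min_subgraph_size=3,
        precision_mode=PrecisionType.Int8,
        use_static=False,
        use_calib_mode=False)  # the QAT model already carries quantization scales
    # With paddlepaddle-gpu 2.2.2 + TRT 6 this aborts inside tensorrt_subgraph_pass:
    # "no OpConverter for optype [nearest_interp_v2]"
    predictor = create_predictor(config)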


The benchmark test error:

model_dir : ./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat
img_dir: demo/fire_smoke_demo
model ./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat, run_mode: trt_int8
-----------  Running Arguments -----------
batch_size: 1
camera_id: -1
cpu_threads: 1
device: GPU
enable_mkldnn: False
image_dir: demo/fire_smoke_demo
image_file: None
model_dir: ./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat
output_dir: output
reid_batch_size: 50
reid_model_dir: None
run_benchmark: True
run_mode: trt_int8
save_images: False
save_mot_txt_per_img: False
save_mot_txts: False
scaled: False
threshold: 0.5
trt_calib_mode: False
trt_max_shape: 1280
trt_min_shape: 1
trt_opt_shape: 640
use_dark: True
use_gpu: False
video_file: None

-----------  Model Configuration -----------
Model Arch: YOLO
Transform Order:
--transform op: Resize
--transform op: NormalizeImage
--transform op: Permute

Traceback (most recent call last):
  File "deploy/python/infer.py", line 773, in <module>
    main()
  File "deploy/python/infer.py", line 726, in main
    enable_mkldnn=FLAGS.enable_mkldnn)
  File "deploy/python/infer.py", line 94, in __init__
    enable_mkldnn=enable_mkldnn)
  File "deploy/python/infer.py", line 563, in load_predictor
    predictor = create_predictor(config)
ValueError: (InvalidArgument) Pass tensorrt_subgraph_pass has not been registered. Please use the paddle inference library compiled with tensorrt or disable the tensorrt engine in inference configuration!
  [Hint: Expected Has(pass_type) == true, but received Has(pass_type):0 != true:1.] (at /paddle/paddle/fluid/framework/ir/pass.h:236)
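
This benchmark error is different from the serving one: tensorrt_subgraph_pass is missing entirely, which, per the message itself, means the paddle library the benchmark script loaded was not compiled with TensorRT. A quick check sketch (assuming Paddle 2.2+, where is_compiled_with_tensorrt lives in the core binding):

    import paddle
    from paddle.fluid import core

    print(paddle.version.full_version)       # e.g. 2.2.2
    print(paddle.is_compiled_with_cuda())    # True for a -gpu wheel
    print(core.is_compiled_with_tensorrt())  # must be True for the trt_* run modes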



## Environment
1. Please provide the versions of Paddle and PaddleDetection you used:
paddlepaddle-gpu=2.2.2.post101
paddledet=2.3.0

2. If you are using other related tools/products alongside PaddleDetection, such as PaddleServing or PaddleInference, please provide their versions:
paddleslim=2.2.2
paddle-serving-server-gpu=0.8.3.post101
tensorrt=6.0.1.5

3. Please provide the OS information, e.g., Linux:
Ubuntu 16.04

4. Please provide the version of Python you used.
Python 3.7

5. Please provide the version of CUDA/cuDNN you used.
CUDA 10.1

yghstill commented 2 years ago

@Disciple7

Disciple7 commented 2 years ago

@Disciple7

paddleserving is already on the latest version. I made the change in the benchmark part, but it did not help; the same error is still reported.

Disciple7 commented 2 years ago

To add: my paddle was installed directly from the precompiled wheel https://paddle-inference-lib.bj.bcebos.com/2.2.2/python/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda10.1_cudnn7.6.5_trt6.0.1.5/paddlepaddle_gpu-2.2.2.post101-cp37-cp37m-linux_x86_64.whl. Could that be related?

yghstill commented 2 years ago

@Disciple7 We have reproduced this issue and are working on it.

yghstill commented 2 years ago

@Disciple7 After investigation, this requires Paddle Inference 2.3rc or the develop branch. The 2.3rc package will be released soon; for now, please compile it from source manually.
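
(For reference, a source-build sketch under stated assumptions: TensorRT 6.0.1.5 unpacked under /usr/local, CUDA 10.1, Python 3.7; the flags follow Paddle's source-build docs, so double-check them against the docs for your branch.)

    git clone https://github.com/PaddlePaddle/Paddle.git
    cd Paddle && git checkout develop
    mkdir build && cd build
    cmake .. -DWITH_GPU=ON -DWITH_TENSORRT=ON \
             -DTENSORRT_ROOT=/usr/local/TensorRT-6.0.1.5 \
             -DWITH_PYTHON=ON -DPY_VERSION=3.7
    make -j$(nproc)
    pip install python/dist/paddlepaddle_gpu-*.whl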

Disciple7 commented 2 years ago

@Disciple7 After investigation, this requires Paddle Inference 2.3rc or the develop branch. The 2.3rc package will be released soon; for now, please compile it from source manually.

I installed the paddlepaddle-gpu 2.3.0rc0 package. paddleserving still reports the same no OpConverter error, and direct inference now fails with a CUDA runtime / CUDA Driver mismatch instead:

(paddle_env) ubuntu@public:~/lxd-storage/xzy/PaddleCV/PaddleDetection$ python deploy/python/infer.py   --model_dir=./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat   --image_file=./dataset/fire_smoke_voc/images/00001.jpg --device=GPU --run_mode=trt_int8
W0428 13:08:26.092806 18784 init.cc:179] Compiled with WITH_GPU, but no GPU found in runtime.
/home/ubuntu/miniconda3/envs/paddle_env/lib/python3.7/site-packages/paddle/fluid/framework.py:478: UserWarning: You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default.
  "You are using GPU version Paddle, but your CUDA device is not set properly. CPU device will be used by default."
-----------  Running Arguments -----------
batch_size: 1
camera_id: -1
cpu_threads: 1
device: GPU
enable_mkldnn: False
image_dir: None
image_file: ./dataset/fire_smoke_voc/images/00001.jpg
model_dir: ./inference_model/yolov3_mobilenet_v1_270e_qat/yolov3_mobilenet_v1_qat
output_dir: output
reid_batch_size: 50
reid_model_dir: None
run_benchmark: False
run_mode: trt_int8
save_images: False
save_mot_txt_per_img: False
save_mot_txts: False
scaled: False
threshold: 0.5
trt_calib_mode: False
trt_max_shape: 1280
trt_min_shape: 1
trt_opt_shape: 640
use_dark: True
use_gpu: False
video_file: None
------------------------------------------
-----------  Model Configuration -----------
Model Arch: YOLO
Transform Order:
--transform op: Resize
--transform op: NormalizeImage
--transform op: Permute
--------------------------------------------
Traceback (most recent call last):
  File "deploy/python/infer.py", line 773, in <module>
    main()
  File "deploy/python/infer.py", line 726, in main
    enable_mkldnn=FLAGS.enable_mkldnn)
  File "deploy/python/infer.py", line 94, in __init__
    enable_mkldnn=enable_mkldnn)
  File "deploy/python/infer.py", line 563, in load_predictor
    predictor = create_predictor(config)
OSError: (External) CUDA error(35), CUDA driver version is insufficient for CUDA runtime version.
  [Hint: 'cudaErrorInsufficientDriver'. This indicates that the installed NVIDIA CUDA driver is older than the CUDA runtime library. This is not a supported configuration.Users should install an updated NVIDIA display driver to allow the application to run.] (at /paddle/paddle/phi/backends/gpu/cuda/cuda_info.cc:66)

But from nvidia-smi they look matched; according to this issue, CUDA 10.1 only needs driver >= 418.39:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.64       Driver Version: 430.64       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:04:00.0 Off |                  N/A |
| 51%   57C    P2   206W / 250W |   6423MiB / 11019MiB |     49%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  Off  | 00000000:83:00.0 Off |                  N/A |
| 27%   29C    P8    15W / 250W |     10MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
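
Given the "Compiled with WITH_GPU, but no GPU found in runtime" warning above, a quick sanity check of whether the 2.3.0rc0 wheel can actually see the driver, as a sketch using Paddle's public self-check utility:

    import paddle

    # Runs a small op on the visible GPUs and prints the compiled CUDA/cuDNN info;
    # a genuine driver/runtime mismatch reproduces the same CUDA error(35) here.
    paddle.utils.run_check()
    print(paddle.device.get_device())  # expected "gpu:0" when a GPU is visible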