PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

Paddle_TRT 不支持 dynamic input shapes #42864

Closed jiguanglu closed 2 years ago

jiguanglu commented 2 years ago

Describe the Bug

- TensorRT version: TensorRT-8.4.0.6
- Python version: 3.7
- PaddlePaddle-gpu: 2.2.2
- OS: Ubuntu 18.04

```python
import numpy as np
import paddle.inference as paddle_infer


def create_predictor():
    config = paddle_infer.Config("./resnet50/model", "./resnet50/params")
    config.enable_memory_optim()
    config.enable_use_gpu(1000, 0)

    # Enable TensorRT. See below for a detailed description of this interface.
    config.enable_tensorrt_engine(workspace_size=1 << 30,
                                  max_batch_size=1,
                                  min_subgraph_size=3,
                                  precision_mode=paddle_infer.PrecisionType.Float32,
                                  use_static=False,
                                  use_calib_mode=False)

    predictor = paddle_infer.create_predictor(config)
    return predictor


def run(predictor, img):
    # Prepare the inputs
    input_names = predictor.get_input_names()
    for i, name in enumerate(input_names):
        input_tensor = predictor.get_input_handle(name)
        input_tensor.reshape(img[i].shape)
        input_tensor.copy_from_cpu(img[i].copy())
    # Run inference
    predictor.run()
    results = []
    # Fetch the outputs
    output_names = predictor.get_output_names()
    for i, name in enumerate(output_names):
        output_tensor = predictor.get_output_handle(name)
        output_data = output_tensor.copy_to_cpu()
        results.append(output_data)
    return results


if __name__ == '__main__':
    pred = create_predictor()
    img = np.ones((1, 3, 320, 320)).astype(np.float32)
    result = run(pred, [img])
    print("class index: ", np.argmax(result[0][0]))
```

Here is the output:

```
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
W0519 06:51:26.408658 31697 analysis_predictor.cc:1086] The one-time configuration of analysis predictor failed, which may be due to native predictor called first and its configurations taken effect.
I0519 06:51:26.430719 31697 analysis_predictor.cc:854] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
I0519 06:51:26.521083 31697 fuse_pass_base.cc:57] --- detected 1 subgraphs
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [delete_weight_dequant_linear_op_pass]
--- Running IR pass [delete_quant_dequant_linear_op_pass]
--- Running IR pass [add_support_int8_pass]
I0519 06:51:26.581389 31697 fuse_pass_base.cc:57] --- detected 179 subgraphs
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [preln_embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [preln_skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0519 06:51:26.654426 31697 fuse_pass_base.cc:57] --- detected 53 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [trt_squeeze2_matmul_fuse_pass]
--- Running IR pass [trt_reshape2_matmul_fuse_pass]
I0519 06:51:26.657245 31697 fuse_pass_base.cc:57] --- detected 1 subgraphs
--- Running IR pass [trt_flatten2_matmul_fuse_pass]
--- Running IR pass [trt_map_matmul_v2_to_mul_pass]
--- Running IR pass [trt_map_matmul_v2_to_matmul_pass]
--- Running IR pass [trt_map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
I0519 06:51:26.662055 31697 fuse_pass_base.cc:57] --- detected 1 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0519 06:51:26.679409 31697 fuse_pass_base.cc:57] --- detected 53 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0519 06:51:26.685261 31697 tensorrt_subgraph_pass.cc:141] --- detect a sub-graph with 123 nodes
I0519 06:51:26.698570 31697 tensorrt_subgraph_pass.cc:403] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
W0519 06:51:29.198441 31697 helper.h:107] TensorRT was linked against cuBLAS/cuBLAS LT 11.8.0 but loaded cuBLAS/cuBLAS LT 11.4.1
W0519 06:51:29.810552 31697 helper.h:107] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.1.1
W0519 06:51:40.424471 31697 helper.h:107] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.1.1
W0519 06:51:40.518280 31697 helper.h:107] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.1.1
I0519 06:51:40.519057 31697 engine.cc:415] ====== engine info ======
W0519 06:51:40.520541 31697 helper.h:107] TensorRT was linked against cuDNN 8.3.2 but loaded cuDNN 8.1.1
I0519 06:51:40.521028 31697 engine.cc:420] Layers:
conv2d (Output: batch_norm_0.tmp_215) + relu (Output: batch_norm_0.tmp_317)
pool2d (Output: pool2d_0.tmp_019)
conv2d (Output: batch_norm_1.tmp_232) + relu (Output: batch_norm_1.tmp_334)
conv2d (Output: batch_norm_2.tmp_247) + relu (Output: batch_norm_2.tmp_349)
conv2d (Output: batch_norm_3.tmp_262)
conv2d (Output: batch_norm_4.tmp_275) + elementwise (Output: elementwise_add_077) + relu (Output: relu_0.tmp_079)
conv2d (Output: batch_norm_5.tmp_292) + relu (Output: batch_norm_5.tmp_394)
conv2d (Output: batch_norm_6.tmp_2107) + relu (Output: batch_norm_6.tmp_3109)
conv2d (Output: batch_norm_7.tmp_2122) + elementwise (Output: elementwise_add_1124) + relu (Output: relu_1.tmp_0126)
conv2d (Output: batch_norm_8.tmp_2139) + relu (Output: batch_norm_8.tmp_3141)
conv2d (Output: batch_norm_9.tmp_2154) + relu (Output: batch_norm_9.tmp_3156)
conv2d (Output: batch_norm_10.tmp_2169) + elementwise (Output: elementwise_add_2171) + relu (Output: relu_2.tmp_0173)
conv2d (Output: batch_norm_11.tmp_2186) + relu (Output: batch_norm_11.tmp_3188)
conv2d (Output: batch_norm_12.tmp_2201) + relu (Output: batch_norm_12.tmp_3203)
conv2d (Output: batch_norm_13.tmp_2216)
conv2d (Output: batch_norm_14.tmp_2229) + elementwise (Output: elementwise_add_3231) + relu (Output: relu_3.tmp_0233)
conv2d (Output: batch_norm_15.tmp_2246) + relu (Output: batch_norm_15.tmp_3248)
conv2d (Output: batch_norm_16.tmp_2261) + relu (Output: batch_norm_16.tmp_3263)
conv2d (Output: batch_norm_17.tmp_2276) + elementwise (Output: elementwise_add_4278) + relu (Output: relu_4.tmp_0280)
conv2d (Output: batch_norm_18.tmp_2293) + relu (Output: batch_norm_18.tmp_3295)
conv2d (Output: batch_norm_19.tmp_2308) + relu (Output: batch_norm_19.tmp_3310)
conv2d (Output: batch_norm_20.tmp_2323) + elementwise (Output: elementwise_add_5325) + relu (Output: relu_5.tmp_0327)
conv2d (Output: batch_norm_21.tmp_2340) + relu (Output: batch_norm_21.tmp_3342)
conv2d (Output: batch_norm_22.tmp_2355) + relu (Output: batch_norm_22.tmp_3357)
conv2d (Output: batch_norm_23.tmp_2370) + elementwise (Output: elementwise_add_6372) + relu (Output: relu_6.tmp_0374)
conv2d (Output: batch_norm_24.tmp_2387) + relu (Output: batch_norm_24.tmp_3389)
conv2d (Output: batch_norm_25.tmp_2402) + relu (Output: batch_norm_25.tmp_3404)
conv2d (Output: batch_norm_26.tmp_2417)
conv2d (Output: batch_norm_27.tmp_2430) + elementwise (Output: elementwise_add_7432) + relu (Output: relu_7.tmp_0434)
conv2d (Output: batch_norm_28.tmp_2447) + relu (Output: batch_norm_28.tmp_3449)
conv2d (Output: batch_norm_29.tmp_2462) + relu (Output: batch_norm_29.tmp_3464)
conv2d (Output: batch_norm_30.tmp_2477) + elementwise (Output: elementwise_add_8479) + relu (Output: relu_8.tmp_0481)
conv2d (Output: batch_norm_31.tmp_2494) + relu (Output: batch_norm_31.tmp_3496)
conv2d (Output: batch_norm_32.tmp_2509) + relu (Output: batch_norm_32.tmp_3511)
conv2d (Output: batch_norm_33.tmp_2524) + elementwise (Output: elementwise_add_9526) + relu (Output: relu_9.tmp_0528)
conv2d (Output: batch_norm_34.tmp_2541) + relu (Output: batch_norm_34.tmp_3543)
conv2d (Output: batch_norm_35.tmp_2556) + relu (Output: batch_norm_35.tmp_3558)
conv2d (Output: batch_norm_36.tmp_2571) + elementwise (Output: elementwise_add_10573) + relu (Output: relu_10.tmp_0575)
conv2d (Output: batch_norm_37.tmp_2588) + relu (Output: batch_norm_37.tmp_3590)
conv2d (Output: batch_norm_38.tmp_2603) + relu (Output: batch_norm_38.tmp_3605)
conv2d (Output: batch_norm_39.tmp_2618) + elementwise (Output: elementwise_add_11620) + relu (Output: relu_11.tmp_0622)
conv2d (Output: batch_norm_40.tmp_2635) + relu (Output: batch_norm_40.tmp_3637)
conv2d (Output: batch_norm_41.tmp_2650) + relu (Output: batch_norm_41.tmp_3652)
conv2d (Output: batch_norm_42.tmp_2665) + elementwise (Output: elementwise_add_12667) + relu (Output: relu_12.tmp_0669)
conv2d (Output: batch_norm_43.tmp_2682) + relu (Output: batch_norm_43.tmp_3684)
conv2d (Output: batch_norm_44.tmp_2697) + relu (Output: batch_norm_44.tmp_3699)
conv2d (Output: batch_norm_45.tmp_2712)
conv2d (Output: batch_norm_46.tmp_2725) + elementwise (Output: elementwise_add_13727) + relu (Output: relu_13.tmp_0729)
conv2d (Output: batch_norm_47.tmp_2742) + relu (Output: batch_norm_47.tmp_3744)
conv2d (Output: batch_norm_48.tmp_2757) + relu (Output: batch_norm_48.tmp_3759)
conv2d (Output: batch_norm_49.tmp_2772) + elementwise (Output: elementwise_add_14774) + relu (Output: relu_14.tmp_0776)
conv2d (Output: batch_norm_50.tmp_2789) + relu (Output: batch_norm_50.tmp_3791)
conv2d (Output: batch_norm_51.tmp_2804) + relu (Output: batch_norm_51.tmp_3806)
conv2d (Output: batch_norm_52.tmp_2819) + elementwise (Output: elementwise_add_15821) + relu (Output: relu_15.tmp_0823)
pool2d (Output: pool2d_1.tmp_0825)
fc_op_float: FullyConnected (Output: linear_1.tmp_1834)
shuffle_after_fc (Output: linear_1.tmp_1834)
softmax (Output: softmax_0.tmp_0836)

Bindings:
inputs
save_infer_model/scale_0.tmp_1838
I0519 06:51:40.521102 31697 engine.cc:422] ====== engine info end ======
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [conv_elementwise_add_act_fuse_pass]
--- Running IR pass [conv_elementwise_add2_act_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0519 06:51:40.539101 31697 ir_params_sync_among_devices_pass.cc:100] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
--- Running analysis [ir_graph_to_program_pass]
I0519 06:51:40.581842 31697 analysis_predictor.cc:1007] ======= optimize end =======
I0519 06:51:40.585136 31697 naive_executor.cc:102] --- skip [feed], feed -> inputs
I0519 06:51:40.585443 31697 naive_executor.cc:102] --- skip [save_infer_model/scale_0.tmp_1], fetch -> fetch
W0519 06:51:40.586534 31697 gpu_context.cc:278] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.4, Runtime API Version: 11.2
W0519 06:51:40.586686 31697 gpu_context.cc:306] device: 0, cuDNN Version: 8.1.
Traceback (most recent call last):
  File "paddle_trt.py", line 39, in <module>
    result = run(pred, [img])
  File "paddle_trt.py", line 26, in run
    predictor.run()
ValueError: (InvalidArgument) Input shapes are inconsistent with the model. Expect [3, 224, 224] in model description, but got [3, 320, 320] in runtime. TRT 5 or lower version does not support dynamic input shapes. Please check and modify your input shapes.
  [Hint: Expected model_input_shape == runtime_input_shape == true, but received model_input_shape == runtime_input_shape:0 != true:1.] (at /paddle/paddle/fluid/operators/tensorrt/tensorrt_engine_op.h:79)
```
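The final `ValueError` is the heart of the report: the model was exported with a fixed input shape of `[3, 224, 224]`, while the script feeds `[3, 320, 320]`. A minimal pure-Python sketch of the static-shape consistency check that a fixed-shape TRT engine performs (the function name and message formatting are illustrative, not Paddle's actual internals):

```python
def check_input_shape(model_shape, runtime_shape):
    """Illustrative stand-in for the static-shape check that raises
    the InvalidArgument error in tensorrt_engine_op.h."""
    if list(model_shape) != list(runtime_shape):
        raise ValueError(
            "(InvalidArgument) Input shapes are inconsistent with the model. "
            "Expect {} in model description, but got {} in runtime.".format(
                model_shape, runtime_shape))


# The shapes from the log above: the batch dimension has been stripped,
# so only [C, H, W] is compared against the model description.
check_input_shape([3, 224, 224], [3, 224, 224])  # matches, no error

try:
    check_input_shape([3, 224, 224], [3, 320, 320])
except ValueError as e:
    print(e)
```

In other words, without TensorRT dynamic shapes configured, either the runtime input must be resized to the exported shape, or the predictor must be told the allowed shape range (see the resolution below in the thread).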

Additional Supplementary Information

paddle-bot-old[bot] commented 2 years ago

Hi! We've received your issue; please be patient while we respond. We will arrange for technicians to answer your questions as soon as possible. Please double-check that you have provided a clear problem description, reproduction code, environment & versions, and error messages. You may also check the official API docs, FAQ, issue history, and AI community to find an answer. Have a nice day!

jiguanglu commented 2 years ago

I found the documentation at https://paddle-inference.readthedocs.io/en/latest/optimize/paddle_trt.html, which solved the problem.
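For later readers, the linked Paddle-TRT doc resolves this by registering TensorRT dynamic shape ranges on the `Config` before creating the predictor. A hedged sketch of that step (the input name `"inputs"` is taken from the engine bindings in the log above; the min/max/opt ranges here are assumptions to adjust for your model):

```python
# Shape ranges per network input; keys must match the model's input names.
min_input_shape = {"inputs": [1, 3, 224, 224]}
max_input_shape = {"inputs": [1, 3, 640, 640]}
opt_input_shape = {"inputs": [1, 3, 320, 320]}


def enable_trt_dynamic_shape(config):
    """Register dynamic shape ranges on a paddle_infer.Config.
    Call this after enable_tensorrt_engine() and before
    paddle_infer.create_predictor(config)."""
    config.set_trt_dynamic_shape_info(min_input_shape,
                                      max_input_shape,
                                      opt_input_shape)
```

With the ranges registered, TensorRT builds the engine with an optimization profile instead of a fixed shape, so a `[1, 3, 320, 320]` input no longer trips the static-shape check.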