[Bug] PaddleSeg exits after segmenting only a few images when using TensorRT acceleration

PaddleSeg version: PaddleSeg release/2.4
PaddlePaddle version: PaddlePaddle 2.2.2
OS: Ubuntu 18.04
Python version: Python 3.8
CUDA/cuDNN version: CUDA 11.1 / cuDNN 8.0
Complete code (if the original code was modified, please provide a before/after comparison):
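- Followed the tutorial at https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/whole_process_cn.md exactly (download, training, prediction, model export) without changing any parameters;
- Then ran inference with the tutorial's command, changing only --image_path to point at the whole image directory instead of a single file:
  python deploy/python/infer.py \
      --config output/deploy.yaml \
      --image_path /tmp/PaddleSeg/data/optic_disc_seg/JPEGImages/ \
      --use_trt True \
      --save_dir output/out_trt \
      --enable_auto_tune True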
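Instead of relying on --enable_auto_tune, one option is to hand Paddle-TRT a dynamic-shape range that already covers every image in the directory, so the engine never needs to be rebuilt at runtime. The sketch below assumes the input tensor is named x (as the `feed -> x` line in the log suggests) and uses batch size 1 with 3 channels; shape_range and make_trt_config are hypothetical helpers, not part of PaddleSeg, and whether this avoids the crash is untested.

```python
from typing import Iterable, Tuple


def shape_range(sizes: Iterable[Tuple[int, int]], input_name: str = "x"):
    """Return (min, max, opt) shape dicts covering every (height, width) in sizes.

    Shapes are NCHW with batch size 1 and 3 channels, matching the
    (1,3,H,W) shapes printed in the log.
    """
    sizes = list(sizes)
    if not sizes:
        raise ValueError("need at least one (height, width) pair")
    heights = [h for h, _ in sizes]
    widths = [w for _, w in sizes]
    min_shape = {input_name: [1, 3, min(heights), min(widths)]}
    max_shape = {input_name: [1, 3, max(heights), max(widths)]}
    # Optimize for the most frequent image size.
    opt_h, opt_w = max(set(sizes), key=sizes.count)
    opt_shape = {input_name: [1, 3, opt_h, opt_w]}
    return min_shape, max_shape, opt_shape


def make_trt_config(model_file: str, params_file: str, sizes):
    """Build a Paddle Inference config with a fixed TRT dynamic-shape range."""
    # Imported here so shape_range() can be used without Paddle installed.
    from paddle.inference import Config, PrecisionType

    config = Config(model_file, params_file)
    config.enable_use_gpu(100, 0)  # 100 MB initial GPU memory pool, device 0
    config.enable_tensorrt_engine(
        workspace_size=1 << 30,
        max_batch_size=1,
        min_subgraph_size=3,
        precision_mode=PrecisionType.Float32,
        use_static=False,
        use_calib_mode=False,
    )
    min_shape, max_shape, opt_shape = shape_range(sizes)
    config.set_trt_dynamic_shape_info(min_shape, max_shape, opt_shape)
    return config
```

The model file names (for example output/model.pdmodel and output/model.pdiparams) are assumptions; check deploy.yaml for the real ones. Note that PIL's Image.size is (width, height), and if deploy.yaml resizes images, the range must describe the shapes after preprocessing rather than the raw files.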
Detailed error message and related log (for multi-GPU runs, the log is saved to log/worklog.0 by default):
python deploy/python/infer.py --config output/deploy.yaml --image_path /tmp/PaddleSeg/data/optic_disc_seg/JPEGImages/ --use_trt True --save_dir output/out_trt --enable_auto_tune True
/usr/local/lib/python3.8/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:36: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': Image.NEAREST,
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:37: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': Image.BILINEAR,
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:38: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': Image.BICUBIC,
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:39: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  'box': Image.BOX,
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:40: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  'lanczos': Image.LANCZOS,
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:41: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  'hamming': Image.HAMMING
/usr/local/lib/python3.8/site-packages/paddleseg/models/losses/decoupledsegnet_relax_boundary_loss.py:19: DeprecationWarning: Please use shift from the scipy.ndimage namespace, the scipy.ndimage.interpolation namespace is deprecated.
  from scipy.ndimage.interpolation import shift
/usr/local/lib/python3.8/site-packages/paddleseg/transforms/functional.py:18: DeprecationWarning: Please use distance_transform_edt from the scipy.ndimage namespace, the scipy.ndimage.morphology namespace is deprecated.
  from scipy.ndimage.morphology import distance_transform_edt
2022-05-26 08:48:17 [INFO] Auto tune the dynamic shape for GPU TRT.
I0526 08:48:17.053037 16275 analysis_config.cc:917] In CollectShapeInfo mode, we will disable optimizations and collect the shape information of all intermediate tensors in the compute graph and calculate the min_shape, max_shape and opt_shape.
W0526 08:48:17.453675 16275 analysis_predictor.cc:795] The one-time configuration of analysis predictor failed, which may be due to native predictor called first and its configurations taken effect.
I0526 08:48:17.465394 16275 analysis_predictor.cc:665] ir_optim is turned off, no IR pass will be executed
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0526 08:48:17.494113 16275 ir_params_sync_among_devices_pass.cc:45] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I0526 08:48:17.548259 16275 analysis_predictor.cc:714] ======= optimize end =======
I0526 08:48:17.550683 16275 naive_executor.cc:98] --- skip [feed], feed -> x
I0526 08:48:17.552196 16275 naive_executor.cc:98] --- skip [argmax_0.tmp_0], fetch -> fetch
W0526 08:48:17.564620 16275 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.1
W0526 08:48:17.565778 16275 device_context.cc:465] device: 0, cuDNN Version: 8.0.
2022-05-26 08:48:19 [INFO] Auto tune success.

2022-05-26 08:48:19 [INFO] Use GPU
2022-05-26 08:48:19 [INFO] Use TRT
2022-05-26 08:48:19 [INFO] Use auto tuned dynamic shape
I0526 08:48:19.248214 16275 analysis_predictor.cc:576] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
I0526 08:48:19.265250 16275 fuse_pass_base.cc:57] --- detected 1 subgraphs
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0526 08:48:19.287051 16275 fuse_pass_base.cc:57] --- detected 39 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_v2_to_mul_pass]
--- Running IR pass [map_matmul_v2_to_matmul_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I0526 08:48:19.302623 16275 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 88 nodes
I0526 08:48:19.308516 16275 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:19.552888 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 08:48:30.183140 16275 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 5 nodes
I0526 08:48:30.187425 16275 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:30.188251 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 08:48:35.730669 16275 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 12 nodes
I0526 08:48:35.732797 16275 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:35.734086 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 08:48:41.873677 16275 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 12 nodes
I0526 08:48:41.875895 16275 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:41.877171 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 08:48:42.746399 16275 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 15 nodes
I0526 08:48:42.748675 16275 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:42.750090 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0526 08:48:43.951342 16275 ir_params_sync_among_devices_pass.cc:45] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0526 08:48:43.952843 16275 memory_optimize_pass.cc:216] Cluster name : shape_1.tmp_0_slice_0 size: 8
I0526 08:48:43.952847 16275 memory_optimize_pass.cc:216] Cluster name : shape_1.tmp_0 size: 16
I0526 08:48:43.952848 16275 memory_optimize_pass.cc:216] Cluster name : shape_0.tmp_0_slice_0 size: 8
--- Running analysis [ir_graph_to_program_pass]
I0526 08:48:43.992440 16275 analysis_predictor.cc:714] ======= optimize end =======
I0526 08:48:43.995501 16275 naive_executor.cc:98] --- skip [feed], feed -> x
I0526 08:48:43.995982 16275 naive_executor.cc:98] --- skip [argmax_0.tmp_0], fetch -> fetch
I0526 08:48:44.265367 16275 engine.h:438] refactor shape range: x, max_shape from (1,3,496,512) to (1,3,512,512)
I0526 08:48:44.265389 16275 tensorrt_engine_op.h:306] Adjust dynamic shape range, rebuild trt engine!
I0526 08:48:44.266453 16275 tensorrt_engine_op.h:589] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:44.276803 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
E0526 08:48:46.027516 16275 helper.h:111] Assertion failed: validateInputsCutensor(src, dst)
../rtSafe/cuda/cutensorReformat.cpp:227
Aborting...
E0526 08:48:46.034963 16275 helper.h:111] ../rtSafe/cuda/cutensorReformat.cpp (227) - Assertion Error in executeCutensor: 0 (validateInputsCutensor(src, dst))
Traceback (most recent call last):
  File "deploy/python/infer.py", line 428, in <module>
    main(args)
  File "deploy/python/infer.py", line 416, in main
    predictor.run(imgs_list)
  File "deploy/python/infer.py", line 375, in run
    self.predictor.run()
SystemError:

C++ Traceback (most recent call last):

0   paddle::AnalysisPredictor::ZeroCopyRun()
1   paddle::framework::NaiveExecutor::Run()
2   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
3   paddle::operators::TensorRTEngineOp::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
4   paddle::operators::TensorRTEngineOp::PrepareTRTEngine(paddle::framework::Scope const&, paddle::inference::tensorrt::TensorRTEngine) const
5   paddle::inference::tensorrt::OpConverter::ConvertBlockToTRTEngine(paddle::framework::BlockDesc, paddle::framework::Scope const&, std::vector<std::string, std::allocator > const&, std::unordered_set<std::string, std::hash, std::equal_to, std::allocator > const&, std::vector<std::string, std::allocator > const&, paddle::inference::tensorrt::TensorRTEngine)
6   paddle::inference::tensorrt::TensorRTEngine::FreezeNetwork()
7   paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const, int)
8   paddle::platform::GetCurrentTraceBackStringabi:cxx11

Error Message Summary:

FatalError: Build TensorRT cuda engine failed! Please recheck you configurations related to paddle-TensorRT. [Hint: inferengine should not be null.] (at /paddle/paddle/fluid/inference/tensorrt/engine.cc:252)
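The log shows the crash comes right after Paddle-TRT widens max_shape for x from (1,3,496,512) to (1,3,512,512) and rebuilds the engine at runtime. One way to check whether that runtime rebuild is the trigger is to feed every image at a single fixed shape, so no rebuild is ever needed. The sketch below zero-pads an NCHW batch to a fixed size and crops the prediction afterwards; pad_to and crop_back are hypothetical helpers, the (N, H, W) label-map layout is an assumption, and padding may slightly change predictions near the padded border.

```python
import numpy as np


def pad_to(batch: np.ndarray, target_h: int, target_w: int) -> np.ndarray:
    """Zero-pad an NCHW batch on the bottom and right to (target_h, target_w)."""
    _, _, h, w = batch.shape
    if h > target_h or w > target_w:
        raise ValueError(f"input {h}x{w} is larger than target {target_h}x{target_w}")
    pad = ((0, 0), (0, 0), (0, target_h - h), (0, target_w - w))
    return np.pad(batch, pad)  # default mode is constant zero padding


def crop_back(label: np.ndarray, h: int, w: int) -> np.ndarray:
    """Drop the padded rows and columns from an (N, H, W) label map."""
    return label[:, :h, :w]
```

With all inputs padded to one size, the min, max, and opt shapes can all be set to that size. 512x512 is only the largest shape visible in this log; the real maximum of the dataset should be used instead.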

Additional information:

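The paddlepaddle-gpu and TensorRT 7.2.2.3 used here were built by me, and they work: I ran tests with PPCls without any problems.
However, when TensorRT acceleration is enabled here, the error above occurs after only a few images have been segmented.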
Testing a single image succeeds, but I don't understand why the following appears:
I0526 09:10:02.054121 16396 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 88 nodes
I0526 09:10:02.060820 16396 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 09:10:02.304020 16396 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 09:10:12.828550 16396 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 12 nodes
I0526 09:10:12.833051 16396 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 09:10:12.834429 16396 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 09:10:18.896679 16396 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 12 nodes
I0526 09:10:18.898720 16396 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 09:10:18.900023 16396 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 09:10:19.836130 16396 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 15 nodes
I0526 09:10:19.838363 16396 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 09:10:19.839790 16396 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 09:10:21.024145 16396 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 5 nodes
I0526 09:10:21.026183 16396 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 09:10:21.027000 16396 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
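Each `detect a sub-graph with N nodes` line marks one part of the graph that Paddle-TRT converts into its own TensorRT engine, which is why several engines are built even for a single image. Subgraphs with fewer nodes than the min_subgraph_size argument of Paddle Inference's Config.enable_tensorrt_engine are left to run in Paddle, so raising that threshold reduces the number of engines. The sketch below is a hypothetical helper that pulls the subgraph sizes out of such a log and shows which would still be converted at a given threshold.

```python
import re

# Matches the Paddle-TRT log line emitted once per converted subgraph.
SUBGRAPH_RE = re.compile(r"detect a sub-graph with (\d+) nodes")


def subgraph_sizes(log_text: str) -> list:
    """Return the node count of every TRT subgraph reported in a log."""
    return [int(n) for n in SUBGRAPH_RE.findall(log_text)]


def engines_kept(sizes: list, min_subgraph_size: int) -> list:
    """Subgraphs that would still become TRT engines at this threshold."""
    return [n for n in sizes if n >= min_subgraph_size]
```

On the single-image log above, subgraph_sizes returns [88, 12, 12, 15, 5], and a threshold of 13 would leave only the 88- and 15-node subgraphs as TensorRT engines.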

Issue #2160 in PaddlePaddle/PaddleSeg