PaddlePaddle / PaddleSeg

Easy-to-use image segmentation library with awesome pre-trained model zoo, supporting wide-range of practical tasks in Semantic Segmentation, Interactive Segmentation, Panoptic Segmentation, Image Matting, 3D Segmentation, etc.
https://arxiv.org/abs/2101.06175
Apache License 2.0
8.66k stars 1.68k forks

[Bug] PaddleSeg with TensorRT acceleration exits after segmenting only a few images #2160

Closed jiguanglu closed 2 years ago

jiguanglu commented 2 years ago
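  1. PaddleSeg version: (PaddleSeg release/2.4)

  2. PaddlePaddle version: (PaddlePaddle 2.2.2)

  3. Operating system: (Ubuntu18.04)

  4. Python version: (e.g. Python3.8)

  5. CUDA/cuDNN version: (CUDA11.1/cuDNN8.0, etc.)

  6. Complete code: (if you modified the original code, please provide a before/after comparison)

    • I followed the tutorial at https://github.com/PaddlePaddle/PaddleSeg/blob/release/2.5/docs/whole_process_cn.md exactly: download, training, prediction, and model export, with no parameters changed;
    • I then ran inference with the command from the tutorial, except that --image_path points to the whole image folder instead of a single file:
      python deploy/python/infer.py --config output/deploy.yaml --image_path /tmp/PaddleSeg/data/optic_disc_seg/JPEGImages/ --use_trt True --save_dir output/out_trt --enable_auto_tune True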
  7. Detailed error message and related logs: (with multiple GPUs, logs are saved to log/worklog.0 by default)

python deploy/python/infer.py --config output/deploy.yaml --image_path /tmp/PaddleSeg/data/optic_disc_seg/JPEGImages/ --use_trt True --save_dir output/out_trt --enable_auto_tune True
/usr/local/lib/python3.8/site-packages/setuptools/depends.py:2: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:36: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
  'nearest': Image.NEAREST,
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:37: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
  'bilinear': Image.BILINEAR,
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:38: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
  'bicubic': Image.BICUBIC,
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:39: DeprecationWarning: BOX is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BOX instead.
  'box': Image.BOX,
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:40: DeprecationWarning: LANCZOS is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.LANCZOS instead.
  'lanczos': Image.LANCZOS,
/usr/local/lib/python3.8/site-packages/paddle/vision/transforms/functional_pil.py:41: DeprecationWarning: HAMMING is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.HAMMING instead.
  'hamming': Image.HAMMING
/usr/local/lib/python3.8/site-packages/paddleseg/models/losses/decoupledsegnet_relax_boundary_loss.py:19: DeprecationWarning: Please use shift from the scipy.ndimage namespace, the scipy.ndimage.interpolation namespace is deprecated.
  from scipy.ndimage.interpolation import shift
/usr/local/lib/python3.8/site-packages/paddleseg/transforms/functional.py:18: DeprecationWarning: Please use distance_transform_edt from the scipy.ndimage namespace, the scipy.ndimage.morphology namespace is deprecated.
  from scipy.ndimage.morphology import distance_transform_edt
2022-05-26 08:48:17 [INFO] Auto tune the dynamic shape for GPU TRT.
I0526 08:48:17.053037 16275 analysis_config.cc:917] In CollectShapeInfo mode, we will disable optimizations and collect the shape information of all intermediate tensors in the compute graph and calculate the min_shape, max_shape and opt_shape.
W0526 08:48:17.453675 16275 analysis_predictor.cc:795] The one-time configuration of analysis predictor failed, which may be due to native predictor called first and its configurations taken effect.
I0526 08:48:17.465394 16275 analysis_predictor.cc:665] ir_optim is turned off, no IR pass will be executed
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0526 08:48:17.494113 16275 ir_params_sync_among_devices_pass.cc:45] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [ir_graph_to_program_pass]
I0526 08:48:17.548259 16275 analysis_predictor.cc:714] ======= optimize end =======
I0526 08:48:17.550683 16275 naive_executor.cc:98] --- skip [feed], feed -> x
I0526 08:48:17.552196 16275 naive_executor.cc:98] --- skip [argmax_0.tmp_0], fetch -> fetch
W0526 08:48:17.564620 16275 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 8.6, Driver API Version: 11.6, Runtime API Version: 11.1
W0526 08:48:17.565778 16275 device_context.cc:465] device: 0, cuDNN Version: 8.0.
2022-05-26 08:48:19 [INFO] Auto tune success.

2022-05-26 08:48:19 [INFO] Use GPU
2022-05-26 08:48:19 [INFO] Use TRT
2022-05-26 08:48:19 [INFO] Use auto tuned dynamic shape
I0526 08:48:19.248214 16275 analysis_predictor.cc:576] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
I0526 08:48:19.265250 16275 fuse_pass_base.cc:57] --- detected 1 subgraphs
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0526 08:48:19.287051 16275 fuse_pass_base.cc:57] --- detected 39 subgraphs
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_v2_to_mul_pass]
--- Running IR pass [map_matmul_v2_to_matmul_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
--- Running IR pass [tensorrt_subgraph_pass]
I0526 08:48:19.302623 16275 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 88 nodes
I0526 08:48:19.308516 16275 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:19.552888 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 08:48:30.183140 16275 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 5 nodes
I0526 08:48:30.187425 16275 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:30.188251 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 08:48:35.730669 16275 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 12 nodes
I0526 08:48:35.732797 16275 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:35.734086 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 08:48:41.873677 16275 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 12 nodes
I0526 08:48:41.875895 16275 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:41.877171 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
I0526 08:48:42.746399 16275 tensorrt_subgraph_pass.cc:138] --- detect a sub-graph with 15 nodes
I0526 08:48:42.748675 16275 tensorrt_subgraph_pass.cc:395] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:42.750090 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0526 08:48:43.951342 16275 ir_params_sync_among_devices_pass.cc:45] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0526 08:48:43.952843 16275 memory_optimize_pass.cc:216] Cluster name : shape_1.tmp_0_slice_0 size: 8
I0526 08:48:43.952847 16275 memory_optimize_pass.cc:216] Cluster name : shape_1.tmp_0 size: 16
I0526 08:48:43.952848 16275 memory_optimize_pass.cc:216] Cluster name : shape_0.tmp_0_slice_0 size: 8
--- Running analysis [ir_graph_to_program_pass]
I0526 08:48:43.992440 16275 analysis_predictor.cc:714] ======= optimize end =======
I0526 08:48:43.995501 16275 naive_executor.cc:98] --- skip [feed], feed -> x
I0526 08:48:43.995982 16275 naive_executor.cc:98] --- skip [argmax_0.tmp_0], fetch -> fetch
I0526 08:48:44.265367 16275 engine.h:438] refactor shape range: x, max_shape from (1,3,496,512) to (1,3,512,512)
I0526 08:48:44.265389 16275 tensorrt_engine_op.h:306] Adjust dynamic shape range, rebuild trt engine!
I0526 08:48:44.266453 16275 tensorrt_engine_op.h:589] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
I0526 08:48:44.276803 16275 engine.cc:197] Run Paddle-TRT Dynamic Shape mode.
E0526 08:48:46.027516 16275 helper.h:111] Assertion failed: validateInputsCutensor(src, dst) ../rtSafe/cuda/cutensorReformat.cpp:227 Aborting...
E0526 08:48:46.034963 16275 helper.h:111] ../rtSafe/cuda/cutensorReformat.cpp (227) - Assertion Error in executeCutensor: 0 (validateInputsCutensor(src, dst))
Traceback (most recent call last):
  File "deploy/python/infer.py", line 428, in <module>
    main(args)
  File "deploy/python/infer.py", line 416, in main
    predictor.run(imgs_list)
  File "deploy/python/infer.py", line 375, in run
    self.predictor.run()
SystemError:


C++ Traceback (most recent call last):

0 paddle::AnalysisPredictor::ZeroCopyRun()
1 paddle::framework::NaiveExecutor::Run()
2 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)
3 paddle::operators::TensorRTEngineOp::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const
4 paddle::operators::TensorRTEngineOp::PrepareTRTEngine(paddle::framework::Scope const&, paddle::inference::tensorrt::TensorRTEngine) const
5 paddle::inference::tensorrt::OpConverter::ConvertBlockToTRTEngine(paddle::framework::BlockDesc, paddle::framework::Scope const&, std::vector<std::string, std::allocator > const&, std::unordered_set<std::string, std::hash, std::equal_to, std::allocator > const&, std::vector<std::string, std::allocator > const&, paddle::inference::tensorrt::TensorRTEngine)
6 paddle::inference::tensorrt::TensorRTEngine::FreezeNetwork()
7 paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const, int)
8 paddle::platform::GetCurrentTraceBackStringabi:cxx11


Error Message Summary:

FatalError: Build TensorRT cuda engine failed! Please recheck you configurations related to paddle-TensorRT. [Hint: inferengine should not be null.] (at /paddle/paddle/fluid/inference/tensorrt/engine.cc:252)

  8. Other information:
juncaipeng commented 2 years ago

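Your test images probably come in many different sizes, right? You can set tune_img_nums in deploy/infer.py to a value larger than your number of images, although the auto tune process may then take longer. Alternatively, make sure all your test images have the same size.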
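To illustrate why mixed image sizes matter here: the log shows the engine being rebuilt once an input exceeds the tuned max_shape (from (1,3,496,512) to (1,3,512,512)). Below is a minimal sketch in plain Python of that range check. It is not PaddleSeg or Paddle Inference code; `shape_range` and `in_range` are hypothetical helper names, and the second tuned shape is invented for illustration.

```python
from typing import Iterable, Tuple

Shape = Tuple[int, ...]  # e.g. (N, C, H, W)


def shape_range(shapes: Iterable[Shape]) -> Tuple[Shape, Shape]:
    """Return the per-dimension (min_shape, max_shape) over the given shapes."""
    shapes = list(shapes)
    if not shapes:
        raise ValueError("need at least one shape")
    min_shape = tuple(min(dims) for dims in zip(*shapes))
    max_shape = tuple(max(dims) for dims in zip(*shapes))
    return min_shape, max_shape


def in_range(shape: Shape, min_shape: Shape, max_shape: Shape) -> bool:
    """True if every dimension of shape lies within [min_shape, max_shape]."""
    return all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape))


# Shapes seen while tuning. (1, 3, 496, 512) is the max_shape from the log;
# (1, 3, 480, 512) is a made-up second sample.
tuned_shapes = [(1, 3, 496, 512), (1, 3, 480, 512)]
min_shape, max_shape = shape_range(tuned_shapes)

# A later image that was not part of the tuning set.
later_shape = (1, 3, 512, 512)
covered = in_range(later_shape, min_shape, max_shape)
print("covered by tuned range:", covered)

# Tuning over every image (the effect of a large enough tune_img_nums)
# widens the range so that no later input falls outside it.
full_min, full_max = shape_range(tuned_shapes + [later_shape])
covered_after = in_range(later_shape, full_min, full_max)
print("covered after tuning on all images:", covered_after)
```

If every image has the same size, min_shape and max_shape coincide and no input can fall outside the range, which is the second option suggested above.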

jiguanglu commented 2 years ago

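> Your test images probably come in many different sizes, right? You can set tune_img_nums in deploy/infer.py to a value larger than your number of images, although the auto tune process may then take longer. Alternatively, make sure all your test images have the same size.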

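I changed the image sizes, but found that inference with TensorRT is actually slower than using the GPU directly. Will this be updated later, or is there a problem with my code?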
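One thing worth ruling out when comparing speeds: in the log above each TensorRT engine build takes several seconds, so timings that include the first runs can make TensorRT look slower than it is in steady state. Here is a minimal sketch of timing with warm-up runs excluded. `benchmark` and `make_fake_predictor` are hypothetical names, and the fake predictor only simulates a slow first call; it does not call Paddle.

```python
import time
from statistics import mean
from typing import Callable, Dict, Tuple


def benchmark(run_once: Callable[[], None], warmup: int = 10, repeats: int = 50) -> float:
    """Return the mean seconds per call, ignoring the first `warmup` calls."""
    for _ in range(warmup):
        run_once()  # one-time costs (e.g. engine build) land here
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return mean(timings)


def make_fake_predictor() -> Tuple[Callable[[], None], Dict[str, int]]:
    """Stand-in for a predictor whose first call is slow (simulated build)."""
    calls = {"n": 0}

    def run_once() -> None:
        calls["n"] += 1
        if calls["n"] == 1:
            time.sleep(0.05)  # simulated one-time setup cost

    return run_once, calls


# With warm-up: the slow first call is excluded from the average.
warm_run, warm_calls = make_fake_predictor()
warm_avg = benchmark(warm_run, warmup=2, repeats=5)

# Without warm-up: the slow first call inflates the average.
cold_run, cold_calls = make_fake_predictor()
cold_avg = benchmark(cold_run, warmup=0, repeats=5)

print(f"with warm-up: {warm_avg:.6f} s, without: {cold_avg:.6f} s")
```

For a real GPU predictor, the timed call would also need to wait for the result to be available before stopping the clock; that detail is left out of this sketch.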

juncaipeng commented 2 years ago

Please use the code from the latest 2.5 branch.