PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkit based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

Error when loading the TensorRT serialization file #4076

Closed · joe12306 closed this issue 1 year ago

joe12306 commented 3 years ago

Hello, I am currently using the following environment:

- Paddle 2.1.2
- Jetson Xavier NX, JetPack 4.4
- CUDA 10.2.89
- TensorRT 7.1.3.0
- cuDNN 8.0.0.180
- PaddleOCR 2.3

I modified `create_predictor` in tools/infer/utility.py, adding `use_static=True` so that the TensorRT engine is serialized to disk:

```python
if args.use_tensorrt:
    config.enable_tensorrt_engine(
        precision_mode=precision,
        max_batch_size=args.max_batch_size,
        min_subgraph_size=args.min_subgraph_size,
        use_static=True)  # added: serialize the built TRT engine
```
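For reference, a minimal standalone sketch of the same configuration using the Paddle 2.x inference API is shown below. The model paths and size values here are placeholders rather than values from this issue:

```python
from paddle.inference import Config, PrecisionType, create_predictor

# Placeholder paths; point these at an exported inference model.
config = Config("inference/model.pdmodel", "inference/model.pdiparams")
config.enable_use_gpu(100, 0)  # 100 MB initial GPU memory pool, device 0

# use_static=True makes Paddle serialize the built TRT engine into the
# model directory's _opt_cache folder, so later runs can skip the
# (slow) engine build and deserialize it instead.
config.enable_tensorrt_engine(
    workspace_size=1 << 30,
    max_batch_size=1,
    min_subgraph_size=3,
    precision_mode=PrecisionType.Half,  # corresponds to --precision=fp16
    use_static=True,
    use_calib_mode=False)

predictor = create_predictor(config)
```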

I then run the following command:

```bash
python3 tools/infer/predict_system.py \
    --image_dir="./test2/" \
    --det_model_dir="./inference/ch_PP-OCRv2_det_infer/" \
    --rec_model_dir="./inference/ch_PP-OCRv2_rec_infer/" \
    --cls_model_dir="./inference/ch_ppocr_mobile_v2.0_cls_infer/" \
    --use_angle_cls=False \
    --use_space_char=True \
    --use_gpu=True \
    --use_tensorrt=True \
    --precision=fp16 \
    --det_limit_side_len=640 \
    --det_db_thresh=0.3 \
    --det_db_box_thresh=0.6
```

The first run performs inference normally and generates the serialization files. But when I run it again, loading the serialized files fails with the error below. How can I fix this?

```
WARNING: AVX is not support on your machine. Hence, no_avx core will be imported, It has much worse preformance than avx core.
W0916 11:24:26.642565 14397 analysis_predictor.cc:677] The one-time configuration of analysis predictor failed, which may be due to native predictor called first and its configurations taken effect.
I0916 11:24:26.642894 14397 analysis_predictor.cc:155] Profiler is deactivated, and no profiling report will be generated.
I0916 11:24:26.694703 14397 analysis_predictor.cc:522] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0916 11:24:27.004281 14397 graph_pattern_detector.cc:101] --- detected 33 subgraphs
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_to_mul_pass]
--- Running IR pass [fc_fuse_pass]
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0916 11:24:27.050493 14397 graph_pattern_detector.cc:101] --- detected 33 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0916 11:24:27.116706 14397 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 108 nodes
W0916 11:24:27.167860 14397 tensorrt_subgraph_pass.cc:304] The Paddle lib links the 7130 version TensorRT, make sure the runtime TensorRT you are using is no less than this version, otherwise, there might be Segfault!
I0916 11:24:30.852931 14397 tensorrt_subgraph_pass.cc:337] Load TRT Optimized Info from ./inference/ch_PP-OCRv2_det_infer//_opt_cache//trt_serialized_8137448676646311277
--- Running IR pass [conv_bn_fuse_pass]
--- Running IR pass [transpose_flatten_concat_fuse_pass]
--- Running analysis [ir_params_sync_among_devices_pass]
I0916 11:24:30.897047 14397 ir_params_sync_among_devices_pass.cc:45] Sync params from CPU to GPU
--- Running analysis [adjust_cudnn_workspace_size_pass]
--- Running analysis [inference_op_replace_pass]
--- Running analysis [memory_optimize_pass]
I0916 11:24:30.918109 14397 memory_optimize_pass.cc:201] Cluster name : conv2d_185.tmp_0 size: 1382400
I0916 11:24:30.918171 14397 memory_optimize_pass.cc:201] Cluster name : conv2d_186.tmp_0 size: 5529600
I0916 11:24:30.918196 14397 memory_optimize_pass.cc:201] Cluster name : nearest_interp_v2_5.tmp_0 size: 5529600
I0916 11:24:30.918217 14397 memory_optimize_pass.cc:201] Cluster name : nearest_interp_v2_4.tmp_0 size: 5529600
I0916 11:24:30.918269 14397 memory_optimize_pass.cc:201] Cluster name : tmp_2 size: 22118400
I0916 11:24:30.918289 14397 memory_optimize_pass.cc:201] Cluster name : nearest_interp_v2_2.tmp_0 size: 22118400
--- Running analysis [ir_graph_to_program_pass]
I0916 11:24:31.051973 14397 analysis_predictor.cc:598] ======= optimize end =======
I0916 11:24:31.052702 14397 naive_executor.cc:107] --- skip [feed], feed -> x
I0916 11:24:31.054613 14397 naive_executor.cc:107] --- skip [nearest_interp_v2_2.tmp_0], fetch -> fetch
I0916 11:24:31.075569 14397 analysis_predictor.cc:155] Profiler is deactivated, and no profiling report will be generated.
I0916 11:24:31.093502 14397 analysis_predictor.cc:522] TensorRT subgraph engine is enabled
--- Running analysis [ir_graph_build_pass]
--- Running analysis [ir_graph_clean_pass]
--- Running analysis [ir_analysis_pass]
--- Running IR pass [conv_affine_channel_fuse_pass]
--- Running IR pass [adaptive_pool2d_convert_global_pass]
--- Running IR pass [conv_eltwiseadd_affine_channel_fuse_pass]
--- Running IR pass [shuffle_channel_detect_pass]
--- Running IR pass [quant_conv2d_dequant_fuse_pass]
--- Running IR pass [delete_quant_dequant_op_pass]
--- Running IR pass [delete_quant_dequant_filter_op_pass]
--- Running IR pass [simplify_with_basic_ops_pass]
--- Running IR pass [embedding_eltwise_layernorm_fuse_pass]
--- Running IR pass [multihead_matmul_fuse_pass_v2]
--- Running IR pass [multihead_matmul_fuse_pass_v3]
--- Running IR pass [skip_layernorm_fuse_pass]
--- Running IR pass [unsqueeze2_eltwise_fuse_pass]
--- Running IR pass [conv_bn_fuse_pass]
I0916 11:24:31.227874 14397 graph_pattern_detector.cc:101] --- detected 14 subgraphs
--- Running IR pass [squeeze2_matmul_fuse_pass]
--- Running IR pass [reshape2_matmul_fuse_pass]
--- Running IR pass [flatten2_matmul_fuse_pass]
--- Running IR pass [map_matmul_to_mul_pass]
I0916 11:24:31.242269 14397 graph_pattern_detector.cc:101] --- detected 2 subgraphs
--- Running IR pass [fc_fuse_pass]
I0916 11:24:31.245939 14397 graph_pattern_detector.cc:101] --- detected 2 subgraphs
--- Running IR pass [conv_elementwise_add_fuse_pass]
I0916 11:24:31.251266 14397 graph_pattern_detector.cc:101] --- detected 18 subgraphs
--- Running IR pass [tensorrt_subgraph_pass]
I0916 11:24:31.272850 14397 tensorrt_subgraph_pass.cc:126] --- detect a sub-graph with 80 nodes
E0916 11:24:31.372685 14397 helper.h:78] INVALID_ARGUMENT: getPluginCreator could not find plugin elementwise_plugin version 1
E0916 11:24:31.372783 14397 helper.h:78] safeDeserializationUtils.cpp (323) - Serialization Error in load: 0 (Cannot deserialize plugin since corresponding IPluginCreator not found in Plugin Registry)
E0916 11:24:31.373318 14397 helper.h:78] INVALID_STATE: std::exception
E0916 11:24:31.387934 14397 helper.h:78] INVALID_CONFIG: Deserialize the cuda engine failed.
Traceback (most recent call last):
  File "tools/infer/predict_system.py", line 193, in <module>
    main(args)
  File "tools/infer/predict_system.py", line 117, in main
    text_sys = TextSystem(args)
  File "tools/infer/predict_system.py", line 46, in __init__
    self.text_recognizer = predict_rec.TextRecognizer(args)
  File "/home/nvidia/PPv2/PaddleOCR/tools/infer/predict_rec.py", line 66, in __init__
    utility.create_predictor(args, 'rec', logger)
  File "/home/nvidia/PPv2/PaddleOCR/tools/infer/utility.py", line 279, in create_predictor
    predictor = inference.create_predictor(config)
SystemError:
```


C++ Traceback (most recent call last):

```
0   paddle_infer::Predictor::Predictor(paddle::AnalysisConfig const&)
1   std::unique_ptr<paddle::PaddlePredictor, std::default_delete<paddle::PaddlePredictor> > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2   paddle::AnalysisPredictor::Init(std::shared_ptr<paddle::framework::Scope> const&, std::shared_ptr<paddle::framework::ProgramDesc> const&)
3   paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr<paddle::framework::ProgramDesc> const&)
4   paddle::AnalysisPredictor::OptimizeInferenceProgram()
5   paddle::inference::analysis::Analyzer::RunAnalysis(paddle::inference::analysis::Argument*)
6   paddle::inference::analysis::IrAnalysisPass::RunImpl(paddle::inference::analysis::Argument*)
7   paddle::inference::analysis::IRPassManager::Apply(std::unique_ptr<paddle::framework::ir::Graph, std::default_delete<paddle::framework::ir::Graph> >)
8   paddle::framework::ir::Pass::Apply(paddle::framework::ir::Graph*) const
9   paddle::inference::analysis::TensorRtSubgraphPass::ApplyImpl(paddle::framework::ir::Graph*) const
10  paddle::inference::analysis::TensorRtSubgraphPass::CreateTensorRTOp(paddle::framework::ir::Node*, paddle::framework::ir::Graph*, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::string, std::allocator<std::string> >*) const
11  paddle::inference::tensorrt::TensorRTEngine::Deserialize(std::string const&)
12  paddle::platform::EnforceNotMet::EnforceNotMet(paddle::platform::ErrorSummary const&, char const*, int)
13  paddle::platform::GetCurrentTraceBackString[abi:cxx11]()
```


Error Message Summary:

```
FatalError: Building TRT cuda engine failed when deserializing engine info. Please check:
  1. Your TRT serialization is generated and loaded on the same GPU architecture;
  2. The Paddle Inference version of generating serialization file and doing inference are consistent.
  [Hint: infer_engine_ should not be null.] (at /home/nvidia/Paddle/paddle/fluid/inference/tensorrt/engine.h:295)
```
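The log shows the recognizer failing to deserialize a cached engine that references Paddle's `elementwise_plugin`, whose `IPluginCreator` is not registered at load time. A common first step with a stale or mismatched engine cache is simply to delete the `_opt_cache` directories so the engines are rebuilt on the next run. The sketch below is an assumption-labeled workaround, not an official PaddleOCR utility; the model paths are taken from the command above, and the `paddle.version` calls are standard Paddle 2.x helpers for checking hint 2:

```python
import shutil
from pathlib import Path

import paddle

# Model directories from the predict_system.py command above.
model_dirs = [
    "./inference/ch_PP-OCRv2_det_infer/",
    "./inference/ch_PP-OCRv2_rec_infer/",
    "./inference/ch_ppocr_mobile_v2.0_cls_infer/",
]

# With use_static=True, Paddle writes the serialized TRT engines into an
# _opt_cache folder next to the model files (see the "Load TRT Optimized
# Info from ..." log line above). Removing the folder forces a clean
# rebuild on the next run instead of deserializing a stale engine.
for d in model_dirs:
    cache = Path(d) / "_opt_cache"
    if cache.is_dir():
        shutil.rmtree(cache)
        print(f"removed {cache}")

# Hint 2 requires the same Paddle build when generating and loading the
# serialization; printing the build info on both runs makes that easy to compare.
print(paddle.__version__, paddle.version.cuda(), paddle.version.cudnn())
```

If deserialization still fails after a clean rebuild in the same environment, the remaining suspects are the two hints taken literally: a GPU architecture change between runs, or a Paddle Inference build mismatch.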
github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.