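With TensorRT quantized inference, the first run succeeds but later runs fail. Deleting the det_trt_dynamic_shape.txt file generated in the model folder makes it work again. What is going on?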

chaoshixie commented 1 year ago

Please provide the following information to quickly locate the problem

System Environment: Ubuntu 18.04, Python 3.7
Version: Paddle 2.4.1, PaddleOCR 2.6
Related components:
Command Code: python tools/infer/predict_det.py ... --use_tensorrt=True --precision='int8'
Complete Error Message:
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
Fatal Python error: Segmentation fault

Current thread 0x00007f0372f1e3c0 (most recent call first):
File "/PDF-OCR/PaddleOCR-release-2.4/drawing-ocr-master/PaddleOCR/tools/infer/utility.py", line 277 in create_predictor
File "tools/infer/predict_det.py", line 146 in init
File "tools/infer/predict_det.py", line 292 in

C++ Traceback (most recent call last):

0 paddle_infer::Predictor::Predictor(paddle::AnalysisConfig const&)
1 std::unique_ptr<paddle::PaddlePredictor, std::default_delete > paddle::CreatePaddlePredictor<paddle::AnalysisConfig, (paddle::PaddleEngineKind)2>(paddle::AnalysisConfig const&)
2 paddle::AnalysisPredictor::Init(std::shared_ptr const&, std::shared_ptr const&)
3 paddle::AnalysisPredictor::PrepareProgram(std::shared_ptr const&)
4 paddle::AnalysisPredictor::OptimizeInferenceProgram()
5 paddle::inference::analysis::Analyzer::RunAnalysis(paddle::inference::analysis::Argument)
6 paddle::inference::analysis::IrAnalysisPass::RunImpl(paddle::inference::analysis::Argument)
7 paddle::inference::analysis::IRPassManager::Apply(std::unique_ptr<paddle::framework::ir::Graph, std::default_delete >)
8 paddle::framework::ir::Pass::Apply(paddle::framework::ir::Graph) const
9 paddle::framework::ir::DeleteQuantDequantLinearOpPass::ApplyImpl(paddle::framework::ir::Graph) const
10 paddle::framework::ir::GraphPatternDetector::operator()(paddle::framework::ir::Graph, std::function<void (std::map<paddle::framework::ir::PDNode, paddle::framework::ir::Node, paddle::framework::ir::GraphPatternDetector::PDNodeCompare, std::allocator<std::pair<paddle::framework::ir::PDNode const, paddle::framework::ir::Node> > > const&, paddle::framework::ir::Graph)>)

Error Message Summary:

FatalError: Segmentation fault is detected by the operating system. [TimeInfo: Aborted at 1676960786 (unix time) try "date -d @1676960786" if you are using GNU date ] [SignalInfo: SIGSEGV (@0x2a179) received by PID 172409 (TID 0x7f0372f1e3c0) from PID 172409 ]

Segmentation fault (core dumped)
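
The workaround mentioned in the title (deleting the cached shape file before each run) can be scripted. The following is only a minimal sketch: the file name `det_trt_dynamic_shape.txt` comes from the report, while the helper name `clear_trt_shape_cache` and the model directory passed to it are hypothetical and not part of PaddleOCR. It works around the symptom and does not address the underlying crash.

```python
from pathlib import Path


def clear_trt_shape_cache(model_dir, filename="det_trt_dynamic_shape.txt"):
    """Delete the cached dynamic-shape file if it exists.

    Returns True when a file was removed, False when there was nothing to remove.
    `model_dir` is whatever folder holds the exported inference model (assumption).
    """
    cache_file = Path(model_dir) / filename
    if cache_file.is_file():
        cache_file.unlink()
        return True
    return False
```

Calling this helper on the model folder right before launching `tools/infer/predict_det.py` would reproduce the manual deletion step described above.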

andyjiang1116 commented 1 year ago

Which model are you using?

chaoshixie commented 1 year ago

I ran QAT training with PaddleOCR/configs/det/ch_ppocr_v2.0/ch_det_res18_db_v2.0.yml.

andyjiang1116 commented 1 year ago

For inference and deployment of quantized models, you can refer to this document: https://github.com/PaddlePaddle/PaddleOCR/tree/release/2.6/deploy/slim/quantization

chaoshixie commented 1 year ago

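I did follow that document for quantization training. The only difference is that I added onnx_format=True to quant_config (a fix I saw in the PaddleSlim issues). Training showed no errors, and inference without TensorRT also works fine. The problem above only appears once use_tensorrt=True is set.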

andyjiang1116 commented 1 year ago

PaddleOCR-release-2.4

Are you using the latest code from the 2.6 branch?

chaoshixie commented 1 year ago

No. Quantization training was done with PaddleOCR 2.4; for inference I tried both 2.4 and 2.6.

chaoshixie commented 1 year ago

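OK, I just retrained for one epoch and found that onnx_format=True is the cause. Without saving in that format everything works. But if I don't save it that way, isn't the smaller-size advantage lost? (If the original model was 200M, it is still 200M after quantization training.)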

andyjiang1116 commented 1 year ago

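In the exported quantized model the parameters are still stored as FP32, but their values are restricted to the int8 range, so the file size stays basically the same.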
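
To illustrate the point about size, here is a minimal standard-library sketch. It assumes a simple symmetric per-tensor scheme with a made-up scale and made-up weights; it is not how PaddleSlim actually exports models. It only shows why values snapped to the int8 grid but stored as 32-bit floats take the same space as the originals, while storing the integer codes would take a quarter of it.

```python
from array import array


def to_int8_code(w, scale):
    """Map a float weight to an integer code clamped to the int8 range."""
    return max(-128, min(127, round(w / scale)))


weights = [0.5, -0.25, 0.126, 1.0]  # illustrative values (assumption)
scale = 1.0 / 127                   # illustrative scale (assumption)

codes = [to_int8_code(w, scale) for w in weights]

# Three ways of storing the same tensor:
fp32_original = array("f", weights)                       # unquantized FP32
fp32_fake_quant = array("f", [c * scale for c in codes])  # int8-range values kept as FP32
int8_stored = array("b", codes)                           # integer codes stored as int8

fp32_bytes = len(fp32_original) * fp32_original.itemsize
fake_quant_bytes = len(fp32_fake_quant) * fp32_fake_quant.itemsize
int8_bytes = len(int8_stored) * int8_stored.itemsize

print(fp32_bytes, fake_quant_bytes, int8_bytes)
```

The first two sizes are equal because the storage type has not changed; only the third layout is smaller.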

chaoshixie commented 1 year ago

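Yes, I know that. My point is that setting onnx_format=True does change the file size (source: https://github.com/PaddlePaddle/PaddleSlim/issues/1628), but the resulting model cannot run. Is that a bug? Also, running with --use_tensorrt=True --precision='int8' fails on some data. For example, in the run below the first image succeeded but the second one failed. What could be causing this?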

[2023/02/23 02:11:10] ppocr INFO: img_271.jpg [[[1425.0, 1185.0], [1470.0, 1185.0], [1470.0, 1222.0], [1425.0, 1222.0]], [[998.0, 1114.0], [968.0, 1145.0], [1031.0, 1147.0], [1000.0, 1178.0]], [[1571.0, 1025.0], [1625.0, 1040.0], [1610.0, 1096.0], [1555.0, 1081.0]], [[1345.0, 1030.0], [1389.0, 1030.0], [1389.0, 1082.0], [1345.0, 1082.0]], [[1212.0, 1027.0], [1270.0, 1027.0], [1270.0, 1094.0], [1212.0, 1094.0]], [[761.0, 1027.0], [817.0, 1027.0], [817.0, 1101.0], [761.0, 1101.0]], [[547.0, 999.0], [607.0, 1061.0], [559.0, 1110.0], [498.0, 1048.0]], [[976.0, 995.0], [1034.0, 1054.0], [989.0, 1101.0], [931.0, 1041.0]], [[829.0, 768.0], [871.0, 768.0], [871.0, 797.0], [829.0, 797.0]], [[134.0, 761.0], [183.0, 761.0], [183.0, 830.0], [134.0, 830.0]], [[1026.0, 709.0], [1061.0, 709.0], [1061.0, 756.0], [1026.0, 756.0]], [[761.0, 334.0], [793.0, 334.0], [793.0, 354.0], [761.0, 354.0]]]

[2023/02/23 02:11:10] ppocr INFO: 0 The predict time of ./img/img_271.jpg: 0.049730777740478516
[2023/02/23 02:11:10] ppocr INFO: The visualized image saved in ./inference_results/det_res_img_271.jpg
I0223 02:11:10.522697 217198 engine.h:588] refactor shape range: x, max_shape from (1,3,704,960) to (1,3,736,960)
I0223 02:11:10.522749 217198 tensorrt_engine_op.h:376] Adjust dynamic shape range, rebuild trt engine!
I0223 02:11:10.524215 217198 tensorrt_engine_op.h:301] Prepare TRT engine (Optimize model structure, Select OP kernel etc). This process may cost a lot of time.
Fatal Python error: Segmentation fault

Current thread 0x00007f2836b763c0 (most recent call first):
File "tools/infer/predict_det.py", line 247 in call
File "tools/infer/predict_det.py", line 319 in

C++ Traceback (most recent call last):

No stack trace in paddle, may be caused by external reasons.

Error Message Summary:

FatalError: Segmentation fault is detected by the operating system. [TimeInfo: Aborted at 1677118270 (unix time) try "date -d @1677118270" if you are using GNU date ] [SignalInfo: SIGSEGV (@0x3506e) received by PID 217198 (TID 0x7f2836b763c0) from PID 217198 ]

Segmentation fault (core dumped)
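
The log above shows the runtime widening `max_shape` for input `x` from (1,3,704,960) to (1,3,736,960) and rebuilding the TensorRT engine immediately before the crash. A minimal sketch for checking ahead of time which input shapes would fall outside a recorded range is given below. The helper name `outside_range` is hypothetical, the maximum shape is taken from the log, and the minimum shape is purely an assumption because the log does not show it.

```python
def outside_range(shape, min_shape, max_shape):
    """Return True if any dimension of `shape` lies outside [min_shape, max_shape]."""
    return any(d < lo or d > hi for d, lo, hi in zip(shape, min_shape, max_shape))


recorded_min = (1, 3, 32, 32)     # assumption: not shown in the log
recorded_max = (1, 3, 704, 960)   # from the log, before the rebuild

first_image = (1, 3, 704, 960)    # fits the recorded range
second_image = (1, 3, 736, 960)   # the shape that triggered the rebuild

print(outside_range(first_image, recorded_min, recorded_max))
print(outside_range(second_image, recorded_min, recorded_max))
```

If every image that trips this check also crashes, that would suggest the failure is tied to the engine rebuild rather than to the image content, though the thread itself does not confirm the root cause.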

andyjiang1116 commented 1 year ago

Could you check whether the problem is with the test image?

chaoshixie commented 1 year ago

No, it works fine when I don't use TensorRT.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
