PaddlePaddle / Paddle

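PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of PaddlePaddle 『飞桨』: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)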
http://www.paddlepaddle.org/
Apache License 2.0

Running a self-trained layout model with cpp_infer fails: InvalidArgumentError: The split Op's Input Variable `X` contains uninitialized Tensor. #66743

Open JianyuZhan opened 4 months ago

JianyuZhan commented 4 months ago

Describe the Bug

Following the guide at https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppstructure/layout/README_ch.md#72-%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86, I successfully trained a layout analysis model, exported it, and ran inference on it with deploy/python/infer.py from PaddleDetection.

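In production, I want to run the model I just trained and exported with cpp_infer, the C++ version that has already worked for OCR inference. The command is: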

./build/ppocr --image_dir=../../ppstructure/docs/table/1.png --type=structure --det=false --rec=false --layout_model_dir=/opt/ml/model/layout/test/ --layout=true --table=false

Here, /opt/ml/model/layout/test/ is the layout model I trained and exported:

ls -l /opt/ml/model/layout/test/ 
total 7.4M
-rw-rw-r-- 1 1000 1000  543 Jul 29 06:48 infer_cfg.yml
-rw-rw-r-- 1 1000 1000 7.1M Jul 29 06:49 model.pdiparams
-rw-rw-r-- 1 1000 1000  44K Jul 29 06:49 model.pdiparams.info
-rw-rw-r-- 1 1000 1000 237K Jul 29 06:49 model.pdmodel

This command fails with:

total images num: 1
predict img: ../../ppstructure/docs/table/1.png
terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
  what():  

  Compile Traceback (most recent call last):
    File "/home/ubuntu/PaddleDetection/tools/export_model.py", line 118, in <module>
      main()
    File "/home/ubuntu/PaddleDetection/tools/export_model.py", line 114, in main
      run(FLAGS, cfg)
    File "/home/ubuntu/PaddleDetection/tools/export_model.py", line 80, in run
      trainer.export(FLAGS.output_dir, for_fd=FLAGS.for_fd)
    File "/home/ubuntu/PaddleDetection/ppdet/engine/trainer.py", line 1229, in export
      static_model, pruned_input_spec = self._get_infer_cfg_and_input_spec(
    File "/home/ubuntu/PaddleDetection/ppdet/engine/trainer.py", line 1181, in _get_infer_cfg_and_input_spec
      input_spec, static_model.forward.main_program,
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/program_translator.py", line 1062, in main_program
      concrete_program = self.concrete_program
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/program_translator.py", line 941, in concrete_program
      return self.concrete_program_specify_input_spec(input_spec=None)
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/program_translator.py", line 986, in concrete_program_specify_input_spec
      concrete_program, _ = self.get_concrete_program(
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/program_translator.py", line 875, in get_concrete_program
      concrete_program, partial_program_layer = self._program_cache[
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/program_translator.py", line 1648, in __getitem__
      self._caches[item_id] = self._build_once(item)
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/program_translator.py", line 1575, in _build_once
      concrete_program = ConcreteProgram.from_func_spec(
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/decorator.py", line 232, in fun
      return caller(func, *(extras + args), **kw)
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/base/wrapped_decorator.py", line 26, in __impl__
      return wrapped_func(*args, **kwargs)
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/base/dygraph/base.py", line 68, in __impl__
      return func(*args, **kwargs)
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/program_translator.py", line 1339, in from_func_spec
      outputs = static_func(*inputs)
    File "/home/ubuntu/PaddleDetection/ppdet/modeling/architectures/meta_arch.py", line 59, in forward
      if self.training:
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 398, in convert_ifelse
      out = _run_py_ifelse(
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 487, in _run_py_ifelse
      py_outs = true_fn() if pred else false_fn()
    File "/home/ubuntu/PaddleDetection/ppdet/modeling/architectures/meta_arch.py", line 69, in forward
      for inp in inputs_list:
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 162, in convert_while_loop
      _run_py_while(cond, body, getter, setter)
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 231, in _run_py_while
      body()
    File "/home/ubuntu/PaddleDetection/ppdet/modeling/architectures/meta_arch.py", line 76, in forward
      outs.append(self.get_pred())
    File "/home/ubuntu/PaddleDetection/ppdet/modeling/architectures/picodet.py", line 90, in get_pred
      if not self.export_post_process:
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 398, in convert_ifelse
      out = _run_py_ifelse(
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 487, in _run_py_ifelse
      py_outs = true_fn() if pred else false_fn()
    File "/home/ubuntu/PaddleDetection/ppdet/modeling/architectures/picodet.py", line 92, in get_pred
      elif self.export_nms:
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 398, in convert_ifelse
      out = _run_py_ifelse(
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 487, in _run_py_ifelse
      py_outs = true_fn() if pred else false_fn()
    File "/home/ubuntu/PaddleDetection/ppdet/modeling/architectures/picodet.py", line 93, in get_pred
      bbox_pred, bbox_num = self._forward()
    File "/home/ubuntu/PaddleDetection/ppdet/modeling/architectures/picodet.py", line 68, in _forward
      if self.training or not self.export_post_process:
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 398, in convert_ifelse
      out = _run_py_ifelse(
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 487, in _run_py_ifelse
      py_outs = true_fn() if pred else false_fn()
    File "/home/ubuntu/PaddleDetection/ppdet/modeling/architectures/picodet.py", line 72, in _forward
      bboxes, bbox_num = self.head.post_process(
    File "/home/ubuntu/PaddleDetection/ppdet/modeling/heads/pico_head.py", line 407, in post_process
      if not export_nms:
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 398, in convert_ifelse
      out = _run_py_ifelse(
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/jit/dy2static/convert_operators.py", line 487, in _run_py_ifelse
      py_outs = true_fn() if pred else false_fn()
    File "/home/ubuntu/PaddleDetection/ppdet/modeling/heads/pico_head.py", line 411, in post_process
      scale_y, scale_x = paddle.split(scale_factor, 2, axis=-1)
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/tensor/manipulation.py", line 2273, in split
      helper.append_op(
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/base/layer_helper.py", line 44, in append_op
      return self.main_program.current_block().append_op(*args, **kwargs)
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/base/framework.py", line 4467, in append_op
      op = Operator(
    File "/home/ubuntu/paddle_env/lib/python3.9/site-packages/paddle/base/framework.py", line 3016, in __init__
      for frame in traceback.extract_stack():

--------------------------------------
C++ Traceback (most recent call last):
--------------------------------------
0   paddle::AnalysisPredictor::ZeroCopyRun()
1   paddle::framework::NaiveExecutor::Run()
2   paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, phi::Place const&)
3   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, phi::Place const&) const
4   paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, phi::Place const&, paddle::framework::RuntimeContext*) const
5   paddle::framework::OperatorWithKernel::InnerGetExpectedKernelType(paddle::framework::ExecutionContext const&) const
6   paddle::operators::SplitOp::GetExpectedKernelType(paddle::framework::ExecutionContext const&) const
7   paddle::framework::OperatorWithKernel::IndicateVarDataType(paddle::framework::ExecutionContext const&, std::string const&) const
8   paddle::framework::OperatorWithKernel::ParseInputDataType(paddle::framework::Variable const*, std::string const&, paddle::framework::proto::VarType_Type*) const
9   phi::enforce::EnforceNotMet::EnforceNotMet(phi::ErrorSummary const&, char const*, int)
10  phi::enforce::GetCurrentTraceBackString[abi:cxx11](bool)

----------------------
Error Message Summary:
----------------------
InvalidArgumentError: The split Op's Input Variable `X` contains uninitialized Tensor.
  [Hint: Expected t->IsInitialized() == true, but received t->IsInitialized():0 != true:1.] (at /paddle/paddle/fluid/framework/operator.cc:2094)
  [operator < split > error]
Aborted (core dumped)
λ cdcca6c96c98 /app/PaddleOCR/deploy/cpp_infer ./build/ppocr --image_dir=../../ppstructure/docs/table/1.png  --type=structure --det=false --rec=false --layout_model_dir=/opt/ml/model/layout/picodet_lcnet_x1_0_layout/  --layout=true --table=false
total images num: 1
predict img: ../../ppstructure/docs/table/1.png
terminate called after throwing an instance of 'phi::enforce::EnforceNotMet'
  what():  

----------------------
Error Message Summary:
----------------------
InvalidArgumentError: The split Op's Input Variable `X` contains uninitialized Tensor.
  [Hint: Expected t->IsInitialized() == true, but received t->IsInitialized():0 != true:1.] (at /paddle/paddle/fluid/framework/operator.cc:2094)
  [operator < split > error]
Aborted (core dumped)

The same command with the official layout model:

ls -l /opt/ml/model/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer/
total 9.7M
-rw-rw-r-- 1 1000 1000 7.0M Jul 29 06:33 inference.pdiparams
-rw-rw-r-- 1 1000 1000  44K Jul 29 06:33 inference.pdiparams.info
-rw-rw-r-- 1 1000 1000 2.7M Jul 29 06:33 inference.pdmodel

succeeds. So it looks like a problem with the model export? But inference with infer.py from PaddleDetection worked earlier; only cpp_infer reports this error.

My PaddleDetection is release/2.7, the Paddle inference library is 2.3.2/cxx_c/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.6_cudnn8.4.0-trt8.4.0.6/paddle_inference.tgz, and PaddleOCR is release/2.7.

Additional Supplementary Information

No response

SigureMo commented 4 months ago

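> Following the guide at PaddlePaddle/PaddleOCR@main/ppstructure/layout/README_ch.md#72-%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86, I successfully trained a layout analysis model, exported it, and ran inference on it with deploy/python/infer.py from PaddleDetection.

What was the exact export command? I will try to reproduce this on my side to see whether the problem arises during dynamic-to-static export.

Also, which Paddle version are you using? If it is an older version, could you try upgrading to 3.0-beta?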

JianyuZhan commented 4 months ago

The export command is the one from section 7.1 of the link above:

python3 tools/export_model.py \
    -c configs/picodet/legacy_model/application/layout_analysis/picodet_lcnet_x1_0_layout.yml \
    -o weights=output/picodet_lcnet_x1_0_layout/best_model \
    --output_dir=output_inference/

The Paddle inference library is 2.3.2/cxx_c/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.6_cudnn8.4.0-trt8.4.0.6/paddle_inference.tgz

Wangzheee commented 4 months ago

It looks like a tensor was never assigned or initialized. Check that all of the model's inputs are provided correctly before calling run.
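
One way to act on this suggestion is to compare the input names the exported model declares against the names the caller actually feeds. The sketch below is a minimal illustration only: `scale_factor` comes from the traceback above (`paddle.split(scale_factor, 2, axis=-1)`), while the name `image` and the helper `find_unfed_inputs` are assumptions for illustration, not read from the model or from cpp_infer.

```python
def find_unfed_inputs(declared_inputs, fed_inputs):
    """Return declared model inputs that the caller never feeds, in declaration order."""
    fed = set(fed_inputs)
    return [name for name in declared_inputs if name not in fed]


# Hypothetical names: "scale_factor" appears in the traceback, "image" is assumed.
declared = ["image", "scale_factor"]
fed_by_caller = ["image"]

# Any name listed here would still be an uninitialized tensor when run is called.
print(find_unfed_inputs(declared, fed_by_caller))
```

An empty result would mean every declared input is fed, in which case the cause would have to be looked for elsewhere.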

JianyuZhan commented 4 months ago

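> > Following the guide at PaddlePaddle/PaddleOCR@main/ppstructure/layout/README_ch.md#72-%E6%A8%A1%E5%9E%8B%E6%8E%A8%E7%90%86, I successfully trained a layout analysis model, exported it, and ran inference on it with deploy/python/infer.py from PaddleDetection.
>
> What was the exact export command? I will try to reproduce this on my side to see whether the problem arises during dynamic-to-static export.
>
> Also, which Paddle version are you using? If it is an older version, could you try upgrading to 3.0-beta?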

Also, I tried to download these two 3.0-beta Paddle inference libraries, and both return 404: https://paddle-inference-lib.bj.bcebos.com/3.0.0-beta1/cxx_c/Linux/GPU/x86-64_gcc8.2_avx_mkl_cuda11.8_cudnn8.6.0-trt8.5.1.7/paddle_inference.tgz/paddle_inference.tgz

https://paddle-inference-lib.bj.bcebos.com/3.0.0-beta1/cxx_c/Linux/GPU/x86-64_gcc12.2_avx_mkl_cuda12.3_cudnn9.0.0-trt8.6.1.6/paddle_inference.tgz/paddle_inference.tgz
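
Both links above end in `paddle_inference.tgz/paddle_inference.tgz`. A repeated file name like that can be enough to produce a 404, although nothing in this thread confirms that this is the cause here or that the shortened links resolve. As a small sketch, the hypothetical helper below (not part of any Paddle tooling) drops a final path segment that merely repeats the one before it:

```python
from urllib.parse import urlsplit, urlunsplit


def collapse_repeated_filename(url):
    """Drop the final path segment if it only repeats the segment before it."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    if len(segments) >= 2 and segments[-1] and segments[-1] == segments[-2]:
        segments = segments[:-1]
    return urlunsplit(parts._replace(path="/".join(segments)))
```

URLs whose last two path segments differ are returned unchanged.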

JianyuZhan commented 4 months ago

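> It looks like a tensor was never assigned or initialized. Check that all of the model's inputs are provided correctly before calling run.

I exported my own trained layout model with export_model.py from PaddleDetection. Then:

Inference with PaddleDetection's own infer.py script works.
The C++ ppocr from PaddleOCR with the officially released layout model also works.
The C++ ppocr from PaddleOCR with my own exported layout model fails with the error above.

So I don't think the inputs are the problem.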

Wangzheee commented 4 months ago

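> > It looks like a tensor was never assigned or initialized. Check that all of the model's inputs are provided correctly before calling run.
>
> I exported my own trained layout model with export_model.py from PaddleDetection. Then:
>
> Inference with PaddleDetection's own infer.py script works.
> The C++ ppocr from PaddleOCR with the officially released layout model also works.
> The C++ ppocr from PaddleOCR with my own exported layout model fails with the error above.
>
> So I don't think the inputs are the problem.

Static-graph models exported in different ways can differ somewhat. You could open both in netron and check whether their inputs are the same.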
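
As a scripted alternative to opening both models in netron, the declared input names can be listed with the Paddle Inference Python API and compared. This is a sketch under assumptions: it needs a Python `paddle` install that can load both models, the helper names are hypothetical, and the file names are the ones shown in the directory listings earlier in this issue.

```python
import os


def list_model_inputs(model_file, params_file):
    """Load an exported inference model on CPU and return its declared input names."""
    # Imported lazily so compare_inputs() stays usable without Paddle installed.
    from paddle.inference import Config, create_predictor

    config = Config(model_file, params_file)
    config.disable_gpu()
    predictor = create_predictor(config)
    return list(predictor.get_input_names())


def compare_inputs(custom_inputs, official_inputs):
    """Return (only_in_custom, only_in_official) as sorted lists of input names."""
    custom, official = set(custom_inputs), set(official_inputs)
    return sorted(custom - official), sorted(official - custom)


def report(custom_dir, official_dir):
    """Print the input names that differ between the two exported models."""
    custom = list_model_inputs(
        os.path.join(custom_dir, "model.pdmodel"),
        os.path.join(custom_dir, "model.pdiparams"),
    )
    official = list_model_inputs(
        os.path.join(official_dir, "inference.pdmodel"),
        os.path.join(official_dir, "inference.pdiparams"),
    )
    only_custom, only_official = compare_inputs(custom, official)
    print("only in custom model:", only_custom)
    print("only in official model:", only_official)
```

Calling `report("/opt/ml/model/layout/test/", "/opt/ml/model/layout/picodet_lcnet_x1_0_fgd_layout_cdla_infer/")` would print any input that one model declares and the other does not. An input that exists only in the custom model would be a candidate for the uninitialized tensor reported by the split op.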