PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂 Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
https://paddlenlp.readthedocs.io
Apache License 2.0
11.71k stars 2.86k forks

[Question]: Inferencing using finetuned ERNIE-Layout model on Taskflow #8661

Open edkair opened 3 days ago

edkair commented 3 days ago

I have fine-tuned an ERNIE-Layout model on an MRC (machine reading comprehension) task and exported it. It looks like I can run inference with the infer.py / predictor.py scripts, but I would much prefer to use the Taskflow interface (Taskflow(task="document_intelligence")), so that switching between the pretrained and fine-tuned models is a simple model swap.

I replaced the pretrained model files in .paddlenlp/taskflow/document_intelligence/docprompt/static and .paddlenlp/models/ernie-layoutx-base-uncased with my exported fine-tuned model files (same filenames), but I now get the following error when running prediction:

Traceback (most recent call last):
  File "/code/predict_func.py", line 42, in extract_info
    raw_output = docprompt([{"doc": img_path, "prompt": list(prompt_strs)}])
  File "/opt/miniconda/lib/python3.10/site-packages/paddlenlp/taskflow/taskflow.py", line 822, in __call__
    results = self.task_instance(inputs, **kwargs)
  File "/opt/miniconda/lib/python3.10/site-packages/paddlenlp/taskflow/task.py", line 527, in __call__
    outputs = self._run_model(inputs, **kwargs)
  File "/opt/miniconda/lib/python3.10/site-packages/paddlenlp/taskflow/document_intelligence.py", line 122, in _run_model
    self.predictor.run()
ValueError: (InvalidArgument) The 'shape' in ReshapeOp is invalid. The input tensor X'size must be equal to the capacity of 'shape'. But received X's shape = [7, 7, 4], X's size = 196, 'shape' is [49, 1], the capacity of 'shape' is 49.
  [Hint: Expected capacity == in_size, but received capacity:49 != in_size:196.] (at ../paddle/phi/infermeta/unary.cc:1781)
  [operator < reshape2 > error]
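
To make the mismatch concrete, here is a quick check of the numbers in the error message. It is only arithmetic on the two shapes reported above, not a reproduction of the model; the variable names are my own.

```python
from math import prod

# Shapes copied from the ReshapeOp error message above.
x_shape = [7, 7, 4]
target_shape = [49, 1]

x_size = prod(x_shape)         # number of elements in X
capacity = prod(target_shape)  # number of elements the target shape can hold

print(f"X size = {x_size}, target capacity = {capacity}")

# A reshape only succeeds when both element counts match.
if x_size != capacity:
    ratio, remainder = divmod(x_size, capacity)
    print(f"mismatch: X holds {ratio}x the expected elements (remainder {remainder})")
```

The trailing dimension of X is 4, so the tensor holds exactly four times as many elements as the reshape expects (7 * 7 = 49 versus 7 * 7 * 4 = 196). The traceback does not show what produces that extra dimension.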

Any ideas?

Environment:

Python 3.10
paddlepaddle-gpu==2.5.2
paddleocr==2.7.3
paddlenlp==2.8.0