Open 564142183 opened 2 weeks ago
import paddlenlp, paddleocr
# Print the installed versions for the bug report.
print("paddlenlp:"+paddlenlp.__version__)
print("paddleocr:"+paddleocr.__version__)
from pprint import pprint
from paddlenlp import Taskflow
# Prompts (Chinese): invoice amount? seller's bank? invoice number? invoice date?
schema = ["开票金额是多少?", "销方开户银行是什么?", "发票号码是什么?", "开票日期是哪天?"]
docprompt = Taskflow("document_intelligence")
pprint(docprompt([{"doc": "./2.pdf", "prompt": schema}]))
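The script reports only the paddlenlp and paddleocr versions, while the failing attribute belongs to the PDF `Document` object. A minimal sketch for reporting that library's version as well, assuming the `Document` class comes from PyMuPDF (distribution name `PyMuPDF`); `installed_version` is a hypothetical helper, not part of any of these packages:

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "PyMuPDF" is an assumption about which package provides the Document class.
for name in ("paddlenlp", "paddleocr", "PyMuPDF"):
    print(f"{name}: {installed_version(name)}")
```

A `None` entry means the distribution is not installed under that name in the current environment.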
λ 969010514d8d /PaddleNLP/test python app.py
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
paddlenlp:2.8.0.post
paddleocr:2.6.1.3
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
[2024-06-18 02:58:35,695] [ INFO] - We are using (<class 'paddlenlp.transformers.ernie_layout.tokenizer.ErnieLayoutTokenizer'>, False) to load 'ernie-layoutx-base-uncased'.
[2024-06-18 02:58:36,221] [ INFO] - tokenizer config file saved in /root/.paddlenlp/models/ernie-layoutx-base-uncased/tokenizer_config.json
[2024-06-18 02:58:36,221] [ INFO] - Special tokens file saved in /root/.paddlenlp/models/ernie-layoutx-base-uncased/special_tokens_map.json
Traceback (most recent call last):
  File "/PaddleNLP/test/app.py", line 10, in <module>
    pprint(docprompt([{"doc": "./2.pdf", "prompt": schema}]))
  File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/taskflow.py", line 822, in __call__
    results = self.task_instance(inputs, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/task.py", line 526, in __call__
    inputs = self._preprocess(*args)
  File "/usr/local/lib/python3.10/dist-packages/paddlenlp/taskflow/document_intelligence.py", line 90, in _preprocess
    ocr_result = self._ocr.ocr(example["doc"], cls=True)
  File "/usr/local/lib/python3.10/dist-packages/paddleocr/paddleocr.py", line 544, in ocr
    img = check_img(img)
  File "/usr/local/lib/python3.10/dist-packages/paddleocr/paddleocr.py", line 434, in check_img
    img, flag_gif, flag_pdf = check_and_read(image_file)
  File "/usr/local/lib/python3.10/dist-packages/paddleocr/ppocr/utils/utility.py", line 96, in check_and_read
    for pg in range(0, pdf.pageCount):
AttributeError: 'Document' object has no attribute 'pageCount'. Did you mean: 'page_count'?
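The error message itself suggests that the installed PDF library exposes the attribute as `page_count` rather than `pageCount`. A minimal sketch of an accessor that tolerates either spelling; `get_page_count` is a hypothetical helper (not part of paddleocr or the PDF library), and the two stand-in classes exist only to exercise both branches:

```python
def get_page_count(doc):
    """Return a document's page count under either attribute spelling."""
    # Prefer the snake_case name suggested by the error message.
    if hasattr(doc, "page_count"):
        return doc.page_count
    # Fall back to the camelCase name used in check_and_read.
    return doc.pageCount


# Stand-in classes (not real PDF document objects) for demonstration.
class NewStyleDoc:
    page_count = 3


class OldStyleDoc:
    pageCount = 5


print(get_page_count(NewStyleDoc()))  # 3
print(get_page_count(OldStyleDoc()))  # 5
```

With a real document object, `get_page_count(pdf)` would take the place of `pdf.pageCount` in the failing loop. Other camelCase calls in the same function may need the same treatment, so pinning a mutually compatible pair of library versions may be the simpler route.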
Software environment
Duplicate issues
Error description
Steps to reproduce & code