PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
Apache License 2.0
39.34k stars 7.35k forks

Error when running the code in the PPOCR "Document Analysis Hands-On: Table Recognition" project #10387

Closed ld520 closed 2 weeks ago

ld520 commented 10 months ago

When I use the "Document Analysis Hands-On: Table Recognition" project, running the code raises an error. Code:

import cv2
from table.predict_table import TableSystem, to_excel
from utility import init_args

# Initialize the arguments
args = init_args().parse_args(args=[])
args.det_model_dir = 'inference/ch_PP-OCRv2_det_infer'
args.rec_model_dir = 'inference/ch_PP-OCRv2_rec_infer'
args.table_model_dir = 'inference/en_ppocr_mobile_v2.0_table_structure_infer'
args.image_dir = '/home/aistudio/1.jpg'
args.rec_char_dict_path = '../ppocr/utils/ppocr_keys_v1.txt'
args.table_char_dict_path = '../ppocr/utils/dict/table_structure_dict.txt'
args.det_limit_side_len = 736
args.det_limit_type = 'min'
args.output = '../output/table'
args.use_gpu = False

# Initialize the table recognition system
table_sys = TableSystem(args)
img = cv2.imread('/home/aistudio/1.jpg')

# Run table recognition
pred_html = table_sys(img)

# Save the result to an Excel file
to_excel(pred_html, '1.xlsx')
print(pred_html)

Error:

ddle120-env/lib/python3.7/site-packages/matplotlib/pyplot.py", line 533, in figure
    **kwargs)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/backend_bases.py", line 161, in new_figure_manager
    return cls.new_figure_manager_given_figure(num, fig)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/matplotlib/backends/_backend_tk.py", line 1046, in new_figure_manager_given_figure
    window = Tk.Tk(className="matplotlib")
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/tkinter/__init__.py", line 2023, in __init__
    self.tk = _tkinter.create(screenName, baseName, className, interactive, wantobjects, useTk, sync, use)
_tkinter.TclError: no display name and no $DISPLAY environment variable

aistudio@jupyter-4490434-6530013:~/work/PaddleOCR-release-2.6/ppstructure$ python -m test
[2023/07/14 09:15:46] ppocr DEBUG: dt_boxes num : 69, elapse : 0.8896045684814453
[2023/07/14 09:15:52] ppocr DEBUG: rec_res num : 69, elapse : 5.328573226928711
Traceback (most recent call last):
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/aistudio/work/PaddleOCR-release-2.6/ppstructure/test.py", line 24, in <module>
    to_excel(pred_html,'1.xlsx')
  File "/home/aistudio/work/PaddleOCR-release-2.6/ppstructure/table/predict_table.py", line 145, in to_excel
    tablepyxl.document_to_xl(html_table, excel_path)
  File "/home/aistudio/work/PaddleOCR-release-2.6/ppstructure/table/tablepyxl/tablepyxl.py", line 101, in document_to_xl
    wb = document_to_workbook(doc, base_url=base_url)
  File "/home/aistudio/work/PaddleOCR-release-2.6/ppstructure/table/tablepyxl/tablepyxl.py", line 87, in document_to_workbook
    inline_styles_doc = Premailer(doc, base_url=base_url, remove_classes=False).transform()
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/premailer/premailer.py", line 319, in transform
    stripped = html.strip()
AttributeError: 'tuple' object has no attribute 'strip'

ld520 commented 10 months ago

System Environment: win10 22H2 / Python3.9
Version: Paddle: 2.4.2, PaddleOCR: 2.6
Related components: ppstructure
Command Code: python -m test
Complete Error Message: see the original post above

livingbody commented 10 months ago

Try creating a project on AI Studio. Without knowing your setup, this is hard to reproduce.

ld520 commented 10 months ago

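It is simply the "Document Analysis Hands-On: Table Recognition" project from the official 动手学OCR·十讲 (Hands-On OCR in Ten Lectures) course.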

ld520 commented 10 months ago

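The error occurs at the step that splits the HTML and assembles it into an Excel file. I tried printing pred_html on its own, but the image was not recognized.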

LiuC425 commented 7 months ago

Did you solve this? I ran into the same problem. I tried tracing it back, and it looks like self.match in predict_table is where things go wrong.

UserWangZz commented 2 weeks ago

This issue has not been updated for a long time, so it is being closed for now. It can be reopened if needed.
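The last frame of the traceback in the original report shows `html.strip()` failing on a tuple, so `to_excel` was handed a tuple and not an HTML string. One possible explanation, which is an assumption and is not confirmed anywhere in this thread, is that `table_sys(img)` in this release returns a pair whose first element is a dict storing the markup under an `'html'` key. Below is a minimal sketch of unpacking such a value. `extract_html` is a hypothetical helper, not part of PaddleOCR, and `fake_output` is a stand-in, not real model output. Check the return statement of `TableSystem.__call__` in your own `predict_table.py` before relying on it.

```python
def extract_html(table_output):
    """Hypothetical helper: pull an HTML string out of a table-system return value."""
    # Assumption: the call may return (result, time_dict) instead of a bare string.
    if isinstance(table_output, tuple):
        table_output = table_output[0]
    # Assumption: the result may be a dict that stores the markup under 'html'.
    if isinstance(table_output, dict):
        table_output = table_output["html"]
    # Fail early with a clear message instead of deep inside premailer.
    if not isinstance(table_output, str):
        raise TypeError(
            f"expected an HTML string, got {type(table_output).__name__}"
        )
    return table_output


# Stand-in for the value returned by table_sys(img); not real model output.
fake_output = (
    {"html": "<html><body><table><tr><td>a</td></tr></table></body></html>"},
    {"all": 0.0},
)
html = extract_html(fake_output)
```

In the reported script this would be used as `to_excel(extract_html(table_sys(img)), '1.xlsx')`, assuming the return shape described above matches your installed version.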