Currently, to extract tables, I'm using the code below, which saves them as an Excel file (section 2.2.4, table recognition):

    import os
    import cv2
    import PIL
    import paddleclas
    import paddle
    from paddleocr import PPStructure, draw_structure_result, save_structure_res

    table_engine = PPStructure(layout=False, show_log=True)  # table recognition

    save_folder = 'output'
    img_path = 'example.png'
    img = cv2.imread(img_path)
    result = table_engine(img)
    save_structure_res(result, save_folder, os.path.basename(img_path).split('.')[0])

To extract text from a PDF file, I'm using the code below. The problem is that it also converts the tables into text, which I don't want.

    from paddleocr import PaddleOCR, draw_ocr

    # Paddleocr supports Chinese, English, French, German, Korean and Japanese.
    # You can set the parameter `lang` as `ch`, `en`, `fr`, `german`, `korean`, `japan`
    # to switch the language model in order.
    ocr = PaddleOCR(use_angle_cls=True, lang="en", page_num=0)  # need to run only once to download and load model into memory
    img_path = 'tables/example.pdf'
    result = ocr.ocr(img_path, cls=True)

    def ocr_to_txt(result):
        text = ""
        for line in result:
            for word in line:
                text += word[1][0] + " "
            text += "\n"
        return text

    text = ocr_to_txt(result)

    with open("ocr_results.txt", "w") as f:
        f.write(text)
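One possible way to keep the two outputs apart is to run layout analysis and split the regions by type. This is only a rough sketch, not a confirmed PaddleOCR recipe. It assumes that `PPStructure(layout=True)` returns, for each page image, a list of dicts with a `type` key and a `res` key, where table regions carry `{'html': ...}` and the other regions carry a list of `{'text': ...}` dicts. That shape may differ between versions, so please check it against your install. The helper name `split_tables_and_text` is hypothetical and takes plain Python data, so it can be tried without loading any model.

```python
from typing import Any, Dict, List, Tuple


def split_tables_and_text(regions: List[Dict[str, Any]]) -> Tuple[List[str], str]:
    """Separate table regions from all other regions of one page.

    ASSUMPTION: each region looks like {'type': ..., 'res': ...}, with
    res = {'html': '<table>...'} for tables and
    res = [{'text': '...'}, ...] for text-like regions.
    """
    tables: List[str] = []
    lines: List[str] = []
    for region in regions:
        res = region.get("res")
        if not res:
            # Skip regions with no recognition result (e.g. empty figures).
            continue
        if str(region.get("type", "")).lower() == "table":
            # Keep the table markup out of the plain-text output.
            if isinstance(res, dict) and "html" in res:
                tables.append(res["html"])
        elif isinstance(res, list):
            # Join the recognized fragments of this region into one line.
            words = [item["text"] for item in res
                     if isinstance(item, dict) and "text" in item]
            if words:
                lines.append(" ".join(words))
    return tables, "\n".join(lines)


# Hand-written sample in the assumed shape (not real OCR output).
sample_regions = [
    {"type": "title", "res": [{"text": "Report"}]},
    {"type": "table", "res": {"html": "<table><tr><td>1</td></tr></table>"}},
    {"type": "text", "res": [{"text": "Hello"}, {"text": "world"}]},
    {"type": "figure", "res": []},
]
sample_tables, sample_text = split_tables_and_text(sample_regions)
```

With real data, the idea would be to pass in the result of calling a `PPStructure(layout=True)` engine on each page image (a PDF page would first have to be rendered to an image), write the returned text to the `.txt` file, and keep using `save_structure_res` for the Excel export of the tables.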

PaddlePaddle / PaddleOCR

Can I get tables and text from a PDF separately with PaddleOCR? #11959
