PaddlePaddle / PaddleOCR

Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
https://paddlepaddle.github.io/PaddleOCR/
Apache License 2.0

[Question] about Table detection #9509

Open phamkhactu opened 1 year ago

phamkhactu commented 1 year ago

Thanks for the excellent repo.

I have read the tutorial for table recognition, but it only covers the recognition step. As I understand it, before you can recognize (OCR) a table you must first detect the table region. I cannot find a config that returns only the table's bounding box (top-left, bottom-right). [image] I would be very grateful for help getting the table's bounding box. Thanks in advance.

johntoma commented 1 year ago

You need to use PPStructure Layout Analysis to detect tables.

https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/ppstructure/docs/quickstart_en.md

https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/ppstructure/docs/models_list_en.md
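
A minimal sketch of the layout-only route, assuming the release/2.6 Python API described in the quickstart linked above: `PPStructure(table=False, ocr=False)` runs layout analysis alone, and each returned region is a dict with a `type` label and a `bbox` of `[x1, y1, x2, y2]`. The helper names `table_bboxes` and `detect_tables` are hypothetical, introduced here only for illustration.

```python
def table_bboxes(regions):
    """Return ((x1, y1), (x2, y2)) for every region labelled 'table'.

    `regions` is assumed to be the list of dicts returned by PPStructure,
    each carrying a 'type' string and a 'bbox' of [x1, y1, x2, y2].
    """
    boxes = []
    for region in regions:
        if region.get("type", "").lower() != "table":
            continue
        x1, y1, x2, y2 = (int(v) for v in region["bbox"])
        boxes.append(((x1, y1), (x2, y2)))
    return boxes


def detect_tables(img_path):
    """Run layout analysis only and keep the table boxes."""
    # Imported here so table_bboxes() stays usable without PaddleOCR installed.
    import cv2
    from paddleocr import PPStructure

    # table=False, ocr=False: skip table parsing and text recognition
    # (assumption: flags as documented in the release/2.6 quickstart).
    engine = PPStructure(table=False, ocr=False, show_log=False)
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(img_path)
    return table_bboxes(engine(img))
```

Calling `detect_tables("page.png")` (the file name is a placeholder) should then give one top-left/bottom-right pair per detected table, or an empty list if the layout model labels nothing as a table.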

phamkhactu commented 1 year ago

Thanks for pointing that out, I will try it.

vani-mcm commented 1 year ago

Hi, I have a problem with the output of layout analysis. Even when I crop my ROI (the table area), layout analysis labels it as a figure, not as a table. What could be the reason? Any ideas?

tranthuhoai3786 commented 6 months ago

@vani-mcm I get the same issue. Do you have a solution?

phamkhactu commented 5 months ago

@vani-mcm cc @johntoma Yes, I have the same problem in some cases. However, when I checked again, the bounding box it returned for the table was poor: the ROI does not fit the table tightly (low IoU), and in most cases the bbox is close to [0, 0, w, h], i.e. the whole image.
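
To make the IoU remark concrete, here is a small sketch for checking a predicted box against a reference box and for flagging boxes that cover nearly the whole image. It is plain Python and does not depend on PaddleOCR; the function names and the 0.95 threshold are assumptions chosen for illustration, not anything from the library.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap, clamped at zero when boxes are disjoint.
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def covers_whole_image(bbox, width, height, threshold=0.95):
    """True when bbox is (almost) the full [0, 0, width, height] frame."""
    return iou(bbox, (0, 0, width, height)) >= threshold
```

A box for which `covers_whole_image` is true carries no localization information, which matches the [0, 0, w, h] symptom described above.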

wwolfyy commented 3 months ago

@vani-mcm, @tranthuhoai3786, the layout models only detect the areas where tables are. To parse the tables you'd need to look at: https://github.com/PaddlePaddle/PaddleOCR/blob/main/ppstructure/table/README.md
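
A hedged sketch of that parsing step for an image that is already cropped to the table, assuming the release/2.6 Python API: `PPStructure(layout=False)` skips layout analysis and treats the whole image as one table, and the parsed structure is returned as an HTML string under `res["html"]`. The output keys are recalled from the quickstart and should be checked against your installed version; `table_html` and `parse_table_crop` are hypothetical helper names.

```python
def table_html(regions):
    """Collect the HTML string of every table region that has one.

    Assumes each region is a dict with 'type' and a 'res' dict that
    holds the table structure under the 'html' key.
    """
    html = []
    for region in regions:
        res = region.get("res")
        if region.get("type", "").lower() != "table":
            continue
        if isinstance(res, dict) and "html" in res:
            html.append(res["html"])
    return html


def parse_table_crop(img_path):
    """Parse an already-cropped table image without running layout analysis."""
    # Imported here so table_html() stays usable without PaddleOCR installed.
    import cv2
    from paddleocr import PPStructure

    # layout=False: no layout model, the input is handled as a single table
    # (assumption based on the release/2.6 quickstart).
    engine = PPStructure(layout=False, show_log=False)
    img = cv2.imread(img_path)
    if img is None:
        raise FileNotFoundError(img_path)
    return table_html(engine(img))
```

Because the layout model is not involved in this path, it may also sidestep the case above where a cropped table is labelled as a figure, though that has not been verified here.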