Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis
https://layout-parser.github.io/
Apache License 2.0
4.78k stars, 459 forks

Retrieving text inside layouts. #30

Open phuynhh opened 3 years ago

phuynhh commented 3 years ago

Hi, thank you very much for your brilliant work. I have successfully installed and run the layoutparser package on Windows 10. However, as I come from a non-computing/data science background, I'm currently stuck on how to retrieve the text inside the detected layouts and store it in a dataframe for further analysis. Could you provide any keywords or links on how to do this? Any pointers would be very much appreciated. Thank you a lot.

lotfiabdelghafour commented 3 years ago

https://github.com/Layout-Parser/layout-parser/blob/master/examples/Deep%20Layout%20Parsing.ipynb

lolipopshock commented 3 years ago

Sure, no problem! And thanks @lotfiabdelghafour for the pointer. I assume you are dealing with image scans, so you might want to perform OCR after running the layout detection models. In this example, https://github.com/Layout-Parser/layout-parser/blob/master/examples/Deep%20Layout%20Parsing.ipynb, you will find some OCR examples at the very end. You could try those first and see if that's helpful.
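For readers who want the gist without opening the notebook, here is a minimal sketch of the OCR step, loosely following the pattern used there. The helper names `ocr_layout` and `run_example`, the padding value, the PubLayNet model config, and the score threshold are illustrative assumptions, not something prescribed in this thread; `run_example` additionally assumes that Detectron2, OpenCV, and Tesseract are installed.

```python
def ocr_layout(image, layout, ocr_agent, pad=5):
    """Run OCR on every detected block and store the result on the block.

    `layout` is expected to be an iterable of layoutparser text blocks and
    `ocr_agent` an object with a `detect(image)` method (for example
    `lp.TesseractAgent`). The helper name and `pad` default are assumptions.
    """
    for block in layout:
        # Pad the box slightly so characters on the border are not clipped,
        # then crop that region out of the page image.
        segment = block.pad(left=pad, right=pad, top=pad, bottom=pad).crop_image(image)
        text = ocr_agent.detect(segment)
        # Attach the recognised text to the block itself.
        block.set(text=text, inplace=True)
    return layout


def run_example(image_path):
    """Hypothetical end-to-end usage on a single scanned page."""
    # Imported lazily so that ocr_layout can be reused without these packages.
    import cv2
    import layoutparser as lp

    image = cv2.imread(image_path)[..., ::-1]  # OpenCV loads BGR; convert to RGB
    model = lp.Detectron2LayoutModel(
        "lp://PubLayNet/faster_rcnn_R_50_FPN_3x/config",  # assumed model choice
        extra_config=["MODEL.ROI_HEADS.SCORE_THRESH_TEST", 0.8],
        label_map={0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    )
    layout = model.detect(image)
    ocr_agent = lp.TesseractAgent(languages="eng")
    return ocr_layout(image, layout, ocr_agent)
```

After calling `run_example("page.png")` (the file name is a placeholder), each block in the returned layout should carry its recognised text in `block.text`.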

Speaking of exporting, in the v0.2 release we've just added a function to export a layout to a dataframe: layout.to_dataframe(), see the details here.
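As a rough illustration of the export step, the sketch below wraps layout.to_dataframe() in a small helper that also saves the result as a CSV file for later analysis. The helper name `export_layout` and the CSV step are assumptions added for illustration; it presumes `layout` is a layoutparser Layout (v0.2 or later) whose blocks already have their text set, and that pandas is installed.

```python
def export_layout(layout, csv_path):
    """Convert a layout to a DataFrame and save it as CSV.

    `export_layout` is a hypothetical helper; only `layout.to_dataframe()`
    comes from layoutparser, and `to_csv` is the standard pandas method.
    """
    df = layout.to_dataframe()        # one row per block in the layout
    df.to_csv(csv_path, index=False)  # keep a copy on disk for further analysis
    return df
```

A call such as `df = export_layout(layout, "page_1.csv")` (the file name is a placeholder) returns an ordinary pandas DataFrame. The exact column names may differ between versions, so it is worth inspecting `df.columns` before filtering or grouping.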