Layout-Parser / layout-parser

A Unified Toolkit for Deep Learning Based Document Image Analysis
https://layout-parser.github.io/
Apache License 2.0

[feat] Add pdf loader #71

Closed: lolipopshock closed this 2 years ago

lolipopshock commented 2 years ago

Add support for loading PDF files in LayoutParser:

>>> import layoutparser as lp
>>> pdf_layout = lp.load_pdf("path/to/pdf")  # a list with one lp.Layout per page
>>> pdf_layout[0]  # the layout for page 0
>>> pdf_layout, pdf_images = lp.load_pdf("path/to/pdf", load_images=True)  # also return each page as an image
>>> lp.draw_box(pdf_images[0], pdf_layout[0])  # draw page 0's layout over its image
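The snippet above loads the per-page layouts but stops short of using them. The function below is a hypothetical post-processing sketch and is not part of `load_pdf`. It regroups word-level tokens into text lines. It assumes each token is available as a plain `(x_1, y_1, x_2, y_2, text)` tuple, with y growing downward as in image coordinates. With LayoutParser you might build those tuples with something like `[(*b.coordinates, b.text) for b in pdf_layout[0]]`, provided each block exposes `.coordinates` and `.text`. The name `group_tokens_into_lines` and the `y_tolerance` default are illustrative choices.

```python
from typing import List, Tuple

# Hypothetical plain-tuple stand-in for a word token: (x_1, y_1, x_2, y_2, text)
Token = Tuple[float, float, float, float, str]


def group_tokens_into_lines(tokens: List[Token], y_tolerance: float = 3.0) -> List[str]:
    """Group word tokens into text lines, ordered top to bottom.

    A token joins the current line when its vertical center lies within
    `y_tolerance` of the vertical center of that line's first token.
    """
    # Sort by vertical center first, then by left edge.
    ordered = sorted(tokens, key=lambda t: ((t[1] + t[3]) / 2, t[0]))
    lines: List[List[Token]] = []
    for tok in ordered:
        center = (tok[1] + tok[3]) / 2
        if lines:
            anchor = lines[-1][0]
            anchor_center = (anchor[1] + anchor[3]) / 2
            if abs(center - anchor_center) <= y_tolerance:
                lines[-1].append(tok)
                continue
        lines.append([tok])
    # Within each line, order words by left edge and join them with spaces.
    return [" ".join(t[4] for t in sorted(line, key=lambda t: t[0])) for line in lines]


tokens = [(50, 10, 80, 20, "world"), (10, 11, 45, 21, "Hello"), (10, 40, 60, 50, "Next")]
print(group_tokens_into_lines(tokens))  # ['Hello world', 'Next']
```

A fixed tolerance is the simplest grouping rule. It can break down on skewed scans or pages with mixed font sizes, where a tolerance scaled to each token's height would be more robust.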
maswiebe commented 1 year ago

Can I use this to extract a table from a PDF into a dataframe?