atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

How to locate the desired table? #388

Open dongrixinyu opened 5 years ago

dongrixinyu commented 5 years ago

The tool is excellent at extracting tables from PDFs! I love it. However, I can extract many tables from one PDF, and I only need the one table located on page n. How can I locate it?

Does this tool provide a function to search for key information and locate the page I want?

anakin87 commented 4 years ago

Do you want to extract tables from one specific page?

Sorry, but I don't understand your question...

dongrixinyu commented 4 years ago

I mean, there are many tables in a PDF file, and the way I find the table I want is by selecting a page number. I wish this tool provided a regular-expression parameter that matches text on page N, so that I get the table on page N.

anakin87 commented 4 years ago

At the moment, Camelot doesn't provide this feature. You can find the desired text and the page using a PDF parsing library (for example, https://github.com/izderadicka/pdfparser). Then, extract tables from the selected page, using camelot.read_pdf('your.pdf', pages='1,2,3')
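A minimal sketch of the two-step workflow described in the comment above, assuming pdfplumber is used for text extraction (swapped in for the pdfparser library linked there). `find_matching_pages`, `to_camelot_pages`, and `read_tables_on_matching_pages` are hypothetical helper names, not part of Camelot's API; only `camelot.read_pdf(..., pages=...)` comes from Camelot itself. Page numbers are 1-based to match Camelot's `pages` string.

```python
import re
from typing import Iterable, List, Optional


def find_matching_pages(
    page_texts: Iterable[Optional[str]], pattern: str, flags: int = 0
) -> List[int]:
    """Return the 1-based numbers of pages whose text matches the regex `pattern`."""
    regex = re.compile(pattern, flags)
    # Pages with no extractable text (None or "") are skipped.
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if text and regex.search(text)
    ]


def to_camelot_pages(page_numbers: Iterable[int]) -> str:
    """Format page numbers as the comma-separated string passed to `pages`."""
    return ",".join(str(n) for n in page_numbers)


def read_tables_on_matching_pages(pdf_path: str, pattern: str, flags: int = 0):
    """Hypothetical helper: find pages with pdfplumber, then extract tables with Camelot."""
    # Imported lazily (assumption: both packages are installed) so the
    # pure-Python helpers above can be used and tested without them.
    import camelot
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        texts = [page.extract_text() for page in pdf.pages]

    pages = find_matching_pages(texts, pattern, flags)
    if not pages:
        raise ValueError(f"no page matches {pattern!r}")
    return camelot.read_pdf(pdf_path, pages=to_camelot_pages(pages))
```

For example, `read_tables_on_matching_pages("your.pdf", r"Balance\s+Sheet")` would return the tables found only on pages containing that text, and `tables[0].df` gives the first one as a pandas DataFrame. Keeping the regex search separate from the Camelot call makes it easy to swap in a different text-extraction library.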