atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Created feature parameter multi #363

Closed sverma25 closed 4 years ago

sverma25 commented 4 years ago

The new `multi` parameter sets different parameters for different pages of a document. For example, to supply `table_regions` for a document whose tables sit in different places on each page, `multi` can pass a different set of regions for each page. Common parameters that should apply to all pages can still be passed as usual.

`multi` takes a dictionary that maps a page number to a dictionary of additional parameters ({page: {parameter: value}}). Parameters supplied this way override the global arguments for that page.

For example,

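import os
import camelot

# testdir is the directory that holds the test PDFs
filename = os.path.join(testdir, "multi_params.pdf")
tables = camelot.read_pdf(filename, pages="all",
    multi={'2': {"table_regions": ["120, 210, 400, 90"]}},  # page 2 only
    split_text=True)  # global, applies to every page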

In this example, page 1 is parsed with only split_text=True, while page 2 is parsed with both table_regions and split_text=True.
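The override rule can be illustrated with a minimal sketch. This is not the code in the PR; `resolve_page_kwargs` is a hypothetical helper written only to show how per-page entries could be layered on top of the global arguments, assuming page keys are strings as in the example above.

```python
def resolve_page_kwargs(global_kwargs, multi, page):
    """Return the keyword arguments to use for a single page.

    Hypothetical helper (not part of camelot): entries in `multi`
    for this page override the global keyword arguments.
    """
    merged = dict(global_kwargs)  # copy so the globals are not mutated
    # Page keys are assumed to be strings, e.g. "2"
    merged.update(multi.get(str(page), {}))
    return merged


global_kwargs = {"split_text": True}
multi = {"2": {"table_regions": ["120, 210, 400, 90"]}}

page1 = resolve_page_kwargs(global_kwargs, multi, 1)
page2 = resolve_page_kwargs(global_kwargs, multi, 2)
print(page1)
print(page2)
```

Pages without an entry in `multi` fall back to the global arguments unchanged, which matches the behaviour described for page 1.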

vinayak-mehta commented 4 years ago

We should come up with a better way to do this. Can you raise the PR over at camelot-dev/camelot so that Travis runs on it?