atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

A few columns are missing while extracting data from the table #324

Closed Jackblackpearl closed 4 years ago

Jackblackpearl commented 5 years ago

Please find below the code and sample PDFs we are using to extract data from the table. For the 1st and 2nd PDFs, a few columns are missing, while for the 3rd PDF we are able to extract the complete table.

Code:

    import camelot
    import matplotlib.pyplot as plt

    # Raw strings keep the backslashes in Windows paths literal.
    path = r"D:\localhome\kav\Desktop\WEDPO046799.pdf"
    tables = camelot.read_pdf(path, pages='1-end', table_areas=['12,585,814,51'], flavor='stream')
    print(tables[0].parsing_report)
    tables.export(r"D:\localhome\kav\Desktop\WEDPO046799.csv", f='csv')

PDF Files: WEDPO047922.pdf, WEDPO046799.pdf, WED PO 047931.pdf

Please let me know whether there is a problem with the PDFs or whether anything in the code needs to be modified.
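One thing that may be worth ruling out (it is not confirmed as the cause anywhere in this thread) is whether the `table_areas` string covers the full width of the table in every PDF. Camelot reads each entry as `x1,y1,x2,y2`, with (x1, y1) the top-left and (x2, y2) the bottom-right corner in PDF coordinate space, whose origin is the bottom-left of the page. The sketch below is a small standalone check of that ordering; `parse_table_area` is a hypothetical helper for illustration, not part of Camelot.

```python
from typing import Tuple


def parse_table_area(area: str) -> Tuple[float, float, float, float]:
    """Parse an "x1,y1,x2,y2" string and check its corner ordering.

    Hypothetical helper, not a Camelot function. It assumes the
    convention that (x1, y1) is the top-left corner and (x2, y2) the
    bottom-right corner, with the origin at the bottom-left of the page.
    """
    parts = [p.strip() for p in area.split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 comma-separated numbers, got {len(parts)}")
    x1, y1, x2, y2 = (float(p) for p in parts)
    # With a bottom-left origin, the top edge has the larger y value.
    if not (x1 < x2 and y1 > y2):
        raise ValueError(f"corners look swapped in {area!r}")
    return x1, y1, x2, y2


# The area used in the code above.
x1, y1, x2, y2 = parse_table_area("12,585,814,51")
print(f"area width: {x2 - x1}, area height: {y1 - y2}")
```

If the reported width is narrower than the table in the 1st and 2nd PDFs, text outside the area would not be considered for extraction, which could look like missing columns.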

Jackblackpearl commented 5 years ago

Hi Vinayak, any update on this issue?

vinayak-mehta commented 5 years ago

Sorry for the delay in replies, please give me some time to look into this.

Jackblackpearl commented 5 years ago

Hi Vinayak,

Any update on this issue?

vinayak-mehta commented 5 years ago

I'll need more time to look into this.