atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Not being able to detect the table on Page 5 #436

Open RituVirk11 opened 3 years ago

RituVirk11 commented 3 years ago

Hi,

I am using the following code to export all the text from the attached PDF as CSV:

Code:

import camelot

tables = camelot.read_pdf('2009_AFL.pdf', pages='1-end', flavor='stream')
tables.export('loo.csv', f='csv', compress=True)

However, it does not detect part of page 5 in the attached document (see the attached image), and the same thing happens on some other pages. My guess is that where a page has a line partway through it, camelot stops detecting the text after the line and skips to the next page.
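One way to test this guess is to re-run the stream parser on page 5 alone with looser tolerances, and optionally with an explicit table area. The sketch below is only a starting point, not a confirmed fix. The helper names `stream_retry_kwargs` and `retry_page` are hypothetical, and the tolerance values are arbitrary and would need tuning for this PDF. `edge_tol`, `row_tol` and `table_areas` are existing `camelot.read_pdf` options for the stream flavor.

```python
def stream_retry_kwargs(page, table_area=None, edge_tol=500, row_tol=10):
    """Build keyword arguments for re-running camelot's stream parser on one page.

    The defaults here are illustrative guesses, not recommended values.
    """
    kwargs = {
        "pages": str(page),
        "flavor": "stream",
        # Larger edge_tol lets automatic table-area detection span bigger vertical gaps.
        "edge_tol": edge_tol,
        # Larger row_tol groups text that is slightly misaligned vertically into one row.
        "row_tol": row_tol,
    }
    if table_area is not None:
        # table_areas expects strings "x1,y1,x2,y2": top-left and bottom-right
        # corners in PDF coordinates (origin at the bottom-left of the page).
        kwargs["table_areas"] = [",".join(str(v) for v in table_area)]
    return kwargs


def retry_page(pdf_path, page, **options):
    """Parse a single page and print a short report for each table found."""
    import camelot  # imported here so the helper above works without camelot installed

    tables = camelot.read_pdf(pdf_path, **stream_retry_kwargs(page, **options))
    for table in tables:
        print(table.page, table.shape, table.parsing_report)
    return tables
```

For example, `retry_page('2009_AFL.pdf', 5)` would re-parse only page 5. Passing `table_area=(0, 792, 612, 0)` would force the parser to consider a whole US Letter page, but those coordinates are an assumption about the page size and should be replaced with the real extent of the missing region.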

2009_AFL.pdf

Page 5 - portion not detected

Please advise.