atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Camelot cannot extract tables from annual reports #300

Closed LinHongyang closed 5 years ago

LinHongyang commented 5 years ago

Hi! I tried to extract tables from some annual report files that are posted online, but Camelot cannot find any tables in them. The result is like this:

tables = camelot.read_pdf('Dell.pdf', pages='3,4,5,6,7')
tables

tables = camelot.read_pdf('FB_AR_2017_FINAL.pdf', pages='34')
tables

These files do have tables on the corresponding pages, but the tables are not detected. I'd like to know why.

Here are the links to the files I tried to work with:

https://i.dell.com/sites/doccontent/corporate/secure/en/Documents/DellInc_10-Q_2FY2014.pdf

https://s21.q4cdn.com/399680738/files/doc_financials/annual_reports/FB_AR_2017_FINAL.pdf

anakin87 commented 5 years ago

Try using flavor='stream'.

Please read the docs: https://camelot-py.readthedocs.io/en/master/user/advanced.html
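
For context: Camelot's default flavor, 'lattice', looks for ruling lines between cells, while 'stream' groups text by the whitespace between columns. Tables in financial reports are often drawn without cell borders, which would explain why the default finds nothing. Below is a minimal sketch of trying both flavors in turn. The helper name read_tables and its read_pdf parameter are made up for illustration and are not part of Camelot; only camelot.read_pdf and its pages and flavor arguments come from the library.

```python
def read_tables(path, pages, read_pdf=None):
    """Try the default 'lattice' flavor first, then fall back to 'stream'.

    Returns (flavor, tables); flavor is None if neither flavor found a table.
    `read_pdf` is a hypothetical hook so a stand-in reader can be passed in;
    when omitted, camelot.read_pdf is used.
    """
    if read_pdf is None:
        import camelot  # imported lazily; requires camelot-py to be installed
        read_pdf = camelot.read_pdf

    tables = []
    for flavor in ("lattice", "stream"):
        tables = read_pdf(path, pages=pages, flavor=flavor)
        if len(tables) > 0:
            # Stop at the first flavor that detects at least one table.
            return flavor, tables
    return None, tables
```

With Camelot installed, calling read_tables('FB_AR_2017_FINAL.pdf', '34') would return the flavor that worked and the resulting table list; tables[0].df then gives the first table as a pandas DataFrame. Whether 'stream' recovers these particular tables cleanly is not guaranteed, so inspect the output before relying on it.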