atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Not able to detect tables to extract text from it.. #275

Closed Jackblackpearl closed 5 years ago

Jackblackpearl commented 5 years ago

Can you please let me know how to extract details from the attached PDF? I'm not able to extract text from the table.

vinayak-mehta commented 5 years ago

Hello @Jackblackpearl. Can you post the code that you used and the exact problem that you're facing?

Jackblackpearl commented 5 years ago

import camelot
import matplotlib.pyplot as plt

# Raw string keeps the backslashes in the Windows path literal
tables = camelot.read_pdf(r"D:\Downloads\18508126_1.pdf", flavor='stream')
tables
plt.show()
tables[0].to_csv('foo.csv')

Output: I received only the top-right corner of the page, not the entire table.

Jackblackpearl commented 5 years ago

I received only this information and didn't get the rest.

Jackblackpearl commented 5 years ago

Vinayak, any update on my PDF?

vinayak-mehta commented 5 years ago

@Jackblackpearl Sorry for the late reply. You can try specifying table_areas and columns to get the table out. I think the text is very sparse, and the table isn't being detected because of that.
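A minimal sketch of that suggestion, assuming the Stream flavor and a Camelot version that accepts `table_areas` and `columns` as lists of comma-separated coordinate strings in PDF points. The helper names (`area_string`, `columns_string`, `extract_table`) and every coordinate shown are hypothetical placeholders, not values taken from the PDF in this issue. The real coordinates would have to be read off the page, for example with Camelot's visual debugging plot.

```python
def area_string(x1, y1, x2, y2):
    """Format one table area as "x1,y1,x2,y2".

    (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner,
    in PDF coordinate space (origin at the bottom-left of the page).
    """
    return f"{x1},{y1},{x2},{y2}"


def columns_string(xs):
    """Format the x-coordinates of the column separators as "x1,x2,...".

    Sorted left to right so the separators read in page order.
    """
    return ",".join(str(x) for x in sorted(xs))


def extract_table(pdf_path, area, column_xs, pages="1"):
    """Run Stream on one explicitly specified table area (hypothetical helper)."""
    import camelot  # imported here so the helpers above work without Camelot

    return camelot.read_pdf(
        pdf_path,
        pages=pages,
        flavor="stream",
        table_areas=[area_string(*area)],      # one string per table
        columns=[columns_string(column_xs)],   # one string per table area
    )


# Example call with made-up coordinates (replace with values from your page):
# tables = extract_table(r"D:\Downloads\18508126_1.pdf",
#                        area=(50, 750, 560, 100),
#                        column_xs=[120, 250, 400])
# tables[0].to_csv("foo.csv")
```

Passing the area explicitly means Stream no longer has to guess the table boundary from sparse text, and the column separators tell it where to split each row.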

nachiket8188 commented 5 years ago

Where can I find the "attached pdf" mentioned in the issue above? I am facing a similar problem, but because the data is sensitive, I can't upload the file I'm using.