atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io
Other
3.62k stars, 350 forks

Not able to extract table #329

Closed by Jackblackpearl 4 years ago

Jackblackpearl commented 5 years ago

Hello,

I am trying to extract data from the file attached below, but I get an error message. Here is the code:

import camelot
import matplotlib.pyplot as plt

# Raw strings keep the backslashes in Windows paths literal
# (in a normal string literal, "\b" would be read as a backspace).
path = r"D:\localhome\kk\Desktop\bestellung_ID_974880.pdf"
tables = camelot.read_pdf(path)  # , table_areas=['41,448,328,438'], pages='1', flavor='stream'

print(tables[0].parsing_report)

camelot.plot(tables[0])
plt.show()
tables.export(r"D:\localhome\kk\Desktop\demo01.csv", f='csv')

Error Message:

  File "", line 1, in <module>
    runfile('D:/Karthick/Python_Scripts/Basic_Camelot.py', wdir='D:/Karthick/Python_Scripts')
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/Karthick/Python_Scripts/Basic_Camelot.py", line 11, in <module>
    tables = camelot.read_pdf(path)#, table_areas=['41,448,328,438'], pages = '1', flavor = 'stream'
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\camelot\io.py", line 106, in read_pdf
    layout_kwargs=layout_kwargs, **kwargs)
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\camelot\handlers.py", line 156, in parse
    self._save_page(self.filepath, p, tempdir)
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\camelot\handlers.py", line 104, in _save_page
    p = infile.getPage(page - 1)
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\PyPDF2\pdf.py", line 1176, in getPage
    self._flatten()
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\PyPDF2\pdf.py", line 1505, in _flatten
    catalog = self.trailer["/Root"].getObject()
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\PyPDF2\generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\PyPDF2\generic.py", line 178, in getObject
    return self.pdf.getObject(self).getObject()
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\PyPDF2\pdf.py", line 1599, in getObject
    idnum, generation = self.readObjectHeader(self.stream)
  File "C:\Users\kk\AppData\Local\Continuum\anaconda3\envs\Camelot_Py\lib\site-packages\PyPDF2\pdf.py", line 1667, in readObjectHeader
    return int(idnum), int(generation)
ValueError: invalid literal for int() with base 10: b'dobj'

bestellung_ID_974880.pdf
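
A note on the final frame: `readObjectHeader` returns `int(idnum), int(generation)`, so it expects an object header of the form `<id> <generation> obj`, yet `int()` was handed `b'dobj'`. One plausible but unconfirmed reading is that an offset into the file points a few bytes before the real header, into the tail of the preceding `endobj`. The sketch below reproduces that failure mode using only the standard library. `read_object_header` and `PDF_BYTES` are hypothetical stand-ins for illustration, not PyPDF2 code, and the attached PDF itself has not been inspected.

```python
from typing import Tuple

# Toy byte string with two PDF-style objects (illustrative, not a valid PDF).
PDF_BYTES = b"1 0 obj\n<< /Type /Catalog >>\nendobj\n2 0 obj\n<< >>\nendobj\n"


def read_object_header(data: bytes, offset: int) -> Tuple[int, int]:
    """Parse '<id> <generation> obj' starting at offset (hypothetical helper)."""
    # Take the first two whitespace-delimited tokens after the offset.
    idnum, generation = data[offset:].split(None, 2)[:2]
    return int(idnum), int(generation)


# A correct offset points at the first digit of the header.
good_offset = PDF_BYTES.index(b"2 0 obj")
print(read_object_header(PDF_BYTES, good_offset))

# Shift the offset 5 bytes back so it lands on the 'd' of the preceding "endobj".
bad_offset = good_offset - 5
try:
    read_object_header(PDF_BYTES, bad_offset)
except ValueError as exc:
    print(exc)
```

If this is what is happening, the file's internal offsets would be the thing to check, rather than the Camelot call itself. Nothing in this thread confirms that diagnosis.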

Jackblackpearl commented 5 years ago

Hi Vinayak, is there any update on this issue?

DhwaniGurjar commented 5 years ago

Were you able to solve it? I am getting the same issue.

vinayak-mehta commented 5 years ago

Sorry for the delay in replies, please give me some time to look into this.

Jackblackpearl commented 5 years ago

Any update on this issue, Vinayak?

vinayak-mehta commented 5 years ago

I'll need more time to look into this.

Jackblackpearl commented 5 years ago

Any update on this?

Nusran commented 4 years ago

image

How can I resolve this?

Nusran commented 4 years ago

I need to get a count of the merged cells in the table. How can I do this?

image

In this image there are 6 cells, but when I print the table I find 9 cells. Merged cells are not being detected.
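
The thread contains no answer to this question. As a generic illustration of the counting problem only, the sketch below treats a table as a grid of cells that each record whether a border exists on their right and bottom edges, then groups neighbours that share no border. `GridCell` and `count_logical_cells` are hypothetical names, not part of Camelot's API. Whether Camelot's own cell objects expose comparable edge information should be checked against its documentation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GridCell:
    """Hypothetical grid cell: True means a border line exists on that edge."""
    right: bool = True
    bottom: bool = True


def count_logical_cells(grid: List[List[GridCell]]) -> int:
    """Count cells after merging neighbours that have no border between them."""
    rows, cols = len(grid), len(grid[0])
    parent = list(range(rows * cols))  # union-find forest over grid positions

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(a: int, b: int) -> None:
        parent[find(a)] = find(b)

    for r in range(rows):
        for c in range(cols):
            cell = grid[r][c]
            if not cell.right and c + 1 < cols:
                union(r * cols + c, r * cols + c + 1)
            if not cell.bottom and r + 1 < rows:
                union(r * cols + c, (r + 1) * cols + c)

    return len({find(i) for i in range(rows * cols)})


# A 3x3 grid (9 raw cells) with the top row merged into one cell
# and the two lower cells of the first column merged into one.
grid = [[GridCell() for _ in range(3)] for _ in range(3)]
grid[0][0].right = False
grid[0][1].right = False
grid[1][0].bottom = False

print(count_logical_cells(grid))  # 6 logical cells out of 9 grid cells
```

The layout here is made up to match the counts in the question (9 grid cells, 6 visible cells). The actual table in the screenshot may be merged differently.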