camelot-dev / camelot

A Python library to extract tabular data from PDFs
https://camelot-py.readthedocs.io
MIT License

IndexError: list index out of range #258

Open smathai02 opened 3 years ago

smathai02 commented 3 years ago

I am trying to read a folder of PDFs and extract data from each of them.

The code is as follows:

    for root, dirs, files in os.walk(".", topdown=False):
        for file in files:
            filename, extension = os.path.splitext(file)
            if extension == '.pdf':
                print(os.path.join(root, file))
                tables = camelot.read_pdf(File)
                df=tables[0].df # get a pandas
                df.columns=(df.iloc[0])
                df.drop(index=0, axis=0,inplace=True)
                print(df.describe().T)

I get an error after reading the first file:

    IndexError                                Traceback (most recent call last)
    in
          5 # print(os.path.join(root, file))
          6 tables = camelot.read_pdf("2019 - Property Renewal.pdf")
    ----> 7 df=tables[0].df # get a pandas
          8 df.columns=(df.iloc[0])
          9 df.drop(index=0, axis=0,inplace=True)

    C:\ProgramData\Anaconda3\lib\site-packages\camelot\core.py in __getitem__(self, idx)
        672
        673     def __getitem__(self, idx):
    --> 674         return self._tables[idx]
        675
        676     @staticmethod

    IndexError: list index out of range

nikhil-R-A commented 3 years ago

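This means that the table wasn't read: Camelot detected no tables in that PDF, so the returned table list is empty and tables[0] raises the IndexError.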
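
One way to avoid the exception is to check whether anything was returned before indexing. The following is a minimal sketch, not Camelot's own API: `first_table_or_none` is a hypothetical helper name, and it only assumes that the object returned by `camelot.read_pdf` supports `len()` and indexing, as the traceback above suggests.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_table_or_none(tables: Sequence[T]) -> Optional[T]:
    """Return the first item of `tables`, or None when it is empty.

    Hypothetical helper: works on any sequence-like object, including
    the table list returned by camelot.read_pdf.
    """
    if len(tables) == 0:
        return None
    return tables[0]


# Possible usage with Camelot (not executed here):
#   import camelot
#   tables = camelot.read_pdf("2019 - Property Renewal.pdf")
#   table = first_table_or_none(tables)
#   if table is None:
#       print("No tables detected in this file")
#   else:
#       df = table.df
```

With this guard, a PDF in which no table is detected is reported instead of stopping the whole run.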

smathai02 commented 3 years ago

> This means that the table wasn't read

Thanks. The first PDF loads properly, but the error occurs when I try to read the second PDF. Do I need to reinitialize the camelot object?
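
As far as I can tell, nothing needs to be reinitialized: `camelot.read_pdf` is a plain function call that returns a fresh result for each file, so the more likely cause is that no table was detected in the second PDF. Below is a rough sketch of a loop that passes the full path to the reader and skips files with no detected tables. The names `iter_pdf_paths` and `extract_first_tables` are hypothetical, and the reader is passed in as a parameter so the loop does not depend on Camelot itself.

```python
import os


def iter_pdf_paths(top):
    """Yield the full path of every .pdf file found under `top`."""
    for root, _dirs, files in os.walk(top):
        for name in sorted(files):
            # Compare case-insensitively so ".PDF" is also matched.
            if os.path.splitext(name)[1].lower() == ".pdf":
                yield os.path.join(root, name)


def extract_first_tables(top, read_pdf):
    """Map each PDF path to its first table, or None if none was found.

    `read_pdf` is any callable that takes a path and returns a
    sequence of tables, for example camelot.read_pdf.
    """
    results = {}
    for path in iter_pdf_paths(top):
        tables = read_pdf(path)
        # Guard against an empty result instead of indexing blindly.
        results[path] = tables[0] if len(tables) > 0 else None
    return results


# Possible usage with Camelot (not executed here):
#   import camelot
#   for path, table in extract_first_tables(".", camelot.read_pdf).items():
#       if table is None:
#           print("No tables detected:", path)
#       else:
#           print(table.df.describe().T)
```

For files that come back empty, it may be worth trying `camelot.read_pdf(path, flavor="stream")`, since the default lattice flavor looks for tables drawn with ruling lines.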

ashissahu commented 2 years ago

@smathai02, did you manage to resolve this issue? I am facing a similar problem. Could you please help me?