not able to identify dataframe from bank statements pdf

tabula and camelot are both unable to extract tables from bank statement PDFs like the attached sample.

1) The table area is not fixed, i.e. the coordinates change with every month's statement.

2) Neither lattice nor stream mode works; both always return an empty DataFrame containing only the column names:

    C:\Users\vikas\Desktop\GreenariaSociety\Tools>python sample.py
    <class 'pandas.core.frame.DataFrame'>
    Empty DataFrame
    Columns: [DATE, MODE, PARTICULARS, DEPOSITS, WITHDRAWALS, BALANCE]
    Index: []

3) Also, when a column has values that span multiple lines, e.g. PARTICULARS/DESCRIPTIONS in bank statements, the table cell data is not extracted correctly and is spread across other cells/rows.

Sample code used:

    import tabula

    # pdf_path is the path to the statement PDF
    # also tried lattice, pages='all', etc.
    df = tabula.read_pdf(pdf_path, pages="1", stream=True, multiple_tables=True)[0]
    print(type(df))
    print(df)
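
For point 3, one possible workaround is to post-process the extracted rows rather than rely on the extractor to keep multi-line cells together. The sketch below is only an illustration and is not part of tabula or camelot: the helper name `merge_continuation_rows`, the column positions, and the sample rows are all made up. It assumes that a continuation line shows up as a row with an empty DATE cell, and that empty cells are `""` or `None` (with pandas, something like `df.fillna("").values.tolist()` would be needed first).

```python
def merge_continuation_rows(rows, key_col=0, text_col=2):
    """Fold rows with an empty key cell into the previous row's text cell.

    Hypothetical helper: key_col is assumed to be DATE and text_col
    PARTICULARS, matching the column order shown in the output above.
    """
    merged = []
    for row in rows:
        # Normalise every cell to a stripped string; None becomes "".
        cells = ["" if c is None else str(c).strip() for c in row]
        is_continuation = bool(merged) and cells[key_col] == ""
        if is_continuation:
            extra = cells[text_col]
            if extra:
                # Append the wrapped text to the previous transaction.
                merged[-1][text_col] = (merged[-1][text_col] + " " + extra).strip()
        else:
            merged.append(cells)
    return merged


# Made-up rows in the shape DATE, MODE, PARTICULARS, DEPOSITS, WITHDRAWALS, BALANCE
rows = [
    ["01-01-2020", "NEFT", "TRANSFER FROM", "1000.00", "", "5000.00"],
    ["", "", "ACME LTD REF 123", "", "", ""],
    ["02-01-2020", "ATM", "CASH WITHDRAWAL", "", "500.00", "4500.00"],
]

for row in merge_continuation_rows(rows):
    print(row)
```

This only helps once some rows are actually extracted; it does not address the empty DataFrame in point 2. It also drops any non-empty value that a continuation row carries outside the PARTICULARS column, so it would need adjusting if amounts can wrap as well.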

sample_bank_statement.pdf

camelot-dev / camelot

not able to identify dataframe from bank statements pdf #357