Open jainamshah535 opened 1 year ago
Did you find a workaround? I am also trying to extract tables and the remaining text using Camelot, but with no success. @jainamshah535
^^ Echoing the same here. It seems the table_area= argument is not limiting Camelot's scope to the specified area.
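One thing worth checking for the area issue: the documented keyword in Camelot is `table_areas` (plural), a list of `"x1,y1,x2,y2"` strings where (x1, y1) is the top-left and (x2, y2) the bottom-right corner in PDF coordinate space, whose origin is the bottom-left of the page. If the coordinates were measured from the top of the page, the area will point at the wrong place. Below is a minimal sketch of a conversion helper; `to_table_area` and its parameters are hypothetical names for illustration, not part of Camelot, and the page height of 792 points assumes a US Letter page.

```python
def to_table_area(left, top, right, bottom, page_height):
    """Build a Camelot-style "x1,y1,x2,y2" area string.

    Input box is measured in PDF points from the page's TOP-left corner
    (as many viewers report it). Camelot expects PDF coordinates with the
    origin at the BOTTOM-left, so the y values are flipped here.
    """
    y1 = page_height - top      # top edge of the box in PDF space
    y2 = page_height - bottom   # bottom edge of the box in PDF space
    return f"{left},{y1},{right},{y2}"


# Hypothetical box on an assumed 792-point-tall (US Letter) page.
area = to_table_area(50, 100, 500, 300, page_height=792)

# Usage with Camelot (not run here; needs camelot-py and a real PDF):
# import camelot
# tables = camelot.read_pdf(path1, flavor="stream", pages="1", table_areas=[area])
```

This only rules out a naming or coordinate mix-up; it may not be the cause in every case reported above.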
    import camelot
    import pandas as pd

    path1 = r"C:\Users\Downloads\PDF Extraction Project\Compensation_document.pdf"
    tables = camelot.read_pdf(path1, flavor='stream', pages='all')
    print("Total tables extracted:", tables.n)

    # Write each extracted table to its own sheet. The context manager saves
    # the workbook on exit (writer.save() was removed in pandas 2.0).
    with pd.ExcelWriter(r"c:\temp\Compensation_document.xlsx") as writer:
        for i in range(tables.n):
            print("-----------------------------------------", i)
            sname = "Sheet" + str(i + 1)
            df2 = tables[i].df
            df2.to_excel(writer, sheet_name=sname, index=False)
            print(df2)
For this code I tried options such as edge_tol=500, row_tol=10, column_tol=10, etc., but text paragraphs are still detected as tables. Also, how can I specify the table regions in the PDF automatically, and get their coordinates automatically in code?
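Since the stream flavor tends to treat any aligned text as a table, one possible workaround is to post-filter the results rather than rely on tolerances alone. The sketch below is a heuristic, not something Camelot provides: `looks_like_table`, `extract_real_tables`, and the thresholds are hypothetical and would need tuning per document. It drops "tables" that have a single column or are mostly empty cells, which is often what a paragraph looks like after extraction. For coordinates, the versions of camelot-py I am aware of keep each table's bounding box in the private attribute `_bbox`; being private, it is an assumption here and may change between releases.

```python
def looks_like_table(rows, min_cols=2, max_empty_ratio=0.5):
    """Heuristic: rows is a list of lists of cell strings,
    e.g. table.df.values.tolist(). Returns False for paragraph-like output."""
    if not rows:
        return False
    n_cols = max(len(r) for r in rows)
    if n_cols < min_cols:
        return False  # a single column is usually running text
    cells = [str(c).strip() for r in rows for c in r]
    empty = sum(1 for c in cells if not c)
    # Paragraphs split across columns leave most cells blank.
    return empty / len(cells) <= max_empty_ratio


def extract_real_tables(pdf_path, **read_kwargs):
    """Run Camelot, keep only results that pass the heuristic, and return
    (table, page, bbox) tuples. Requires camelot-py and a real PDF."""
    import camelot  # imported lazily so the heuristic works without Camelot

    tables = camelot.read_pdf(pdf_path, flavor="stream", pages="all", **read_kwargs)
    kept = []
    for t in tables:
        if looks_like_table(t.df.values.tolist()):
            # _bbox is a private attribute (assumption); fall back to None.
            kept.append((t, t.page, getattr(t, "_bbox", None)))
    return kept
```

The bounding boxes collected this way could then be passed back as `table_areas` on a second pass. Each table's `parsing_report` (accuracy, whitespace, order, page) may also help when choosing thresholds.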