atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Error min() arg is an empty sequence when giving a table_area to camelot #452

Open GraceBouala opened 3 years ago

GraceBouala commented 3 years ago

Hello, I am trying to extract a table from a PDF file with Camelot using the stream flavor and a specific table area. First, I call read_pdf with the lattice flavor to get the table's bounding box. Then I call read_pdf again with the stream flavor, passing the bounding box from the first call as table_areas. However, I get a 'min() arg is an empty sequence' error, even though there is a table in that area and lattice extracts it. Can someone help me fix this issue? Below is my code:

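    import camelot

    # First pass: default (lattice) flavor, used only to locate the table on page 2
    tables = camelot.read_pdf(pdf, pages='2')

    # Build the "x1,y1,x2,y2" string from the table's bounding box
    bbox = list(tables[0].__dict__['_bbox'])
    bbox = [str(elt) for elt in bbox]
    interested_area = ','.join(bbox)

    # Second pass: stream flavor restricted to that area
    output_camelot = camelot.read_pdf(
        pdf,
        pages='2',
        flavor='stream',
        split_text=True,
        table_areas=[interested_area],
    )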
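One thing that may be worth ruling out here is coordinate ordering. Camelot's documentation describes each table_areas entry as "x1,y1,x2,y2", where (x1, y1) is the left-top corner and (x2, y2) is the right-bottom corner in PDF coordinate space (y grows upward). If the private _bbox attribute stores its corners in a different order (an assumption, not verified here), the string built above could describe an empty region. The sketch below uses a hypothetical helper, bbox_to_table_area, that normalizes any four-number box to the documented order:

```python
def bbox_to_table_area(bbox):
    """Return an "x1,y1,x2,y2" string with (x1, y1) as left-top and
    (x2, y2) as right-bottom, in PDF space where y grows upward.

    Hypothetical helper, not part of Camelot. It makes no assumption about
    which corner comes first in the input; it only assumes the input is
    (x, y, x, y).
    """
    xa, ya, xb, yb = (float(v) for v in bbox)
    left, right = min(xa, xb), max(xa, xb)
    bottom, top = min(ya, yb), max(ya, yb)
    # Left-top first, then right-bottom
    return ",".join(str(v) for v in (left, top, right, bottom))


# Example with made-up coordinates, given as (left, bottom, right, top)
area = bbox_to_table_area((50.0, 100.0, 500.0, 700.0))
print(area)
```

The result could then be passed as table_areas=[area] in the second read_pdf call in place of interested_area. If the error persists, the ordering was probably not the cause.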

AnandKumar1989 commented 2 years ago

Hi GraceBouala, please check the table in the PDF; it might be a scanned (image-based) table.

Regards, Anand