atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Specify Table Areas AND Columns with Stream command #261

Closed pablobarria closed 5 years ago

pablobarria commented 5 years ago

With the command line interface I can do the following:

camelot -f json -o "file.json" stream -T 0,560,608,40 -r 10 -C 0,45,75,100,290,445 "file.pdf"

Yet when I pass both the table_areas and columns kwargs to camelot.read_pdf with the stream flavor, it fails with the error:

ValueError: Length of table_areas and columns should be equal

This happens no matter how many columns I pass.

Is this a bug or am I doing something wrong? I tried to do it with table_regions as well, but that seems to not work with Stream at all (haven't tried with Lattice).

I'm using Python 3.6.5 on macOS Mojave.

anakin87 commented 5 years ago

tables = camelot.read_pdf('file.pdf', flavor='stream', table_areas=['0,560,608,40'], columns=['0,45,75,100,290,445'])

With this syntax, it works. Both table_areas and columns must be defined as lists of strings, with the same length.
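To make the "lists of strings, same length" rule concrete, here is a minimal sketch of a helper that builds the keyword arguments in that shape. The name build_stream_kwargs is hypothetical and not part of camelot. It only mirrors the rule and the error message quoted in this thread, and it makes no claim about how camelot performs the check internally. The camelot.read_pdf call is left as a comment because it needs camelot installed and a real file.pdf.

```python
def build_stream_kwargs(table_areas, columns):
    """Hypothetical helper (not part of camelot): return read_pdf kwargs
    with table_areas and columns as equal-length lists of strings."""

    def as_string(coords):
        # Accept "0,45,75" as-is, or join an iterable such as [0, 45, 75].
        if isinstance(coords, str):
            return coords
        return ",".join(str(c) for c in coords)

    areas = [as_string(a) for a in table_areas]
    cols = [as_string(c) for c in columns]

    # One columns string per table area, as described in the answer above.
    if len(areas) != len(cols):
        raise ValueError("Length of table_areas and columns should be equal")

    return {"flavor": "stream", "table_areas": areas, "columns": cols}


# Same values as the CLI call: -T 0,560,608,40 and -C 0,45,75,100,290,445
kwargs = build_stream_kwargs(
    table_areas=[[0, 560, 608, 40]],
    columns=[[0, 45, 75, 100, 290, 445]],
)
print(kwargs)

# With camelot installed and file.pdf on disk:
# import camelot
# tables = camelot.read_pdf("file.pdf", **kwargs)
```

Each entry in columns pairs with the table area at the same index, so two table areas need two column strings.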

pablobarria commented 5 years ago

...the one thing I didn't try. Thanks!

vinayak-mehta commented 5 years ago

@anakin87 Thanks for pointing it out!