atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Returning more tables than expected using table_areas #368

Closed: CartierPierre closed this issue 4 years ago

CartierPierre commented 4 years ago

Hi,

Camelot (read_pdf) is returning more than one table per table area. It looks like it tries to automatically detect an additional table, or something similar.
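For reference, Camelot's documentation describes each table_areas entry as a string "x1,y1,x2,y2", where (x1, y1) is the left-top corner and (x2, y2) the right-bottom corner in PDF coordinate space (origin at the bottom-left of the page). The minimal sketch below does not call Camelot. It only parses such a string and checks whether a detected table's bounding box falls inside the requested area, which is one way to tell an expected table from a stray one. The Box type, the helper names, the 1-point tolerance, the file name and the coordinates are all hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in PDF points (origin at the bottom-left of the page)."""
    left: float
    bottom: float
    right: float
    top: float


def parse_table_area(area: str) -> Box:
    """Parse a Camelot-style table area string "x1,y1,x2,y2".

    Camelot documents (x1, y1) as the left-top corner and (x2, y2) as the
    right-bottom corner, so y1 is normally the larger y value. min/max make
    the parse tolerant of either ordering.
    """
    x1, y1, x2, y2 = (float(v) for v in area.split(","))
    return Box(left=min(x1, x2), bottom=min(y1, y2),
               right=max(x1, x2), top=max(y1, y2))


def inside(inner: Box, outer: Box, tol: float = 1.0) -> bool:
    """True if `inner` lies within `outer`, allowing `tol` points of slack."""
    return (
        inner.left >= outer.left - tol
        and inner.bottom >= outer.bottom - tol
        and inner.right <= outer.right + tol
        and inner.top <= outer.top + tol
    )


# Passing the area to Camelot would look like this (hypothetical file, not executed):
#   camelot.read_pdf("doc.pdf", pages="1", flavor="stream",
#                    table_areas=["316,499,566,337"])
area = parse_table_area("316,499,566,337")
print(area)                                   # Box(left=316.0, bottom=337.0, right=566.0, top=499.0)
print(inside(Box(320, 340, 560, 495), area))  # True: within the area
print(inside(Box(50, 100, 560, 495), area))   # False: extends outside the area
```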

vinayak-mehta commented 4 years ago

Can you post the PDF and code snippet?

CartierPierre commented 4 years ago

I can't, because the data isn't mine; it comes from customers. But for example, if I have one page with one table, I pass the table area and read_pdf returns 2 tables: one correct, and another that is very bad (wrong columns, data from outside the area) and not usable. So I have to filter the tables.
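One way to do that filtering can be sketched under a few assumptions. Camelot's Table objects expose a parsing_report dict with "accuracy" and "whitespace" percentages, and a stray table assembled from text outside the area may score worse on those metrics (this is not guaranteed). The helper name filter_tables and its default thresholds are hypothetical, and SimpleNamespace stand-ins with made-up values replace real tables so the example runs without a PDF.

```python
from types import SimpleNamespace
from typing import Iterable, List


def filter_tables(tables: Iterable, min_accuracy: float = 90.0,
                  max_whitespace: float = 50.0) -> List:
    """Keep tables whose parsing report looks trustworthy.

    Assumes each table has a `parsing_report` dict with "accuracy" and
    "whitespace" percentages, as Camelot's Table objects do. The default
    thresholds are arbitrary examples, not Camelot recommendations.
    """
    kept = []
    for table in tables:
        report = table.parsing_report
        if report["accuracy"] >= min_accuracy and report["whitespace"] <= max_whitespace:
            kept.append(table)
    return kept


# Hypothetical stand-ins for two tables returned from a single table area.
good = SimpleNamespace(parsing_report={"accuracy": 99.1, "whitespace": 12.5, "order": 1, "page": 1})
bad = SimpleNamespace(parsing_report={"accuracy": 61.0, "whitespace": 70.0, "order": 2, "page": 1})

# With real data this would be, e.g.:
#   tables = camelot.read_pdf("doc.pdf", pages="1", table_areas=["316,499,566,337"])
#   usable = filter_tables(tables)
print(len(filter_tables([good, bad])))  # 1
```

Printing parsing_report for both tables before choosing thresholds is advisable, since a sensible cut-off depends on the document.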

vinayak-mehta commented 4 years ago

It's hard to debug this without actually looking at the PDF.