anakin87 closed this issue 5 years ago.
Hi @anakin87! You can specify table areas in read_pdf using the table_areas kwarg. For more information on usage, check out the docs. Please comment if you face any problems.
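For reference, Camelot's documentation describes each table_areas entry as a string of the form "x1,y1,x2,y2", where (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner in PDF coordinate space (origin at the bottom-left, so y grows upward). The sketch below builds such a string and catches swapped corners; to_area_string is a hypothetical helper, not part of Camelot.

```python
def to_area_string(left: float, top: float, right: float, bottom: float) -> str:
    """Format a bounding box as an 'x1,y1,x2,y2' table area string.

    Assumes PDF coordinate space: origin at the bottom-left, y grows upward,
    so the top edge must have a larger y value than the bottom edge.
    """
    if right <= left:
        raise ValueError("right edge must be to the right of the left edge")
    if top <= bottom:
        raise ValueError("top edge must be above the bottom edge in PDF space")
    return ",".join(str(v) for v in (left, top, right, bottom))


area = to_area_string(316, 499, 566, 337)
print(area)  # 316,499,566,337
```

The resulting string would then go into the list passed as table_areas, e.g. camelot.read_pdf("file.pdf", table_areas=[area]), where "file.pdf" is a placeholder.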
If I provide table_areas, Camelot interprets them as the exact table coordinates.
My problem is that I want to search for tables in a specific area of the page, but I don't know the exact table coordinates. How can I handle this?
I get the issue now. Camelot treats the passed table areas as the actual boundaries of the table. This could be an enhancement where the user passes a table_region, so that Camelot only processes the text and lines inside that region to form a table. Reopening this.
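The proposed table_region behaviour amounts to a pre-filtering step: discard every element outside the region, then run normal table detection on what remains. A minimal sketch, assuming a hypothetical TextBox type and a (left, top, right, bottom) region in PDF coordinates; this is an illustration of the idea, not Camelot's internal code.

```python
from dataclasses import dataclass
from typing import List, Tuple

Region = Tuple[float, float, float, float]  # (left, top, right, bottom), PDF space


@dataclass(frozen=True)
class TextBox:
    """Hypothetical text element; coordinates in PDF points, origin bottom-left."""
    text: str
    x0: float  # left edge
    y0: float  # bottom edge
    x1: float  # right edge
    y1: float  # top edge


def inside_region(box: TextBox, region: Region) -> bool:
    """True if the box lies entirely within the region."""
    left, top, right, bottom = region
    return left <= box.x0 and box.x1 <= right and bottom <= box.y0 and box.y1 <= top


def filter_to_region(boxes: List[TextBox], region: Region) -> List[TextBox]:
    """Keep only the elements that table detection should look at."""
    return [b for b in boxes if inside_region(b, region)]


boxes = [
    TextBox("Page header", 100, 700, 300, 712),
    TextBox("Cell A1", 120, 400, 160, 410),
    TextBox("Footer", 100, 30, 200, 40),
]
kept = filter_to_region(boxes, (100, 500, 500, 300))
print([b.text for b in kept])  # ['Cell A1']
```

Because the region only narrows the input, it does not need to match the table's edges exactly; a generous region still yields only what lies inside it.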
@anakin87 Can you post a link to that PDF?
I want to search for tables in a certain region of the page in order to extract only real tables, not tables that are just layout elements.
@anakin87 Thanks for reporting this issue. The current table_areas kwarg for Lattice hardcodes the given coordinates as the table boundary, which pulls unwanted text into the extracted table and forces the user to find the exact coordinates by visual debugging. That should not be the case: table_areas should just guide Camelot to analyze only that part of the page when looking for tables with Lattice and Stream.
This is a behavioral bug; I'll push a fix today.
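To illustrate the difference between the two behaviours: when the area is hardcoded, the box you pass becomes the table boundary, so an area drawn slightly too large drags in stray text. A guided search instead derives the boundary from the content it finds inside the area. A minimal sketch using plain (left, top, right, bottom) tuples in PDF coordinates; fit_table_bbox is a hypothetical helper, and real table detection (line or text analysis in Lattice or Stream) is reduced here to taking a tight bounding box.

```python
from typing import Iterable, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (left, top, right, bottom), PDF space


def fit_table_bbox(elements: Iterable[BBox], region: BBox) -> Optional[BBox]:
    """Shrink a search region to the tight box around the elements inside it.

    The region only limits where to look; the returned boundary comes from
    the elements themselves rather than from the region's edges.
    """
    left, top, right, bottom = region
    inside = [
        (l, t, r, b)
        for (l, t, r, b) in elements
        if left <= l and r <= right and bottom <= b and t <= top
    ]
    if not inside:
        return None  # nothing found in the region
    return (
        min(e[0] for e in inside),  # leftmost edge
        max(e[1] for e in inside),  # highest top edge
        max(e[2] for e in inside),  # rightmost edge
        min(e[3] for e in inside),  # lowest bottom edge
    )


elements = [(120, 450, 200, 440), (210, 450, 300, 440), (120, 420, 300, 410), (50, 800, 90, 790)]
print(fit_table_bbox(elements, (100, 500, 400, 300)))  # (120, 450, 300, 410)
```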
I think both options (table areas and table regions) are useful.
Hmm, I guess keeping them separate makes sense since a table region could contain two or more table areas too.
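One way to see why a region and an area are different things: a single region may hold several tables stacked vertically, and each of them needs its own area. A minimal sketch, assuming the row bounding boxes are already known and using a simple vertical-gap threshold (min_gap, a hypothetical parameter) to split them; this is only an illustration, not Camelot's detection logic.

```python
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (left, top, right, bottom), PDF space


def split_region(rows: List[BBox], min_gap: float) -> List[BBox]:
    """Group row boxes into separate table areas wherever the vertical gap
    between consecutive rows exceeds min_gap (an assumed heuristic)."""
    if not rows:
        return []
    # Sort top-to-bottom: a larger y is higher on the page in PDF space.
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = prev[3] - cur[1]  # bottom of previous row minus top of current row
        if gap > min_gap:
            groups.append([cur])  # large gap: start a new table area
        else:
            groups[-1].append(cur)
    # Each group's tight bounding box becomes one table area.
    return [
        (min(r[0] for r in g), max(r[1] for r in g),
         max(r[2] for r in g), min(r[3] for r in g))
        for g in groups
    ]


rows = [(100, 700, 400, 690), (100, 688, 400, 678), (100, 600, 400, 590), (100, 588, 400, 578)]
print(split_region(rows, min_gap=20))  # [(100, 700, 400, 678), (100, 600, 400, 578)]
```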
@anakin87 Check out the docs for usage details.
Great!!!
How do you get the coordinates to pass as the argument to table_areas?
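The table_areas strings are in PDF coordinate space (points of 1/72 inch, origin at the bottom-left), not in image pixels. If you measure the box on a rendered image of the page, for example in an image viewer, a conversion like the following is needed. A minimal sketch, assuming an unrotated page whose media box starts at (0, 0) and an image rendered at a known DPI with its origin at the top-left; pixels_to_pdf and box_to_area are hypothetical helpers.

```python
from typing import Tuple


def pixels_to_pdf(x_px: float, y_px: float, dpi: float, page_height_pt: float) -> Tuple[float, float]:
    """Convert a pixel position on a rendered page image to PDF coordinates.

    Image space: origin top-left, y grows downward, units are pixels.
    PDF space: origin bottom-left, y grows upward, units are points (1/72 in).
    """
    scale = 72.0 / dpi
    return x_px * scale, page_height_pt - y_px * scale


def box_to_area(top_left_px, bottom_right_px, dpi: float, page_height_pt: float) -> str:
    """Turn a box picked on the image into an 'x1,y1,x2,y2' area string."""
    x1, y1 = pixels_to_pdf(*top_left_px, dpi, page_height_pt)
    x2, y2 = pixels_to_pdf(*bottom_right_px, dpi, page_height_pt)
    return f"{x1:g},{y1:g},{x2:g},{y2:g}"


# A US Letter page is 792 pt tall; at 144 DPI one pixel is half a point.
print(box_to_area((200, 300), (1000, 900), dpi=144, page_height_pt=792))  # 100,642,500,342
```

Flipping the y axis is the step most often missed: the top of the box has the smaller pixel y but the larger PDF y.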
I'm trying to automatically detect and extract tables encapsulated in other tables.
I want Camelot to search within a certain area: not the table area itself, but the region where the table resides (see the attached image).
How can I make Camelot work this way? Ideas for development are welcome.