add __iter__() for TableList to support enumerate()

stonyw commented 5 months ago

    import camelot

    tables = camelot.read_pdf(filename)
    # PyCharm warning on the next line: Expected type 'Iterable[_T]', got 'TableList' instead
    for idx, table in enumerate(tables):
        pass
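For context, here is a minimal sketch of what the title proposes. `TableListSketch` is a hypothetical stand-in, not Camelot's actual `TableList`, and it assumes the tables are kept in a private list. Python can already iterate a class that only defines `__getitem__` at runtime, but type checkers such as PyCharm's only treat a class as `Iterable` when it defines `__iter__`.

```python
from typing import Iterator, List


class TableListSketch:
    """Hypothetical stand-in for a list-like container of tables."""

    def __init__(self, tables: List[str]) -> None:
        self._tables = tables

    def __len__(self) -> int:
        return len(self._tables)

    def __getitem__(self, idx: int) -> str:
        return self._tables[idx]

    def __iter__(self) -> Iterator[str]:
        # Defining __iter__ makes the class satisfy typing.Iterable,
        # which is what enumerate() is annotated to accept.
        return iter(self._tables)


tables = TableListSketch(["table-0", "table-1"])
pairs = list(enumerate(tables))
```

Delegating to the underlying list's iterator keeps the change small and leaves indexing and `len()` behaviour untouched.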

henrywman101 commented 5 months ago

Comparing table_areas and table_regions (with flavor='stream'): table_areas recognizes tables more accurately.

When using Camelot's camelot.read_pdf function with table_areas and table_regions parameters, you're specifying the exact areas or regions of the page where you expect the tables to be. This is particularly useful for PDFs where tables are not well-detected using the default settings.

table_areas: This parameter expects a list of strings, where each string defines the coordinates of a rectangular area that contains a table. The format of the coordinates is "x1,y1,x2,y2" (in PDF points), where (x1, y1) is the top-left corner of the rectangle and (x2, y2) is the bottom-right corner. The coordinates are in PDF coordinate space, whose origin is the bottom-left corner of the page, so y1 is larger than y2.

table_regions: This parameter takes the same "x1,y1,x2,y2" strings, but each one marks a region that may contain tables rather than the exact boundary of a table. Camelot looks for tables inside the region, which is useful when a region holds multiple tables or you don't know the exact table boundaries.

Here's an example of how to use these parameters: [image: Screenshot 2024-02-01 at 02.39.30.png] [image: Screenshot 2024-02-01 at 04.13.45.png] [image: Screenshot 2024-02-01 at 02.23.55.png]
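As a rough text version of that usage, the sketch below builds an area string and passes it to `camelot.read_pdf` with `flavor="stream"`. The helper names `area_string` and `read_stream_tables` and the coordinates are made up for illustration; only `read_pdf` and its `flavor`, `table_areas` and `table_regions` parameters come from Camelot.

```python
from typing import List


def area_string(x1: float, y1: float, x2: float, y2: float) -> str:
    """Format a box as "x1,y1,x2,y2": (x1, y1) top-left, (x2, y2) bottom-right."""
    # PDF coordinates start at the bottom-left of the page, so the top edge
    # has the larger y value.
    if not (x1 < x2 and y1 > y2):
        raise ValueError("expected x1 < x2 and y1 > y2 (top-left, bottom-right)")
    return f"{x1:g},{y1:g},{x2:g},{y2:g}"


def read_stream_tables(filename: str, areas: List[str], exact: bool = True):
    """Run the stream parser on the default page, restricted to the given boxes."""
    import camelot  # imported lazily so area_string works without Camelot installed

    if exact:
        # Each box is treated as the boundary of one table.
        return camelot.read_pdf(filename, flavor="stream", table_areas=areas)
    # Each box is a region that Camelot searches for tables.
    return camelot.read_pdf(filename, flavor="stream", table_regions=areas)


area = area_string(50, 700, 550, 400)  # made-up coordinates, in PDF points
```

Calling `read_stream_tables("your.pdf", [area])` and then again with `exact=False` on the same file is one way to compare the two parameters side by side.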

    $ camelot stream -plot contour 13pg.pdf

MartinThoma commented 4 months ago

Hey!

As camelot is no longer maintained, we are trying to build a maintained fork at pypdf_table_extraction.

Do you want to open the PR against that branch so that we can merge your improvement?

camelot-dev / camelot

add __iter__() for TableList to support enumerate() #486
