atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

Sort TableList by order of tables in PDF #277

Closed symroe closed 5 years ago

symroe commented 5 years ago

I would expect the TableList returned by read_pdf to hold its tables in the order they appear in the PDF (by page, then by table position on that page).

I think this requires a few things:

  1. Create a __lt__ method on Table that looks like:

    def __lt__(self, other):
        # Compare by page first, then by order within the same page.
        return (self.page, self.order) < (other.page, other.order)

  2. Create an __iter__ method on TableList

  3. return sorted(tables) in read_pdf
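
The three steps above can be sketched roughly as follows. This is only an illustration, not camelot's actual code: `Table` and `TableList` here are simplified stand-ins that model nothing but the `page` and `order` attributes mentioned above, and the constructor arguments are assumptions. The comparison uses a `(page, order)` tuple so that a table on an earlier page always sorts first, whatever its position on that page.

```python
class Table:
    """Simplified stand-in for a table object (hypothetical constructor)."""

    def __init__(self, page, order):
        self.page = page    # page number the table was found on
        self.order = order  # position of the table within that page

    def __lt__(self, other):
        # Tuples compare element by element: page first, then order.
        return (self.page, self.order) < (other.page, other.order)

    def __repr__(self):
        return f"<Table page={self.page} order={self.order}>"


class TableList:
    """Simplified stand-in for the container returned to the caller."""

    def __init__(self, tables):
        self._tables = list(tables)

    def __iter__(self):
        # Makes the container usable with sorted(), for loops, list(), etc.
        return iter(self._tables)

    def __len__(self):
        return len(self._tables)


# Tables deliberately supplied out of document order.
tables = TableList([Table(2, 1), Table(1, 2), Table(1, 1)])

# sorted() only needs __iter__ on the container and __lt__ on the items.
ordered = sorted(tables)
positions = [(t.page, t.order) for t in ordered]
print(positions)
```

Note that `sorted()` returns a plain list, so a real implementation would presumably wrap the result back into a `TableList` before returning it.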

I'd be happy to make these changes, but wonder if there are reasons against doing this, before making a PR.

vinayak-mehta commented 5 years ago

> I'd be happy to make these changes, but wonder if there are reasons against doing this, before making a PR.

There aren't any reasons against doing this; it just wasn't a priority. You can go ahead and open a PR! Do check out the contributor's guide before opening one.