atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

How to get table coordinates (regardless of page orientation) #334

Closed: anakin87 closed this issue 5 years ago

anakin87 commented 5 years ago

Hi @vinayak-mehta and everybody, thank you for your work on this awesome library.

For various reasons, after identifying the tables, I need to generate a PDF containing only the table, so I need the table coordinates.

The problem is that when the table is rotated (portrait page, landscape table), the bbox is expressed in the coordinates of the rotated page, not the original one. See page 7 of 20180307_10_05_000164_20180226_SO000003.pdf.
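Mapping such a bbox back to the original page is a matter of inverting the page rotation. The following is a minimal sketch. It assumes the bbox is (x1, y1, x2, y2) in PDF points with the origin at the bottom-left, and that the frame it refers to was produced by rotating the original page clockwise by a known multiple of 90 degrees. Which rotation was actually applied still has to be determined separately. unrotate_point and unrotate_bbox are hypothetical helpers, not part of camelot's API.

```python
from typing import Tuple

# (x1, y1, x2, y2) in PDF points, origin at the bottom-left (assumed layout)
BBox = Tuple[float, float, float, float]


def unrotate_point(xr: float, yr: float, rotation: int,
                   width: float, height: float) -> Tuple[float, float]:
    """Map a point from a page rotated clockwise by `rotation` degrees back
    to the original page, whose unrotated size is `width` x `height`."""
    rotation %= 360  # so -90 is treated as 270
    if rotation == 0:
        return xr, yr
    if rotation == 90:    # forward map was (x, y) -> (y, width - x)
        return width - yr, xr
    if rotation == 180:   # forward map was (x, y) -> (width - x, height - y)
        return width - xr, height - yr
    if rotation == 270:   # forward map was (x, y) -> (height - y, x)
        return yr, height - xr
    raise ValueError("rotation must be a multiple of 90 degrees")


def unrotate_bbox(bbox: BBox, rotation: int,
                  width: float, height: float) -> BBox:
    """Map a bbox from rotated-page coordinates to original-page coordinates."""
    x1, y1, x2, y2 = bbox
    ax, ay = unrotate_point(x1, y1, rotation, width, height)
    bx, by = unrotate_point(x2, y2, rotation, width, height)
    # Rotation can swap which corner is the lower-left one, so re-normalise.
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


# A4 portrait page (595 x 842 pt) that was turned 90 degrees clockwise.
print(unrotate_bbox((100, 50, 700, 500), 90, 595, 842))
```

For that example, a table found at (100, 50, 700, 500) in the rotated frame maps back to (95, 100, 545, 700) on the original page. Those coordinates could then be used to crop the original PDF.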

There are two alternative ways to solve this problem:

What do you think about this problem? How could it be solved?

anakin87 commented 5 years ago

I solved it by recovering the temporary PDF files (monkey-patching camelot.utils.TemporaryDirectory) and inspecting their rotation with PDFBox. By the way, it might be useful to return the page rotation along with the other table parameters.

vinayak-mehta commented 5 years ago

I'll check it out.

vinayak-mehta commented 5 years ago

Sorry for the delayed reply; please give me some time to look into this.

vinayak-mehta commented 5 years ago

Very specific use case; closing for now.