camelot-dev / camelot

A Python library to extract tabular data from PDFs
https://camelot-py.readthedocs.io
MIT License

Camelot image co-ordinates to PDF box #377

Open Siddharth1India opened 1 year ago

Siddharth1India commented 1 year ago

I am working on an extremely complex problem where almost everything is variable. My current approach is to convert the PDF to an image, detect the table coordinates in the image with table-transformers, and then use those coordinates to box out the table in the PDF and read it with Camelot.
The issue is that the box coordinates I get from the image are not the same as the PDF coordinates, so I cannot box out the table. Note: I am using pdf2image to convert the PDF to an image. Thanks.

vaibhavkansallumiq commented 1 year ago

Did you find a solution?

Siddharth1India commented 1 year ago

@vaibhavkansallumiq this may help: https://github.com/atlanhq/camelot/issues/486
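
For anyone landing here, a minimal sketch of the conversion discussed in this thread. It is not taken from the linked issue. It assumes the image was rendered from the full, unrotated page, that the detector returns boxes as (x_min, y_min, x_max, y_max) in pixels with the origin at the top-left, and that the page size is known in PDF points (1/72 inch) with the origin at the bottom-left. The function name `image_box_to_pdf_area` is hypothetical.

```python
def image_box_to_pdf_area(box, image_size, page_size):
    """Convert a pixel-space box to a Camelot-style "x1,y1,x2,y2" string.

    box:        (x_min, y_min, x_max, y_max) in image pixels, top-left origin
    image_size: (width, height) of the rendered page image in pixels
    page_size:  (width, height) of the PDF page in points, bottom-left origin

    Assumption: the page is not rotated and the image covers the whole page.
    """
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    page_w, page_h = page_size

    # Scale factors from pixels to points (equal to 72 / dpi for a full-page render).
    sx = page_w / img_w
    sy = page_h / img_h

    # The x axis only needs scaling; the y axis must also be flipped,
    # because image y grows downward while PDF y grows upward.
    x1 = x_min * sx            # left
    y1 = page_h - y_min * sy   # top
    x2 = x_max * sx            # right
    y2 = page_h - y_max * sy   # bottom

    return f"{x1:.2f},{y1:.2f},{x2:.2f},{y2:.2f}"


# Example: a US Letter page (612 x 792 pt) rendered at 200 dpi is 1700 x 2200 px.
area = image_box_to_pdf_area((100, 200, 1500, 1000), (1700, 2200), (612, 792))
print(area)
```

The resulting string follows the left-top, right-bottom order that Camelot documents for `table_areas`, so it could be passed as `camelot.read_pdf(path, pages="1", flavor="stream", table_areas=[area])`. Pages with a rotation flag or a media box that does not start at (0, 0) would need extra handling that this sketch does not cover.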