atlanhq / camelot

Camelot: PDF Table Extraction for Humans
https://camelot-py.readthedocs.io

How to map table coordinates from another framework to Camelot coordinates #434

Open peeyushpashine opened 4 years ago

peeyushpashine commented 4 years ago

I have used Mask R-CNN to detect tables in PDFs, so I have table coordinates. When I pass these coordinates to Camelot through the `table_areas` parameter with `flavor="stream"`, the tables are not detected. Using Camelot's plot and zoom feature, I can see that its table coordinates differ from mine. How do I find out which coordinate system and which DPI or image size Camelot uses, so that I can convert my table coordinates to Camelot's and use them to extract the table contents?

I have tried different DPIs when converting the PDF pages to images and got different coordinates each time, but none of them match Camelot's table coordinates. @vinayak-mehta
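For context on why the DPI changes the numbers: PDF user-space coordinates are measured in points (1/72 inch) by default, while pixel coordinates on a rendered image scale with the DPI. A pixel position can therefore be converted to points by multiplying by 72 / dpi. This is a minimal sketch, assuming the whole page was rendered at the same DPI on both axes; the helper name `pixels_to_points` is hypothetical.

```python
def pixels_to_points(value_px: float, dpi: float) -> float:
    """Convert a length in image pixels to PDF points (1 pt = 1/72 inch)."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return value_px * 72.0 / dpi


# Hypothetical box detected on a page rendered at 200 DPI,
# given as (left, top, right, bottom) with a top-left image origin.
x1_px, y1_px, x2_px, y2_px = 100, 250, 1500, 900
dpi = 200
box_pt = [pixels_to_points(v, dpi) for v in (x1_px, y1_px, x2_px, y2_px)]
print(box_pt)  # [36.0, 90.0, 540.0, 324.0]
```

This only fixes the scale: the same box rendered at any DPI should give the same point values. The y values are still measured from the top of the page, so they must also be flipped before they line up with Camelot's coordinates.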

Anshul-GH commented 3 years ago

I had a similar issue and had to rescale the coordinates before I could use them with Camelot. Basically:

  1. Took the coordinates of both the table and the entire PDF page and computed a scaling factor from them.
  2. Also re-mapped the coordinate space, since PDF coordinates use the bottom-left corner as the origin (0, 0), while the source tool I was using had its origin at the top-left.
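The two steps above can be sketched as follows. This is an illustration, not Camelot's own code, and it rests on assumptions: the detector reports boxes as (left, top, right, bottom) in pixels of a page image with a top-left origin, the PDF page is unrotated with its MediaBox starting at (0, 0), and the page size in points is known (612 x 792 for US Letter, for example). The function name `image_box_to_table_area` is hypothetical. It returns a string in the `x1,y1,x2,y2` form that Camelot's `table_areas` parameter expects, where (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner in PDF coordinate space.

```python
def image_box_to_table_area(
    box_px: tuple[float, float, float, float],
    image_size_px: tuple[float, float],
    page_size_pt: tuple[float, float],
) -> str:
    """Map an image-space box (top-left origin) to a Camelot table_areas string.

    box_px:        (left, top, right, bottom) in image pixels, y growing downward.
    image_size_px: (width, height) of the rendered page image in pixels.
    page_size_pt:  (width, height) of the PDF page in points, y growing upward.
    """
    left, top, right, bottom = box_px
    img_w, img_h = image_size_px
    page_w, page_h = page_size_pt

    # Step 1: scaling factors between the image and the PDF page.
    sx = page_w / img_w
    sy = page_h / img_h

    # Step 2: scale x directly; flip y because PDF space has a bottom-left origin.
    x1 = left * sx
    x2 = right * sx
    y1 = page_h - top * sy     # top edge of the table in PDF space
    y2 = page_h - bottom * sy  # bottom edge of the table in PDF space

    return f"{x1:.2f},{y1:.2f},{x2:.2f},{y2:.2f}"


# US Letter page (612 x 792 pt) rendered to a 1700 x 2200 px image (200 DPI).
area = image_box_to_table_area((100, 250, 1500, 900), (1700, 2200), (612, 792))
print(area)  # 36.00,702.00,540.00,468.00
```

The resulting string could then be passed as, for example, `camelot.read_pdf("file.pdf", pages="1", flavor="stream", table_areas=[area])`. Deriving the scale from the ratio of page size to image size, rather than from the DPI, also covers images that were resized after rendering. Pages with a non-zero rotation or a MediaBox that does not start at (0, 0) need extra handling that this sketch leaves out.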

It worked in my case, so you could give it a shot.

SAIVENKATARAJU commented 2 years ago

@peeyushpashine Were you able to figure this out? I have a similar issue now.