camelot-dev / camelot

A Python library to extract tabular data from PDFs
https://camelot-py.readthedocs.io
MIT License
2.9k stars 461 forks

Hey, how do I get the coordinates of the joints in the table? #204

Open liu-guo-jing opened 3 years ago

liu-guo-jing commented 3 years ago

Describe the bug A clear and concise description of what the bug is.

Steps to reproduce the bug Steps used to install camelot:

  1. Add step here (you can add more steps too)

Steps to reproduce the behavior:

  1. Add step here (you can add more steps too)

Expected behavior A clear and concise description of what you expected to happen.

Code Add the Camelot code snippet that you used.

import camelot

# add your code here

PDF Add the PDF file that you want to extract tables from.

Screenshots If applicable, add screenshots to help explain your problem.

Environment

Additional context Add any other context about the problem here.

anakin87 commented 3 years ago

I don't understand the issue.

If you want table coordinates, you can use table._bbox

liu-guo-jing commented 3 years ago

I want to get the coordinates of the four joints in each cell of the table. As far as I know, they can be used in table_areas=[x1,y1,x2,y2].

anakin87 commented 3 years ago

Did you try my suggestion?

NurielWainstein commented 1 year ago

Hi, has anyone managed to do this? I need the coordinates of each cell in the table, similar to camelot.plot(table, kind='joint'), but instead of plotting them, I need them as [x1,y1,x2,y2] for every cell in the table.

btakeya commented 1 year ago

@NurielWainstein hey, do you still need this? I hope the code snippet below helps.

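import camelot
import matplotlib.pyplot as plt

tables = camelot.read_pdf("sample.pdf", pages="1")
table = tables[0]
for row in table.cells:
    for cell in row:
        # Mark the four corners (joints) of each cell with a red dot
        plt.plot(cell.x1, cell.y1, "ro")
        plt.plot(cell.x2, cell.y2, "ro")
        plt.plot(cell.x1, cell.y2, "ro")
        plt.plot(cell.x2, cell.y1, "ro")

plt.show()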

For me, this code produces the plot below:

image

Original table from the PDF (written in Korean, but that shouldn't matter, I think): image
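
If you need the numbers rather than a plot, the same table.cells loop can collect them as [x1,y1,x2,y2] lists. This is only a minimal sketch: cell_boxes and to_table_area are hypothetical helper names, not part of camelot, and fake_table is a stand-in so the snippet runs without a PDF. It assumes each cell's (x1, y1) is the left-bottom corner and (x2, y2) the right-top corner, while table_areas expects "left,top,right,bottom" strings, so the y values are swapped when formatting.

```python
from types import SimpleNamespace


def cell_boxes(table):
    """Return [x1, y1, x2, y2] for every cell, grouped by row."""
    return [
        [[cell.x1, cell.y1, cell.x2, cell.y2] for cell in row]
        for row in table.cells
    ]


def to_table_area(box):
    """Format a box as a table_areas string.

    Assumption: box is (left, bottom, right, top), while table_areas
    expects "left,top,right,bottom".
    """
    x1, y1, x2, y2 = box
    return f"{x1},{y2},{x2},{y1}"


# Stand-in for a camelot table so the sketch runs without a PDF.
# With camelot, pass a real table instead, e.g.
#   cell_boxes(camelot.read_pdf("sample.pdf", pages="1")[0])
fake_table = SimpleNamespace(
    cells=[[
        SimpleNamespace(x1=0, y1=10, x2=5, y2=20),
        SimpleNamespace(x1=5, y1=10, x2=9, y2=20),
    ]]
)

boxes = cell_boxes(fake_table)
print(boxes[0][0])
print(to_table_area(boxes[0][1]))
```

boxes[row][col] then gives the box for a single cell, and to_table_area turns it into a string you could try passing in the table_areas list.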