ArtifexSoftware / pdf2docx

Open source Python library for converting PDF to DOCX.
https://pdf2docx.readthedocs.io
GNU Affero General Public License v3.0
2.46k stars, 356 forks

How to preserve row highlighting in a table after converting PDF to DOCX #281

Open Herrifly opened 4 months ago

Herrifly commented 4 months ago

I highlight rows in a PDF across the full width of the page, and the highlighting displays correctly in the PDF. I need to convert the file to DOCX, but during conversion the row highlight turns into a table separator and a flat line. This happens specifically when the PDF contains a table. How can I preserve the row highlighting? If an individual cell is colored, everything is fine, but I need the whole row colored. Thanks for the answer!

JorjMcKie commented 4 months ago

Please provide an example page that goes wrong.

JorjMcKie commented 4 months ago

Not just an image - but a PDF page!

Herrifly commented 4 months ago

In the message above, I sent an example where the conversion does not work. Could you help me work around this?

Herrifly commented 3 months ago

Perhaps this particular type of highlighting is not supported and a standard one is needed, or is there another way to transfer tables from PDF to DOCX? If you manage to find a solution, that would be great.
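
One possible workaround follows from the observation earlier in the thread that cell-level coloring converts correctly while a single row-wide highlight does not: draw one fill per cell instead of one fill per row when coloring the PDF. The sketch below only shows the geometry of that split. The function name `split_row_highlight` and the `column_edges` input are hypothetical, the coordinates are made up, and it has not been confirmed that pdf2docx handles the resulting per-cell fills as desired.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in PDF points; an illustrative convention, not a pdf2docx type.
Rect = Tuple[float, float, float, float]


def split_row_highlight(row_rect: Rect, column_edges: List[float]) -> List[Rect]:
    """Split one row-wide highlight rectangle into one rectangle per cell.

    column_edges holds the x-coordinates of the table's vertical borders.
    This is a hypothetical input: you have to supply it from your own layout.
    """
    x0, y0, x1, y1 = row_rect
    # Keep only the borders strictly inside the row, then add the row's own ends.
    inner = sorted(e for e in set(column_edges) if x0 < e < x1)
    xs = [x0] + inner + [x1]
    # Pair consecutive x-coordinates to form the per-cell rectangles.
    return [(left, y0, right, y1) for left, right in zip(xs, xs[1:])]


# Example with made-up coordinates: a row spanning three columns.
cells = split_row_highlight((50.0, 100.0, 500.0, 120.0), [50.0, 200.0, 350.0, 500.0])
for cell in cells:
    print(cell)
```

Each resulting rectangle would then be drawn as its own filled shape by whatever tool applies the coloring (for example PyMuPDF's `Page.draw_rect` with a `fill` color, if that is what is in use), so that every table cell gets its own shading rather than sharing one page-wide rectangle.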