cognidox / OfficeToPDF

A command line tool to convert Microsoft Office documents to PDFs
https://www.cognidox.com/

Tables become malformed in specific scenarios #27

Status: Closed (chickenoodlestu closed 6 years ago)

chickenoodlestu commented 6 years ago

There are two scenarios that cause the generated PDF to malform the first table in the document. Documents from before and after conversion are attached. In the first scenario, an image immediately follows another table (highlighted in Issue1.docx/Issue1.pdf). In the second, another table contains a cell in its first column that holds only an image. Placing text between the table and the image solves issue 1, and placing text in the cell alongside the image solves issue 2.

Attachments: Issue1.docx, Issue1.pdf, Issue2.docx, Issue2.pdf


vittala commented 6 years ago

Hi

We've confirmed that this is the case with your sample documents. Interestingly, another way to solve the problem is to open Table Properties... -> Options... and untick "Automatically resize to fit contents". Oddly, unticking the option, ticking it again and saving the document is also enough to make the PDF come out correctly. So it looks like Word keeps an internal table layout that differs from the displayed one, and that internal layout is not being updated before conversion.

Are you generating these Word documents with Word or with another tool?

Regards Vittal

chickenoodlestu commented 6 years ago

The documents provided were indeed generated with a tool. They also contained custom styles that were removed prior to submission.

vittala commented 6 years ago

Not quite sure what it is about the generated Word document that's causing the issue, but in your Issue1 document Word internally sees 3 tables, with the heading and the body of the first table kept separate; hence the different column widths.
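If you want to check this on your side: a .docx is a zip package and the body lives in word/document.xml. Here's a minimal sketch, assuming (we haven't confirmed this for your files) that the split shows up as consecutive top-level <w:tbl> elements with nothing between them, which Word may display as a single table. The function names are our own, for illustration only, and in this simple version anything between two tables (a paragraph, a bookmark) ends the run.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# Main WordprocessingML namespace used in word/document.xml, in ElementTree form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def top_level_table_runs(document_xml: bytes) -> List[int]:
    """Return the size of each run of consecutive top-level <w:tbl> elements.

    A run of 3 means three tables with nothing between them in <w:body>,
    which Word may display as one table (an assumption, not confirmed).
    """
    body = ET.fromstring(document_xml).find(f"{W}body")
    if body is None:
        return []
    runs: List[int] = []
    current = 0
    for child in body:
        if child.tag == f"{W}tbl":
            current += 1
        elif current:
            # Any other element (paragraph, sectPr, bookmark...) ends the run.
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs

def table_runs_in_docx(path: str) -> List[int]:
    """Open a .docx package and analyse its main document part."""
    with zipfile.ZipFile(path) as zf:
        return top_level_table_runs(zf.read("word/document.xml"))
```

A run larger than 1 would point at tables that Word may be treating separately even though they look like one.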

Can the tool you're using to create the documents be made to specify column widths for the content columns?

We don't know what Word does to calculate the column widths before exporting to PDF, but it looks like it performs some checks for zero cell widths on autofit tables. We'll have a prod around to see whether it's possible to work around this.
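In case it helps with checking the tool's output, here's a rough sketch that lists zero or missing widths in autofit tables (tables without <w:tblLayout w:type="fixed"/>). It looks at the <w:gridCol> entries in <w:tblGrid> and the <w:tcW> of each cell in the table's own rows. It only reflects our guess above about what Word checks; the function names are hypothetical and this isn't part of OfficeToPDF.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def is_autofit(tbl: ET.Element) -> bool:
    # No <w:tblLayout>, or type="autofit", means Word sizes columns automatically.
    layout = tbl.find(f"{W}tblPr/{W}tblLayout")
    return layout is None or layout.get(f"{W}type", "autofit") == "autofit"

def zero_width_report(document_xml: bytes) -> List[Tuple[int, str]]:
    """Return (table index in document order, problem) for autofit tables."""
    findings: List[Tuple[int, str]] = []
    root = ET.fromstring(document_xml)
    for index, tbl in enumerate(root.iter(f"{W}tbl")):
        if not is_autofit(tbl):
            continue
        # Grid column widths (twentieths of a point); missing is treated like 0 here.
        for col in tbl.findall(f"{W}tblGrid/{W}gridCol"):
            if col.get(f"{W}w", "0") == "0":
                findings.append((index, "gridCol width missing or 0"))
        # Only this table's own cells, so nested tables are not double-counted.
        for tcw in tbl.findall(f"{W}tr/{W}tc/{W}tcPr/{W}tcW"):
            if tcw.get(f"{W}w") == "0":
                kind = tcw.get(f"{W}type", "unspecified")
                findings.append((index, f"cell width 0 (type={kind})"))
    return findings
```

If the tool can write non-zero widths wherever this reports something, that would be the "specify column widths" route suggested above.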

Regards Vittal

vittala commented 6 years ago

Hi

Try https://github.com/cognidox/OfficeToPDF/releases/tag/v1.8.20.0 with the /word_fix_table_columns option.
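If you're driving the conversion from a script, a minimal Python sketch follows. It assumes OfficeToPDF.exe (v1.8.20.0 or later) is on the PATH of a Windows machine with Word installed, passes switches before the input and output files, and treats a non-zero exit code as a failed conversion (check the README for the exact codes). The helper names are hypothetical.

```python
import subprocess
import sys
from typing import List

def officetopdf_command(src: str, dest: str, exe: str = "OfficeToPDF.exe") -> List[str]:
    """Build the argument list: switches first, then input and output files."""
    return [exe, "/word_fix_table_columns", src, dest]

def convert_with_table_fix(src: str, dest: str) -> bool:
    """Run the conversion; True if the process exited with code 0 (assumed success)."""
    result = subprocess.run(officetopdf_command(src, dest), capture_output=True, text=True)
    if result.returncode != 0:
        # Show whatever the tool printed, e.g. a "Did not convert" message.
        print(result.stdout, result.stderr, file=sys.stderr)
    return result.returncode == 0

if __name__ == "__main__":
    sys.exit(0 if convert_with_table_fix(sys.argv[1], sys.argv[2]) else 1)
```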

Regards Vittal

chickenoodlestu commented 6 years ago

This release fixes the problem on the test files I attached. However, when I run it on the original document, I get "Required Types tag not found. Line 1, position 2. Did not convert". We've made a note to look at our tool, since it sounds like this can be fixed on our side.
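One thing worth checking in the generated file: the message appears to match the error .NET's System.IO.Packaging raises when a package's [Content_Types].xml does not have a <Types> root element in the OPC content-types namespace. That is a guess, not confirmed for this document. A quick sketch for checking a .docx (function names are hypothetical):

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace the OPC spec requires on the <Types> root of [Content_Types].xml.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"

def check_content_types_xml(data: bytes) -> str:
    """Return "ok", or a short description of what is wrong with the part."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return f"not well-formed XML: {exc}"
    if root.tag != f"{{{CT_NS}}}Types":
        return f"unexpected root element {root.tag!r}"
    return "ok"

def check_docx(path: str) -> str:
    with zipfile.ZipFile(path) as zf:
        # OPC part names are case-insensitive, so match the zip entry loosely.
        matches = [n for n in zf.namelist() if n.lower() == "[content_types].xml"]
        if not matches:
            return "missing [Content_Types].xml"
        return check_content_types_xml(zf.read(matches[0]))
```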

vittala commented 6 years ago

Hi - without a copy of the document that triggers the "required types tag" error, we won't be able to go into this any further.

I'll close the ticket, but if you're able to share an example file, please re-open.

Thanks Vittal