drmacro / wordinator

Generate high-quality DOCX files using a simplified XML format (simple word processing XML).
Apache License 2.0

Unable to modify image size in output DOCX #70

Closed CallumES closed 2 years ago

CallumES commented 2 years ago

When testing an input HTML file containing an image, we are able to produce a DOCX file. However, the image size constraints are not observed and the image is displayed much larger than specified.

Example: <img src="./images/logo.jpg" width="200" height="120"/>

In the resulting DOCX I would expect to see the logo.jpg file constrained to the size specified (w=200px, h=120px), but this is not the case. If you open the HTML file in a browser, you can see that the image dimensions are applied correctly.

Full HTML Example below:

<!DOCTYPE HTML>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    <style>
        body {
            font-family: arial;
        }
    </style>
</head>
<body>
<p style="width:300px;float:right">
    <img src="./images/logo.jpg" width="200" height="120"/>
</p>
<br/><br/><br/><br/>
<div style="width:1000px;float:left">
    <h1>Test Document - 2022 Edition</h1>
</div>
<br/><br/>
<table border="1" style="width:60%">
    <tbody>
    <tr>
        <td style="width:30%; background: #D3D3D3"><b>Summary</b></td>
        <td style="width:70%">This is a test to display the structure</td>
    </tr>
    </tbody>
</table>
<br/>
<table border="1" style="width:60%">
    <tbody>
    <tr>
        <td bgcolor="#D3D3D3" style="width:30%"><b>Document ID No:</b></td>
        <td style="width:70%">1234</td>
    </tr>
    <tr>
        <td bgcolor="#D3D3D3"><b>Issue Date:</b></td>
        <td style="width:70%">10/06/2022</td>
    </tr>
    </tbody>
</table>
<br/><br/><br/>
<h3>Table of Contents</h3>
<hr/>
<br/>
<p>1.1 - Level 1 - Page 4</p>
<br/>
<p>1.2 - Level 2 - Page 5</p>
<br/><br/>
<hr/>
<br/><br/>
<h2>2.1 - Level 1 - Example A</h2>
<p>This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. 
This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. 
</p>
<br/><br/><br/>
<h3>2.1.1 2.1.2 - Level 2 - Example A1 Level 2 - Example A2</h3>
<p>This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. 
This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. 
</p>
<br/><br/>
<hr/>
<h2>2.2 - Level 1 - Example B</h2>
<p>This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. 
This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. 
</p>
<br/><br/><br/>
<h3>2.2.1 2.2.2 - Level 2 - Example B1 Level 2 - Example B2</h3>
<p>This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. 
This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. This is some dummy text to take up space. 
</p>
<br/><br/>
<hr/>
<br/><br/><br/><br/><br/>
<table border="1">
    <tbody>
    <tr bgcolor="#D3D3D3">
        <th>Name</th>
        <th>ID</th>
    </tr>
    <tr>
        <td>Level 2 - Example A1</td>
        <td>3544</td>
    </tr>
    <tr>
        <td>Level 2 - Example A2</td>
        <td>8745</td>
    </tr>
    <tr>
        <td>Level 2 - Example B1</td>
        <td>2486</td>
    </tr>
    <tr>
        <td>Level 2 - Example B2</td>
        <td>9745</td>
    </tr>
    </tbody>
</table>
<br/>
</body>
</html>

drmacro commented 2 years ago

I see that the out-of-the-box HTML-to-SimpleWPML does not do anything with the width and height attributes, so that explains why it's not reflected.

Let me see what I can do to remedy that.

drmacro commented 2 years ago

image-geometry-test.docx.zip image-geometry-test.swpx.zip

I updated the src/xsl/html2docx/baseProcessing.xsl transform to copy the @height and @width attributes to the SWPX image element. With that change, the pixel dimensions in the HTML are reflected in the DOCX file. The generated SWPX file and the resulting DOCX file are attached.

The change is committed on the develop branch.
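
For anyone checking the result by hand: DOCX stores drawing extents in EMUs (914,400 per inch), so the pixel values from the HTML can be converted and compared against the extent values in the generated document. The following is a minimal Python sketch for that check, not Wordinator's own code. It assumes 96 pixels per inch and plain integer width and height attributes; `px_to_emu` and `ImgSizeCollector` are hypothetical helper names, and the conversion Wordinator actually applies may differ.

```python
from html.parser import HTMLParser

EMU_PER_INCH = 914_400  # EMUs per inch, as defined for DrawingML extents
ASSUMED_DPI = 96        # assumption: one HTML pixel is 1/96 inch


def px_to_emu(pixels, dpi=ASSUMED_DPI):
    """Convert a pixel length to EMUs at the given resolution."""
    return pixels * EMU_PER_INCH // dpi


class ImgSizeCollector(HTMLParser):
    """Collect (src, width_px, height_px) for each <img> that has both attributes."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <img .../> are also routed here by HTMLParser.
        if tag != "img":
            return
        a = dict(attrs)
        if "width" in a and "height" in a:
            # Assumes bare integers such as "200"; values like "200px" would need parsing.
            self.images.append((a.get("src", ""), int(a["width"]), int(a["height"])))


if __name__ == "__main__":
    collector = ImgSizeCollector()
    collector.feed('<img src="./images/logo.jpg" width="200" height="120"/>')
    for src, w, h in collector.images:
        print(src, px_to_emu(w), px_to_emu(h))
```

Under the 96 dpi assumption, the 200 x 120 pixel logo from this issue corresponds to 1,905,000 x 1,143,000 EMUs.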

CallumES commented 2 years ago

@drmacro When testing this out, images are now correctly constrained to the specified size. However, I've noticed that tables have lost their proportions since this change was made.

This can be seen in the image-geometry-test files that you uploaded, where each table column wraps to one character per line. Tables in this file are specified using % widths. Is there perhaps a conflict between using px values for images and % values for tables now?

If you want a separate issue raised for this, I can do so. I'm unsure how best to proceed, as it relates directly to the image sizing change made here.