PHPOffice / PHPWord

A pure PHP library for reading and writing word processing documents
https://phpoffice.github.io/PHPWord/

Html table multiple columns #1515

Open jgpATs2w opened 6 years ago

jgpATs2w commented 6 years ago

Expected Behavior

Should show the content of the second row, not only the heading.

Current Behavior

An HTML table with multiple columns is not parsed correctly.

Failure Information

Using the same HTML as the test HtmlTest.php: testParseTable, the test passes, but the body content is not generated correctly, at least when the file is opened in LibreOffice 5.1.62.

How to Reproduce

<?php
require __DIR__ . '/vendor/autoload.php';

$tmpDir = sys_get_temp_dir();
$outputPath = implode(DIRECTORY_SEPARATOR, array($tmpDir, 'writer.docx'));

// Same HTML as in HtmlTest.php: testParseTable
$contents = '<table align="left" style="width: 50%; border: 6px #0000FF solid;">
        <thead>
            <tr style="background-color: #FF0000; text-align: center; color: #FFFFFF; font-weight: bold; ">
                <th style="width: 50pt">header a</th>
                <th style="width: 50">header b</th>
                <th style="border-color: #00FF00; border-width: 3px">header c</th>
            </tr>
        </thead>
        <tbody>
            <tr><td style="border-style: dotted;">1</td><td colspan="2">2</td></tr>
            <tr><td>This is <b>bold</b> text</td><td>5</td><td><p>6</p></td></tr>
        </tbody>
    </table>';

$phpWord = new \PhpOffice\PhpWord\PhpWord();

$section = $phpWord->addSection();

// Convert the HTML table into PHPWord elements inside the section
\PhpOffice\PhpWord\Shared\Html::addHtml($section, $contents, false, $phpWord, $tmpDir);

$phpWord->save($outputPath, 'Word2007');

// Open the result for inspection; requires Linux with LibreOffice installed
exec('libreoffice ' . escapeshellarg($outputPath));

nixprosoft commented 5 years ago

Same problem in PHPWord 0.15.0, PHP 7.1. For me, the provided example doesn't work as expected at all.

Error while opening: [screenshot 2018-11-30_15-35-22]

View: [screenshot 2018-11-30_15-35-45]

reimax commented 5 years ago

I also ran into this problem. With one column everything works; with two or more there is an error. The demo example does not work. Has anyone found a way to solve the problem?

troosan commented 5 years ago

Indeed, the document seems to open fine in MS Word, but not in LibreOffice. It is easily reproducible by running Sample_26_Html.php.

troosan commented 5 years ago

Actually, LibreOffice does not seem to like the way the tables are written. Even the result of Sample_09_Tables.php has some issues when opening with LibreOffice. If you can pinpoint the cause, feel free to comment.

jgpATs2w commented 5 years ago

This is how I see the output from Sample_26_Html.php in docx (see the file). The output of Sample_09_Tables.php looks fine in LibreOffice (at least the columns are preserved).

Opened in Google Docs:

google_docx

Opened in LibreOffice 5.1.6 on Ubuntu:

libreoffice_docx

Opened in Windows 2007:

The columns are reversed! win_docx

It looks like Google is making assumptions about how to render the table that the others don't make. Probably some essential information is missing or ambiguous.

Looking at Sample_26_Html.docx_FILES/word/document.xml, I just see empty gridCol elements. Compared with a basic table example, though, there is much more information in it that I can't evaluate:

<w:tbl>
            <w:tblGrid>
                <w:gridCol/> <!-- missing width?-->
                <w:gridCol/>
                <w:gridCol/>
            </w:tblGrid>
            <w:tblPr>
                <w:jc w:val="center"/>
                <w:tblW w:w="2500" w:type="pct"/>
                <w:tblLayout w:type="autofit"/>
                <w:bidiVisual w:val="0"/>
                <w:tblBorders>
                    <w:top w:val="single" w:sz="4.5" w:color="0000FF"/>
                    <w:left w:val="single" w:sz="4.5" w:color="0000FF"/>
                    <w:right w:val="single" w:sz="4.5" w:color="0000FF"/>
                    <w:bottom w:val="single" w:sz="4.5" w:color="0000FF"/>
                    <w:insideH w:val="single" w:sz="4.5" w:color="0000FF"/>
                    <w:insideV w:val="single" w:sz="4.5" w:color="0000FF"/>
                </w:tblBorders>
            </w:tblPr>
            <w:tr>
                <w:trPr>
                    <w:tblHeader w:val="1"/>
                </w:trPr>
                <w:tc>
                    <w:tcPr>
                        <w:tcBorders>
                            <w:top w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:left w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:right w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:bottom w:val="double" w:sz="4.5" w:color="0000FF"/>
                        </w:tcBorders>
                        <w:shd w:val="clear" w:fill="FF0000"/>
                    </w:tcPr>
                    <w:p>
                        <w:pPr/>
                        <w:r>
                            <w:rPr>
                                <w:color w:val="FFFFFF"/>
                                <w:b w:val="1"/>
                                <w:bCs w:val="1"/>
                                <w:shd w:val="clear" w:fill="FF0000"/>
                            </w:rPr>
                            <w:t xml:space="preserve">header a</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr>
                        <w:tcBorders>
                            <w:top w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:left w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:right w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:bottom w:val="double" w:sz="4.5" w:color="0000FF"/>
                        </w:tcBorders>
                        <w:shd w:val="clear" w:fill="FF0000"/>
                    </w:tcPr>
                    <w:p>
                        <w:pPr/>
                        <w:r>
                            <w:rPr>
                                <w:color w:val="FFFFFF"/>
                                <w:b w:val="1"/>
                                <w:bCs w:val="1"/>
                                <w:shd w:val="clear" w:fill="FF0000"/>
                            </w:rPr>
                            <w:t xml:space="preserve">header          b</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr>
                        <w:tcBorders>
                            <w:top w:val="double" w:sz="9" w:color="0000FF"/>
                            <w:left w:val="double" w:sz="9" w:color="0000FF"/>
                            <w:right w:val="double" w:sz="9" w:color="0000FF"/>
                            <w:bottom w:val="double" w:sz="9" w:color="0000FF"/>
                        </w:tcBorders>
                        <w:shd w:val="clear" w:fill="FFFF00"/>
                    </w:tcPr>
                    <w:p>
                        <w:pPr>
                            <w:pBdr>
                                <w:top w:val="single" w:sz="9"/>
                                <w:left w:val="single" w:sz="9"/>
                                <w:right w:val="single" w:sz="9"/>
                                <w:bottom w:val="single" w:sz="9"/>
                            </w:pBdr>
                        </w:pPr>
                        <w:r>
                            <w:rPr>
                                <w:color w:val="FFFFFF"/>
                                <w:b w:val="1"/>
                                <w:bCs w:val="1"/>
                                <w:shd w:val="clear" w:fill="00FF00"/>
                            </w:rPr>
                            <w:t xml:space="preserve">header c</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
            </w:tr>
            <w:tr>
                <w:trPr/>
                <w:tc>
                    <w:tcPr>
                        <w:tcBorders>
                            <w:top w:val="dotted" w:sz="4.5" w:color="FF0000"/>
                            <w:left w:val="dotted" w:sz="4.5" w:color="FF0000"/>
                            <w:right w:val="dotted" w:sz="4.5" w:color="FF0000"/>
                            <w:bottom w:val="dotted" w:sz="4.5" w:color="FF0000"/>
                        </w:tcBorders>
                    </w:tcPr>
                    <w:p>
                        <w:pPr/>
                        <w:r>
                            <w:rPr/>
                            <w:t xml:space="preserve">1</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr>
                        <w:tcBorders>
                            <w:top w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:left w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:right w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:bottom w:val="double" w:sz="4.5" w:color="0000FF"/>
                        </w:tcBorders>
                        <w:gridSpan w:val="2"/>
                    </w:tcPr>
                    <w:p>
                        <w:pPr/>
                        <w:r>
                            <w:rPr/>
                            <w:t xml:space="preserve">2</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
            </w:tr>
            <w:tr>
                <w:trPr/>
                <w:tc>
                    <w:tcPr>
                        <w:tcBorders>
                            <w:top w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:left w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:right w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:bottom w:val="double" w:sz="4.5" w:color="0000FF"/>
                        </w:tcBorders>
                    </w:tcPr>
                    <w:p>
                        <w:pPr/>
                        <w:r>
                            <w:rPr/>
                            <w:t xml:space="preserve">This is </w:t>
                        </w:r>
                        <w:r>
                            <w:rPr>
                                <w:b w:val="1"/>
                                <w:bCs w:val="1"/>
                            </w:rPr>
                            <w:t xml:space="preserve">bold</w:t>
                        </w:r>
                        <w:r>
                            <w:rPr/>
                            <w:t xml:space="preserve"> text</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr>
                        <w:tcBorders>
                            <w:top w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:left w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:right w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:bottom w:val="double" w:sz="4.5" w:color="0000FF"/>
                        </w:tcBorders>
                    </w:tcPr>
                    <w:p>
                        <w:pPr/>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr>
                        <w:tcBorders>
                            <w:top w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:left w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:right w:val="double" w:sz="4.5" w:color="0000FF"/>
                            <w:bottom w:val="double" w:sz="4.5" w:color="0000FF"/>
                        </w:tcBorders>
                    </w:tcPr>
                    <w:p>
                        <w:pPr/>
                        <w:r>
                            <w:rPr/>
                            <w:t xml:space="preserve">6</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
            </w:tr>
        </w:tbl>

Can anyone with OOXML experience check what may be producing these inconsistencies?

rikvdlooi commented 4 years ago

I just experienced this issue as well. I tested and indeed the missing width on the gridCol seems to be the problem.

Test html that I used:

<table>
    <tr>
        <td>AAA</td>
        <td>BBB</td>
        <td>CCC</td>
    </tr>
</table>

Generated the following word/document.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006" 
    xmlns:o="urn:schemas-microsoft-com:office:office" 
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" 
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" 
    xmlns:v="urn:schemas-microsoft-com:vml" 
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" 
    xmlns:w10="urn:schemas-microsoft-com:office:word" 
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" 
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
    <w:body>
        <w:tbl>
            <w:tblGrid>
                <w:gridCol/>
                <w:gridCol/>
                <w:gridCol/>
            </w:tblGrid>
            <w:tblPr>
                <w:tblW w:w="0" w:type="auto"/>
                <w:tblLayout w:type="autofit"/>
                <w:bidiVisual w:val="0"/>
            </w:tblPr>
            <w:tr>
                <w:trPr/>
                <w:tc>
                    <w:tcPr/>
                    <w:p>
                        <w:pPr/>
                        <w:r>
                            <w:rPr/>
                            <w:t xml:space="preserve">AAA</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr/>
                    <w:p>
                        <w:pPr/>
                        <w:r>
                            <w:rPr/>
                            <w:t xml:space="preserve">BBB</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
                <w:tc>
                    <w:tcPr/>
                    <w:p>
                        <w:pPr/>
                        <w:r>
                            <w:rPr/>
                            <w:t xml:space="preserve">CCC</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
            </w:tr>
        </w:tbl>
        <w:sectPr>
            <w:pgSz w:orient="portrait" w:w="11905.511811023622" w:h="16837.79527559055"/>
            <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
            <w:cols w:num="1" w:space="720"/>
        </w:sectPr>
    </w:body>
</w:document>

When I manually changed the three w:gridCol elements to have w:w="1" and rezipped the files into a docx, the file opened fine in LibreOffice.

Conclusion: the missing w:w attribute on the gridCol elements is the cause of this issue.
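For anyone who wants to repeat that manual edit without unzipping by hand, here is a minimal sketch. It assumes the dom and zip PHP extensions are available; `addMissingGridColWidths` and `patchDocx` are hypothetical names, not part of PHPWord, and the default width of 1 simply mirrors the value used in the manual test rather than a recommended value.

```php
<?php
// Hypothetical helper, not part of PHPWord: adds a w:w attribute to every
// w:gridCol element that lacks one and returns the patched XML.
function addMissingGridColWidths(string $documentXml, int $widthTwips = 1): string
{
    $wNs = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

    $dom = new DOMDocument();
    $dom->loadXML($documentXml);

    foreach ($dom->getElementsByTagNameNS($wNs, 'gridCol') as $gridCol) {
        if (!$gridCol->hasAttributeNS($wNs, 'w')) {
            $gridCol->setAttributeNS($wNs, 'w:w', (string) $widthTwips);
        }
    }

    return $dom->saveXML();
}

// Hypothetical helper: rewrites word/document.xml inside the given .docx in place.
function patchDocx(string $docxPath, int $widthTwips = 1): void
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath");
    }
    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        throw new RuntimeException('word/document.xml not found');
    }
    $zip->addFromString('word/document.xml', addMissingGridColWidths($xml, $widthTwips));
    $zip->close();
}
```

Calling `patchDocx($outputPath)` on a generated file would apply the same change as the manual edit. This is only a workaround for inspecting the problem, not a fix in the library.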

I did some digging through the code, and the w:w attribute should be set in the writeColumns method on the PhpOffice\PhpWord\Writer\Word2007\Element\Table class:

private function writeColumns(XMLWriter $xmlWriter, TableElement $element)
{
    // So we have to look in findFirstDefinedCellWidths
    $cellWidths = $element->findFirstDefinedCellWidths();

    $xmlWriter->startElement('w:tblGrid');
    foreach ($cellWidths as $width) {
        $xmlWriter->startElement('w:gridCol');
        if ($width !== null) {
            $xmlWriter->writeAttribute('w:w', $width);
            $xmlWriter->writeAttribute('w:type', 'dxa');
        }
        $xmlWriter->endElement();
    }
    $xmlWriter->endElement(); // w:tblGrid
}

The findFirstDefinedCellWidths method checks the getWidth method on the Cells (PhpOffice\PhpWord\Element\Cell), which returns the width property of the cell instance. The width property is only set in the __construct method.

So, when are these Cells created? I looked at this from the other side, starting with the HTML input. When a td element is found, the parseCell method on the PhpOffice\PhpWord\Shared\Html class is called, which in turn calls addCell on the PhpOffice\PhpWord\Element\Row class. The first parameter is the $width, but it is hardcoded to null in the Html class. So a width will never be set, and therefore no w:w will ever be set on the gridCol elements.

And that is where my lack of knowledge of Word/XML stops me. Just hardcoding a 1 instead of null for the cell width seems to fix the issue. However, I don't know the implications: is this correct for docx, and will it break other stuff (probably)?

BTW, I also tried setting a width through styles on the td element, but findFirstDefinedCellWidths does not look at a width set from styles on the Cell, only at the width property itself.
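To make that chain easier to follow, here is a simplified, self-contained model of it. These functions are hypothetical stand-ins and do not reproduce PHPWord's actual implementation; they ignore colspan and only show how null cell widths end up as gridCol elements without w:w, following the writeColumns logic quoted above.

```php
<?php
// Simplified stand-in for the behaviour described above; these functions
// are hypothetical and are not PHPWord's real code.

/**
 * Picks, for each column, the first non-null width found in any row.
 *
 * @param array<int, array<int, int|null>> $rows cell widths per row
 * @return array<int, int|null> one width (or null) per grid column
 */
function firstDefinedCellWidths(array $rows): array
{
    $widths = [];
    foreach ($rows as $cells) {
        foreach ($cells as $column => $width) {
            // isset() is false for a stored null, so a later defined width wins.
            if (!isset($widths[$column])) {
                $widths[$column] = $width;
            }
        }
    }
    return $widths;
}

/**
 * Mirrors the quoted writeColumns logic: w:w is only written for non-null widths.
 *
 * @param array<int, int|null> $widths
 */
function renderTblGrid(array $widths): string
{
    $xml = '<w:tblGrid>';
    foreach ($widths as $width) {
        $xml .= $width === null
            ? '<w:gridCol/>'
            : '<w:gridCol w:w="' . $width . '" w:type="dxa"/>';
    }
    return $xml . '</w:tblGrid>';
}
```

With `[[null, null, null]]` as input, which is what the HTML path effectively produces, `renderTblGrid(firstDefinedCellWidths(...))` yields three empty gridCol elements, matching the generated document.xml.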
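If someone wants to experiment with deriving the cell width from the td style instead of passing null, a conversion along these lines might be a starting point. This is only a sketch under stated assumptions: `cssWidthToTwips` is a hypothetical helper, not PHPWord API, it handles only pt, px, in and cm, and it assumes 96 dpi for px (1 px = 0.75 pt). It relies on 1 pt = 20 twips, the dxa unit that the quoted writeColumns code writes on gridCol.

```php
<?php
// Hypothetical helper, not part of PHPWord: extracts a CSS width from an
// inline style attribute and converts it to twips (1 pt = 20 twips).
function cssWidthToTwips(string $style): ?int
{
    // Match only a "width" property, not "border-width" or similar.
    $pattern = '/(?:^|;)\s*width\s*:\s*(\d+(?:\.\d+)?)\s*(pt|px|in|cm)\b/i';
    if (!preg_match($pattern, $style, $m)) {
        return null; // no width, or a unit this sketch does not handle (e.g. %)
    }

    // Points per unit; the px value assumes 96 dpi.
    $pointsPerUnit = ['pt' => 1.0, 'px' => 0.75, 'in' => 72.0, 'cm' => 72.0 / 2.54];
    $points = (float) $m[1] * $pointsPerUnit[strtolower($m[2])];

    return (int) round($points * 20);
}
```

For the headers in the example table, `cssWidthToTwips('width: 50pt')` gives 1000, while the unitless `width: 50` and the `border-width: 3px` style both give null, so such cells would still need a fallback width.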