PHPOffice / PHPWord

A pure PHP library for reading and writing word processing documents
https://phpoffice.github.io/PHPWord/

TextBox elements missing #2183

Open matschek opened 2 years ago

matschek commented 2 years ago

Describe the Bug

Reading different files in Word 2007 format, I either get TextBoxes as standard Text elements or I don't get the element at all.

Steps to Reproduce

A sample document with the text "Normal text (ok)" in the body and a text box containing the text "textfield within body".

<?php
// Assumes PHPWord was installed with Composer; adjust the autoloader path if not.
require 'vendor/autoload.php';

use PhpOffice\PhpWord\Element\Text;
use PhpOffice\PhpWord\Element\TextRun;
use PhpOffice\PhpWord\Reader\Word2007;

$filename = 'textbox-problem-fail.docx';
// $filename = 'textbox-problem-ok.docx';

$doc = (new Word2007())->load($filename);

// Only the body of the first section is inspected.
$elements = $doc->getSections()[0]->getElements();

echo 'Found ' . count($elements) . ' elements in body-container';
foreach ($elements as $element) {
    echo '<br>Element ' . get_class($element) . ': ';
    switch (get_class($element)) {
        case TextRun::class:
            // A TextRun is a container: list the elements it holds.
            $subElements = $element->getElements();
            echo '<br>Found container with ' . count($subElements) . ' subElements';
            foreach ($subElements as $subElement) {
                echo '<br>SubElement ' . get_class($subElement) . ': ';
                switch (get_class($subElement)) {
                    case Text::class:
                        echo '<b>' . $subElement->getText() . '</b>';
                        break;
                    default:
                        echo '[ignored]';
                }
            }
            break;

        default:
            echo '[ignored]';
    }
}

Expected Behavior

I expect the text box to be returned as a PhpOffice\PhpWord\Element\TextBox object that acts as a container for the elements inside it.
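
If text boxes were returned as containers, a caller could collect their text with a small recursive walk. The following is a minimal sketch, assuming only that container elements expose getElements() and text elements expose getText(), as the reproduction script above relies on. The function name collectText is hypothetical and not part of PHPWord.

```php
<?php
/**
 * Recursively collect the text of an element tree, in document order.
 * Hypothetical helper: it uses duck typing only, so it works with any
 * object that offers getElements() (container) or getText() (leaf).
 *
 * @return string[]
 */
function collectText(object $element): array
{
    // Containers are checked first, so a container that also has
    // getText() is still descended into instead of read as a leaf.
    if (method_exists($element, 'getElements')) {
        $texts = [];
        foreach ($element->getElements() as $child) {
            $texts = array_merge($texts, collectText($child));
        }
        return $texts;
    }

    // Leaf elements contribute their string; anything else is skipped.
    if (method_exists($element, 'getText')) {
        $text = $element->getText();
        return is_string($text) ? [$text] : [];
    }

    return [];
}
```

With the `$doc` from the script above, `collectText($doc->getSections()[0])` would gather the body text of the first section, including text nested inside any container the reader returns.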

Current Behavior

Result for "textbox-problem-fail.docx": the text box and its content are completely missing.

Found 1 elements in body-container
Element PhpOffice\PhpWord\Element\TextRun:
Found container with 1 subElements
SubElement PhpOffice\PhpWord\Element\Text: Normal text (ok)

Result for "textbox-problem-ok.docx": the text box is returned as a standard Text element. At least we get the text here.

Found 1 elements in body-container
Element PhpOffice\PhpWord\Element\TextRun:
Found container with 2 subElements
SubElement PhpOffice\PhpWord\Element\Text: textfield within body (OK)
SubElement PhpOffice\PhpWord\Element\Text: Normal text (ok)

Context

matschek commented 2 years ago

textbox-problem-ok.docx textbox-problem-fail.docx

matschek commented 2 years ago

I took a look at the XML of the documents. In the one where no TextBox is detected, the content seems to be in a different node than the one the library is looking for. I made a quick guess at a fix, which works for my purpose but may not handle everything the way it should.
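
One way to see which node carries the text box content is to query word/document.xml directly. The sketch below uses only PHP's DOM and Zip extensions, not PHPWord. The function names are hypothetical, and it assumes that text box paragraphs sit under a `w:txbxContent` element, as in the `v:shape/v:textbox` case handled by the patch below.

```php
<?php
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

/** Read the main document part from a .docx file (a zip archive). Requires ext-zip. */
function readDocumentXml(string $docxPath): string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found');
    }
    return $xml;
}

/**
 * List every paragraph found inside a w:txbxContent element, together
 * with the chain of element names that leads to it.
 *
 * @return array<int, array{path: string, text: string}>
 */
function findTextBoxParagraphs(string $documentXml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', W_NS);

    $found = [];
    foreach ($xpath->query('//w:txbxContent/w:p') as $p) {
        // Walk up to the root to record where this paragraph is nested.
        $names = [];
        for ($n = $p; $n instanceof DOMElement; $n = $n->parentNode) {
            array_unshift($names, $n->nodeName);
        }
        $found[] = [
            'path' => implode('/', $names),
            'text' => trim($p->textContent),
        ];
    }
    return $found;
}
```

Calling `findTextBoxParagraphs(readDocumentXml($filename))` for both sample files and comparing the `path` values should show where the two documents differ.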

In AbstractPart.php I inserted an if..else to distinguish between an image and a text box:

<?php

    // AbstractPart.php, line 252

    } elseif ($node->nodeName == 'w:pict') {
        /* begin of new code part */
        // Textbox: look for v:shape/v:textbox below the w:pict node
        $textbox = $xmlReader->getElement('v:shape/v:textbox', $node);
        if (null !== $textbox) {
            $new_txtbx = $parent->addTextBox();
            // Read every paragraph of the text box into the new TextBox container
            $txbxContent_wp = $xmlReader->getElements('w:txbxContent/w:p', $textbox);
            foreach ($txbxContent_wp as $wp) {
                $this->readParagraph($xmlReader, $wp, $new_txtbx, $docPart);
            }
        } else {
            // Image
        /* end of new code part */
            // (the existing image-handling code follows here unchanged; the new else block must be closed after it)

To make this work I had to make another change and allow a TextBox to be inserted into a TextRun, which is forbidden by default. I don't know why it is forbidden or whether allowing it breaks other things :-/

<?php
    # AbstractContainer, line 245:

    'TextBox'       => array('Section', 'Header', 'Footer', 'Cell', 'TextRun'), /*+TextRun*/
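
The restriction works like an allow-list: each element type maps to the containers it may be added to. The following is a simplified, standalone illustration of that idea, not PHPWord's actual code. The function name canAdd and the reduced one-row maps are hypothetical; the TextBox rows mirror the change above, without and with 'TextRun'.

```php
<?php
/**
 * Simplified allow-list check (illustration only).
 *
 * @param array<string, string[]> $validContainers element type => allowed containers
 */
function canAdd(string $element, string $container, array $validContainers): bool
{
    // Element types missing from the map are rejected; that is a choice
    // made for this sketch, not a statement about the library.
    return in_array($container, $validContainers[$element] ?? [], true);
}

$default = ['TextBox' => ['Section', 'Header', 'Footer', 'Cell']];
$patched = ['TextBox' => ['Section', 'Header', 'Footer', 'Cell', 'TextRun']];

var_dump(canAdd('TextBox', 'TextRun', $default)); // bool(false)
var_dump(canAdd('TextBox', 'TextRun', $patched)); // bool(true)
```

Seen this way, the one-line change only widens the list for TextBox; whether code elsewhere assumes that a TextRun never contains a TextBox is a separate question the sketch does not answer.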

Hope this helps to develop the real fix.

ghost commented 2 years ago

It was useful

zxf5115 commented 1 year ago

thanks!