PHPOffice / PHPWord

A pure PHP library for reading and writing word processing documents
https://phpoffice.github.io/PHPWord/

Sub/superscript HTML is rendered with line breaks #2476

Open dsuurlant opened 1 year ago

dsuurlant commented 1 year ago

Describe the Bug

When attempting to add text with <sup>...</sup> or <sub>...</sub> tags using Html::addHtml and placing it into a Section or Table Cell, the resulting text has unwanted line breaks. When placing it in a TextRun, the text isn't rendered at all.

Steps to Reproduce

Please provide a code sample that reproduces the issue.

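use PhpOffice\PhpWord\Element\TextRun;
use PhpOffice\PhpWord\IOFactory;
use PhpOffice\PhpWord\PhpWord;
use PhpOffice\PhpWord\Shared\Html;

$textWithTags = "C<sup>8</sup>H<sup>10</sup>N<sub>4</sub>O<sub>2</sub>";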

$phpWord = new PhpWord();
$section = $phpWord->addSection();
Html::addHtml($section, $textWithTags); // Rendered with line breaks

$table = $section->addTable();
$row = $table->addRow();
$cell = $row->addCell();
Html::addHtml($cell, $textWithTags); // Rendered with line breaks

$textRun = new TextRun();
Html::addHtml($textRun, $textWithTags); 
$section->addTextRun($textRun); // Doesn't work at all, just whitespace

$tmpFile = '/tmp/PhpWordTest.docx';
$objWriter = IOFactory::createWriter($phpWord);
$objWriter->save($tmpFile);

Our target behavior is actually to render the text with sub/sup tags inside a table that is used with the template processor:

use PhpOffice\PhpWord\Element\Table;
use PhpOffice\PhpWord\TemplateProcessor;

// Create template
$template = new PhpWord();
$section = $template->addSection();
$section->addText('${replaceThis}');

$templateFile = '/tmp/PhpWordTemplate.docx';
$objWriter = IOFactory::createWriter($template);
$objWriter->save($templateFile);

// Create table to render in template
$table = new Table();
$row = $table->addRow();
$cell = $row->addCell();
Html::addHtml($cell, $textWithTags);

$processor = new TemplateProcessor($templateFile);
$processor->setComplexValue('replaceThis', $table);
$renderedTemplateFile = '/tmp/PhpWordRenderedTemplate.docx';
$processor->saveAs($renderedTemplateFile);

In the above example, the text inside the table cell has line breaks as well.

Working example

We know this behaviour is possible as this does work in the TemplateProcessor with a TextRun:

$processor = new TemplateProcessor($templateFile);
$textRun = new TextRun();
Html::addHtml($textRun, $textWithTags);
$processor->setComplexValue('replaceThis', $textRun);
$renderedTemplateFile = '/tmp/PhpWordRenderedTemplate.docx';
$processor->saveAs($renderedTemplateFile);

Now the sub/sup tags are rendered correctly on a single line.

Expected Behavior

I would expect the text to be rendered without line breaks whether in a table cell, a textrun, or any other element; with or without the TemplateProcessor.

Current Behavior

Text with sub or sup HTML tags is rendered with line breaks.

Context

Please fill in your environment information:

thomasb88 commented 1 year ago

My environment is different from yours, but I got it to work by using $textWithTags = "<p>C<sup>8</sup>H<sup>10</sup>N<sub>4</sub>O<sub>2</sub></p>"; instead of $textWithTags = "C<sup>8</sup>H<sup>10</sup>N<sub>4</sub>O<sub>2</sub>";

I suppose it is because in the readParagraph function, there is the following part

if (0 === $textRunContainers) {
    $parent->addTextBreak(null, $paragraphStyle);
}

And then when writing it, in HTML writer for example, it does

if ($this->withoutP) {
    $content = '<br />' . PHP_EOL;
} else {
    $content = '<p>&nbsp;</p>' . PHP_EOL;
}

Note that wrapping your $textWithTags in a paragraph also adds a text run.
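
If the HTML fragments come from elsewhere and may or may not already be wrapped, the workaround above could be applied with a small helper. This is only a sketch: wrapInlineHtml is a hypothetical function, not part of PHPWord, and the list of block-level tags it checks for is an assumption that may need extending.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of PHPWord): wrap an inline HTML fragment
 * in a single <p> element unless it already opens with a block-level tag.
 */
function wrapInlineHtml(string $html): string
{
    $trimmed = trim($html);

    // Assumed list of block-level tags; fragments starting with one are left as-is.
    if (preg_match('/^<(p|div|h[1-6]|ul|ol|table)\b/i', $trimmed) === 1) {
        return $trimmed;
    }

    return '<p>' . $trimmed . '</p>';
}

$textWithTags = 'C<sup>8</sup>H<sup>10</sup>N<sub>4</sub>O<sub>2</sub>';

// Prints: <p>C<sup>8</sup>H<sup>10</sup>N<sub>4</sub>O<sub>2</sub></p>
echo wrapInlineHtml($textWithTags), PHP_EOL;
```

The wrapped string can then be passed to Html::addHtml in place of the bare fragment.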

oleibman commented 1 year ago

First, this seems to be a problem with all formatting tags (e.g. <b> will cause similar results). Section expects a line break between its constituent elements; TextRun, however, does not. So, the following minor modification of your third ("whitespace") attempt seems to work:

$textRun = $section->addTextRun();
Html::addHtml($textRun, $textWithTags);

Likewise for a table cell:

$textRun = $cell->addTextRun();
Html::addHtml($textRun, $textWithTags);
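
Putting both variants together, a complete script might look like the sketch below. It assumes PHPWord is installed through Composer (so vendor/autoload.php exists); the output file name and the cell width of 4000 are arbitrary choices for illustration, and I have not checked the result against every PHPWord version.

```php
<?php
// Assumption: PHPWord was installed with Composer (phpoffice/phpword).
require 'vendor/autoload.php';

use PhpOffice\PhpWord\IOFactory;
use PhpOffice\PhpWord\PhpWord;
use PhpOffice\PhpWord\Shared\Html;

$textWithTags = 'C<sup>8</sup>H<sup>10</sup>N<sub>4</sub>O<sub>2</sub>';

$phpWord = new PhpWord();
$section = $phpWord->addSection();

// Parse the inline HTML into a TextRun created by the section,
// rather than into the section itself.
Html::addHtml($section->addTextRun(), $textWithTags);

// Same idea inside a table cell: the cell creates the TextRun.
$cell = $section->addTable()->addRow()->addCell(4000); // width is arbitrary
Html::addHtml($cell->addTextRun(), $textWithTags);

// Arbitrary output location for this example.
$outFile = sys_get_temp_dir() . '/PhpWordSubSup.docx';
IOFactory::createWriter($phpWord, 'Word2007')->save($outFile);
```

The difference from the original third attempt is that the TextRun is created by its container (addTextRun() with no TextRun argument), so it is already attached to the document when the HTML is parsed into it.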

thomasb88 commented 1 year ago

Perhaps that is because, in OOXML, "A section is a grouping of paragraphs that have a specific set of properties used to define the pages on which the text will appear."

http://officeopenxml.com/WPsection.php