PHPOffice / PHPWord

A pure PHP library for reading and writing word processing documents
https://phpoffice.github.io/PHPWord/

ampersand in content creating corrupt docx file #2088

Open kalibano opened 3 years ago

kalibano commented 3 years ago

The content contains an ampersand, as in 'B&B', which corrupts the generated file. I have tried the solutions suggested here on GitHub, but they don't work in my case. Does anyone have any idea how to fix this? It's really urgent, and I would be grateful for any help.

Steps to Reproduce

Please provide a code sample that reproduces the issue.

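<?php
// Imports assumed from the class names used below; the original post did not show them.
use PhpOffice\PhpWord\PhpWord;
use PhpOffice\PhpWord\Settings as WordSettings;
use PhpOffice\PhpWord\Shared\Html;
use PhpOffice\PhpWord\Shared\XMLWriter;
use PhpOffice\PhpWord\Writer\Word2007\Element\Container;

// $content (HTML string) and $templateProcessor are defined elsewhere in the poster's code.
WordSettings::setOutputEscapingEnabled(false);
$phpWord = new PhpWord('Word2007');
$section = $phpWord->addSection();
Html::addHtml($section, $content);

// Render the section to XML in memory and inject it into the template placeholder.
$xmlWriter = new XMLWriter(XMLWriter::STORAGE_MEMORY, './', WordSettings::hasCompatibility());
$containerWriter = new Container($xmlWriter, $section);
$containerWriter->write();
$htmlAsXml = $xmlWriter->getData();
$templateProcessor->setValue('session_content', $htmlAsXml);

WordSettings::setOutputEscapingEnabled(true);
$templateProcessor->saveAs(storage_path('app\training.docx'));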

Current Behavior

The generated file is corrupt.

marcelkorpel commented 3 years ago

You're adding raw HTML, so you should escape your input: B&B should be B&amp;B. This can be done using htmlspecialchars (but you'll lose the ability to add HTML elements, as those will be escaped, too; however, "B&B" as a word is invalid HTML anyway).
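
To illustrate the trade-off described above: htmlspecialchars escapes every special character, including the angle brackets of tags. A minimal sketch of a narrower approach is to escape only ampersands that do not already start an entity. The helper name escapeBareAmpersands is hypothetical and not part of PHPWord, and this only covers input that contains a literal, unescaped &.

```php
<?php
// Hypothetical helper (not a PHPWord API): escape only "bare" ampersands,
// i.e. those that do not already begin a named or numeric entity.
function escapeBareAmpersands(string $html): string
{
    return preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)/',
        '&amp;',
        $html
    );
}

// Tags and existing entities are left alone; only the bare "&" in "B&B" changes.
$content = '<p>B&B and <b>R&amp;D</b>&nbsp;&#160;</p>';
echo escapeBareAmpersands($content), PHP_EOL;
```

Unlike htmlspecialchars, this leaves markup such as <p> and <b> intact, so the HTML elements survive.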

kalibano commented 3 years ago

It is stored as B&amp;B in the database. And yes, I cannot use htmlspecialchars, as it escapes HTML elements too. So what should I do now?

marcelkorpel commented 3 years ago

I think you're correct that B&amp;B causes malformed output (it is stored as B&B in word/document.xml).
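
For context on why a raw B&B in word/document.xml breaks the file: an unescaped ampersand makes the XML ill-formed, so a parser rejects it. The sketch below checks this with PHP's DOM extension. The <t> fragments are simplified stand-ins, not real document.xml content, and isWellFormedXml is a hypothetical helper name.

```php
<?php
// Returns true when $xml parses as well-formed XML (requires ext-dom / libxml).
function isWellFormedXml(string $xml): bool
{
    // Collect parser errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

// Simplified stand-ins for a text run; real document.xml uses the w: namespace.
var_dump(isWellFormedXml('<t>B&B</t>'));     // raw ampersand
var_dump(isWellFormedXml('<t>B&amp;B</t>')); // escaped ampersand
```

The first call reports the fragment as ill-formed and the second as well-formed.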

A quick and dirty hack would be to escape & another time using

$content = str_replace('&', '&amp;', $content);
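
A small sketch of what this extra escape does to content that already contains entities. It assumes, as the observation about word/document.xml suggests, that the HTML parsing step decodes entities once; html_entity_decode merely stands in for that step here, and the sample string is made up.

```php
<?php
// Sample input as it might be stored: an escaped ampersand plus a named entity.
$stored = 'B&amp;B&nbsp;Hotel';

// The hack: escape every ampersand one more time.
$hacked = str_replace('&', '&amp;', $stored);

// One decoding pass, standing in for what an HTML parser does to text
// (an assumption about the pipeline, not PHPWord's actual code).
$afterParse = html_entity_decode($hacked, ENT_QUOTES | ENT_HTML5, 'UTF-8');

echo $hacked, PHP_EOL;     // every "&" gained an extra "amp;"
echo $afterParse, PHP_EOL; // one level of escaping is left after decoding
```

After one decoding pass the text still contains &amp;, which is valid XML, but it also still contains &nbsp;, which XML does not define. Named HTML entities therefore need separate handling under this hack.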

kalibano commented 3 years ago

Yes, I replaced & with &amp;, but then &npbs; caused the issue, so I replaced &npbs; with &#160;. I also added $html = str_replace('&rsquo;', "'", $html); $html = str_replace('&ldquo;', '"', $html); $html = str_replace('&rdquo;', '"', $html); and it is working fine now. But I am not sure how it will respond to other special characters. I think it would be helpful if the package itself handled all special characters.

marcelkorpel commented 3 years ago

&npbs; should be &nbsp;, an invalid entity will cause an error. And of course this is only a dirty hack, not a solution. And please, next time show your exact content (or a MWE that causes the issue), as now we have to guess about what triggers the issue, apart from 'B&B'.
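
Rather than replacing entities one at a time, the same idea can be generalised. The sketch below is a hypothetical helper, not part of PHPWord, and still a workaround rather than a fix in the library. It keeps the five entities XML defines, turns every other known HTML entity into its UTF-8 character, and escapes the ampersand of unknown names such as &npbs; so they end up as literal text.

```php
<?php
// Hypothetical helper (not a PHPWord API): make named HTML entities safe for XML.
function normalizeEntitiesForXml(string $html): string
{
    // The only named entities that XML itself predefines.
    $xmlEntities = ['amp', 'lt', 'gt', 'quot', 'apos'];

    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlEntities): string {
            if (in_array($m[1], $xmlEntities, true)) {
                return $m[0]; // already valid in XML
            }
            // Known HTML entities become UTF-8 characters; unknown ones come back unchanged.
            $decoded = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
            // Re-escape anything XML-special, including the "&" of an unknown entity.
            return htmlspecialchars($decoded, ENT_QUOTES | ENT_XML1, 'UTF-8');
        },
        $html
    );
}

echo normalizeEntitiesForXml('B&amp;B&nbsp;&rsquo;quoted&rdquo; &npbs;'), PHP_EOL;
```

Numeric references such as &#160; are not matched by the pattern and pass through unchanged, since they are already valid in XML.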

kalibano commented 3 years ago

Sorry, that was a typo; yes, I replaced &nbsp; with &#160;. I gave 'B&B' as an example; it was part of a larger piece of content. I don't have static content: users enter their own content every time. So for new content it creates a corrupted file again :( and once again I need to search for the character that is causing the issue.

kalibano commented 3 years ago

any other solution?

iKlsR commented 2 years ago

Still having this issue when using setValue(), but fortunately using &amp; works.

oleibman commented 7 months ago

You need to set output escaping. See discussion at: https://github.com/PHPOffice/PHPWord/issues/2524#issuecomment-1847981808

tintran-uit commented 1 month ago

> I think you're correct that B&amp;B causes malformed output (it is stored as B&B in word/document.xml).
>
> A quick and dirty hack would be to escape & another time using
>
> $content = str_replace('&', '&amp;', $content);

Thanks.