Open rkorebrits opened 6 years ago
You'll have to write your PhpWord document to a file. If you then want to retrieve the XML, you'll have to unzip it, parse the XML and take the part you need. It might be easier to build your document from scratch instead of trying to put everything in a TemplateProcessor.
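The unzip-and-parse route described above could look roughly like the sketch below. It uses only PHP's ZipArchive and DOM extensions. The function names `readDocumentXml` and `extractBodyXml` are made up for illustration and are not part of PHPWord.

```php
<?php
// Hypothetical helpers, not part of PHPWord.

// Read word/document.xml out of a saved .docx (which is a zip archive).
function readDocumentXml(string $docxPath): string
{
    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException('word/document.xml not found in archive');
    }
    return $xml;
}

// Return the serialized child elements of <w:body>, skipping <w:sectPr>,
// which describes page layout rather than content.
function extractBodyXml(string $documentXml): string
{
    $wNs = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);
    $body = $dom->getElementsByTagNameNS($wNs, 'body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        if ($child instanceof DOMElement && $child->localName !== 'sectPr') {
            $out .= $dom->saveXML($child);
        }
    }
    return $out;
}
```

Usage would be `extractBodyXml(readDocumentXml('generated.docx'))`. The fragment may still reference styles or numbering definitions that exist only in the generated file, so it is not guaranteed to render the same way inside another template.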
Unfortunately I can't get around the TemplateProcessor; the documents I'm working with are very custom and building them from scratch is not an option.
When documents are compiled, the object is written to XML. Can you give me directions on how I could use this method to parse a section and get the XML from that? Would that be possible? I think having a method that parses HTML to OOXML and returns it would be very handy in general. I'm currently using https://github.com/rkorebrits/HTMLtoOpenXML, which works to some extent, but it's not great: I still need to pre-process the HTML to remove things like attributes, as they break the output. The HTML option in your lib is way better and I would much prefer utilising that.
@rkorebrits I was able to come up with a subclass of the template processor (gist here) that can replace a placeholder in your template with an OOXML AltChunk (see here) with provided markdown. AltChunks initially source their content from a separate file in the archive, but implementing consumers (e.g. MS Word) will pull the content in, convert it to OOXML and replace the content after the document is first opened. Also see this YouTube video by Eric White. HTH.
Thanks @jeffsrepoaccount. However, I'm not sure I'm able to use that. We have a bunch of templates, each with multiple repeating blocks containing HTML from TinyMCE (only bold, italic and lists). This is put in by users and is not markdown, so I really need an HTML to OOXML processor. Not sure if I'm missing something, but I don't think your gist provides a solution for this? Thanks anyway!
@rkorebrits In the gist the markdown gets converted to HTML (AltChunks support text/html content types, but not text/markdown) and written to a file stored inside of the zip archive. I think all you would need to do is remove the markdown conversion and just inject your HTML markup.
You should be aware of my comment underneath the gist. Simply typing the placeholder search value (like ${replaceMe}) in your template and saving it through Word probably won't be sufficient, since it will in all likelihood wind up inside of a text run element, and replacing it with an alt chunk there violates the OOXML schema. In my templates I had to manually edit the document.xml inside the archive to ensure the alt chunks would be placed where they would be valid (which is tedious and far from ideal, but works).
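As an alternative to hand-editing document.xml, the placement rule could be applied in code. The sketch below swaps a whole paragraph for a block-level `w:altChunk` when the placeholder is that paragraph's only text. `replaceParagraphWithAltChunk` is a hypothetical helper, and `$relId` is assumed to match a relationship that has been added to the archive separately, together with the HTML part itself. It only covers the document.xml side.

```php
<?php
// Hypothetical helper: swap every paragraph whose only text is the placeholder
// for a block-level <w:altChunk>, so the chunk never ends up inside a text run.
function replaceParagraphWithAltChunk(string $documentXml, string $placeholder, string $relId): string
{
    $wNs = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
    $rNs = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships';

    $dom = new DOMDocument();
    $dom->loadXML($documentXml);

    // Copy the live node list first, because the tree is modified while looping.
    $paragraphs = iterator_to_array($dom->getElementsByTagNameNS($wNs, 'p'));
    foreach ($paragraphs as $p) {
        // textContent joins all runs, so a placeholder split across runs still matches.
        if (trim($p->textContent) !== $placeholder) {
            continue;
        }
        $chunk = $dom->createElementNS($wNs, 'w:altChunk');
        $chunk->setAttributeNS($rNs, 'r:id', $relId);
        $p->parentNode->replaceChild($chunk, $p);
    }

    return $dom->saveXML();
}
```

Paragraphs that contain the placeholder next to other text are deliberately left alone, since an alt chunk cannot sit inside a run.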
Hi @rkorebrits,
I have been trying to use your script https://github.com/rkorebrits/HTMLtoOpenXML with the TemplateProcessor, but when I use the fromHTML() method with HTML content and then send the content to the template using setValue(), I don't get the formatted text - just the OpenXML text, like:
<w:p><w:r><w:t xml:space='preserve'>Bernd </w:t></w:r><w:r><w:rPr><w:i/></w:rPr><w:t xml:space='preserve'>and</w:t></w:r><w:r><w:rPr></w:rPr><w:t xml:space='preserve'> Hilla </w:t></w:r><w:r><w:rPr><w:b/></w:rPr><w:t xml:space='preserve'>Becher and more</w:t></w:r><w:r><w:rPr></w:rPr><w:t xml:space='preserve'></w:t></w:r></w:p>
This happens for both table cells as well as regular fields/variables.
Am I missing something?
Thanks Cristiano
Hi @keepthinking
You will need to do something along the lines of:
\PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(false);
$this->_template->setValue($field, $html, $limit);
\PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(true);
That library just converts HTML to OOXML. It isn't built specifically for PhpWord integration, so you will need to disable escaping before inserting its output.
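To see why escaping has to be off, here is a small illustration in plain PHP. It assumes PHPWord's output escaping behaves roughly like `htmlspecialchars()`, which is an approximation for illustration and not a statement about PHPWord's exact implementation.

```php
<?php
// A fragment of the kind the HTML-to-OOXML parser produces.
$ooXml = '<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Test</w:t></w:r></w:p>';

// With escaping on, the angle brackets become entities, so Word displays the
// tags as literal text instead of interpreting them as formatting.
$escaped = htmlspecialchars($ooXml, ENT_QUOTES, 'UTF-8');

// With escaping off, the string reaches document.xml unchanged and is read as markup.
$unescaped = $ooXml;

echo $escaped, PHP_EOL;
echo $unescaped, PHP_EOL;
```

This matches the symptom reported earlier in the thread, where the raw `<w:p>...` string showed up in the document instead of formatted text.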
Hi @rkorebrits thank you for your input. I tried doing what you suggested:
\PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(false);
$phpWord->setValue($variable, "<p><strong>Test</strong></p>");
\PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(true);
No errors, but when I open the resulting Word Document in Office, Word cannot open it because of 'invalid characters'.
Did you use the library successfully with PHPWord?
Thanks again, Cristiano
@keepthinking
Yeah you need to combine it with that library of mine, they are both separate tools.
$parser = new \HTMLtoOpenXML\Parser();
\PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(false);
$ooXml = $parser->fromHTML('<p><strong>Test</strong></p>');
$phpWord->setValue($variable, $ooXml);
\PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(true);
> Did you use the library successfully with PHPWord?

Very. Loads of files were created with this library and it can do quite a bit of basic stuff. Nested lists in particular were a lot of work, but they work well now.
Thanks. Sorry if I didn't make it clear, but I am trying to use your library with PHPWord, specifically with the template processor.
The code I pasted is from a more complex script that generates a document based on a template. But when I use your parser, switch escaping off, send the HTML through your tool and then immediately switch escaping on again, the generated document gets corrupted, as per the screenshot.
I am wondering if there’s something obvious I’m missing?
Best. Cristiano
Can you send some of your code and the HTML you are trying to send in? You must be missing something obvious. Are you sure it doesn't break without adding in the HTML?
Thanks Richard. I really appreciate your help. To answer your question, the templates are perfectly filled if I do not try to insert HTML using your library (i.e. using strip_tags).
My code is quite complex and it queries a database to get data. I have greatly simplified it here (removed 95% of it) and just included what is needed to test the behaviour.
The file is in the context of the Samples directory of PHPWord, so it uses its header and footer. If I send just two variables ($title in HTML and $exhibition in plain text), the resulting document is broken.
Any help as to what I am doing wrong would be greatly welcome.
Best Cristiano Archive.zip
Okay, so I downloaded your script. It seems like you can't combine injected HTML with plain text on the same line in Word. Try it with this file:
Sample_00_3_html-template.docx
I know from experience that you can put HTML in a table just fine, and on a single row, but it seems you can't combine HTML with plain text in the Word document.
It makes sense actually, especially when you are inserting a paragraph but expect more copy on the same line. When my library generates the OOXML output from HTML, it creates a new Word paragraph.
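Based on the observation above, a template could be checked before any HTML is injected. The sketch below lists paragraphs in which a `${...}` placeholder shares the line with other text. `findSharedPlaceholders` is a hypothetical helper working on the raw document.xml string, not a PHPWord function.

```php
<?php
// Hypothetical pre-flight check: return the text of every paragraph in which
// a ${...} placeholder is not the only content.
function findSharedPlaceholders(string $documentXml): array
{
    $wNs = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);

    $shared = [];
    foreach ($dom->getElementsByTagNameNS($wNs, 'p') as $p) {
        $text = trim($p->textContent);
        if (!preg_match_all('/\$\{[^}]+\}/', $text, $m)) {
            continue; // no placeholder in this paragraph
        }
        // Flag the paragraph unless a single placeholder is its entire text.
        if (count($m[0]) > 1 || $m[0][0] !== $text) {
            $shared[] = $text;
        }
    }
    return $shared;
}
```

Any paragraph it reports would need to be split in the template so that the HTML placeholder sits on a line of its own.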
Did that work @keepthinking ?
@rkorebrits
That works for me! Brilliant.
Now I just need to start looking through the HTMLtoOpenXML source to see how I can support more HTML (such as unordered lists, colours, font sizes etc.).
Dear @rkorebrits ,
apologies for the delay, my attention was diverted elsewhere. Thank you, and yes, your example makes sense: on one line it's either ALL plain text or ALL HTML, and the same applies to table cells (tested).
It would be great to have a way to combine the two, for flexibility, but for now we can work around it.
Did you notice that <ul> elements are not supported and get converted to numbered lists, and with continuous numbering? Not sure if that's intentional.
Best Cristiano
@beard7 @keepthinking
Numbering is a whole new story. The styling for the numbering is set in numbering.xml. It is not possible to set the numbering anywhere else. What I used to do is first create a document with one list style, unzip the document and make a copy of numbering.xml, then duplicate the style block that you want in the file and copy the XML file back into your template later. A lot of work :-)
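When editing numbering.xml by hand, it can help to see what each list id currently resolves to. The sketch below maps every `w:numId` to the level-0 number format of the abstract definition it points to. `listFormatsByNumId` is a hypothetical helper that takes the contents of numbering.xml as a string.

```php
<?php
// Hypothetical helper: map each numId in numbering.xml to the level-0 number
// format ("bullet", "decimal", ...) of the abstract definition it references.
function listFormatsByNumId(string $numberingXml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($numberingXml);
    $xp = new DOMXPath($dom);
    $xp->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $formats = [];
    foreach ($xp->query('//w:num') as $num) {
        $numId = $xp->evaluate('string(@w:numId)', $num);
        $abstractId = $xp->evaluate('string(w:abstractNumId/@w:val)', $num);
        $query = sprintf(
            'string(//w:abstractNum[@w:abstractNumId="%s"]/w:lvl[@w:ilvl="0"]/w:numFmt/@w:val)',
            $abstractId
        );
        $formats[$numId] = $xp->evaluate($query);
    }
    return $formats;
}
```

If the OOXML being injected refers to a `w:numId` that is missing from this map, or that maps to `decimal` where bullets were expected, that is one possible place to look when lists come out numbered or do not appear.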
@rkorebrits I've been experimenting with inserting HTML into template using your OpenXML parser and it's generally working really well.
However, I've now hit a bit of a snag. The documents containing the HTML -> OpenXML content open perfectly well in Word, but the parsed content is missing when the same document is opened in LibreOffice (and OpenOffice).
This wouldn't normally be an issue, but I'm trying to develop a system to convert the documents to PDF on-the-fly using a headless LibreOffice. This mostly works really well, but the resulting PDFs are missing the same content.
I've noticed that if I re-save the documents using Word in Strict Open XML format, they are then perfectly formed in LibreOffice. So I tried saving the template in Strict Open XML format, but that doesn't help.
I guess this is somewhat beyond the scope of this issue, but I'm just looking for pointers.
Thanks
@beard7 yeah, printing to PDF doesn't work well, but that's just because LibreOffice and OpenOffice don't support a lot of stuff. We dropped the print-to-PDF support quite quickly, as our users were all on Windows, so they had to print to PDF from MS Word.
@rkorebrits thank you very much, your tool worked perfectly and is a lot of help. I send a giant greeting from Colombia.
@sebgam I'm glad it helped!
I was able to convert the HTML to OOXML using this function:
```php
use PhpOffice\PhpWord\Settings as WordSettings;
use PhpOffice\Common\XMLWriter;
use PhpOffice\PhpWord\Writer\Word2007\Element\Container;

function getSectionContent($section)
{
    // Write the section's elements into an in-memory XML writer.
    $xmlWriter = new XMLWriter(XMLWriter::STORAGE_MEMORY, './', WordSettings::hasCompatibility());
    $containerWriter = new Container($xmlWriter, $section);
    $containerWriter->write();

    // Return the generated OOXML as a string.
    return $xmlWriter->getData();
}
```
But I have the same issue as @beard7: the document doesn't work in LibreOffice.
I imported it into office.live.com and it was weird. I could see my content in the preview but not when I opened the file. I could also share the document (read-only share link) and it worked great (I had the headings and all the elements supported by PHPWord)... Crazy.
I don't have MS Word so I couldn't test it.
Hello @rkorebrits,
I think this one can solve your problem.
https://blog.mayflower.de/6699-phpword-create-documents.html
> @keepthinking
> Yeah you need to combine it with that library of mine, they are both separate tools.
> $parser = new \HTMLtoOpenXML\Parser();
> \PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(false);
> $ooXml = $parser->fromHTML('<p><strong>Test</strong></p>');
> $phpWord->setValue($variable, $ooXml);
> \PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(true);
> Did you use the library successfully with PHPWord?
> Very, loads of files were created with this library and it can do quite a bit of basic stuff. Especially nested lists was a lot of work, but works good now.
I tried to solve a similar task with your solution and at first it didn't work. I spent two hours debugging what happened. I was sending the generated files directly to the browser, and the official recipe https://phpword.readthedocs.io/en/latest/recipes.html#download-the-produced-file-automatically cuts out my OOXML!
$templateProcessor->save(); // the file is good
// Later:
$xmlWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'Word2007');
$xmlWriter->save("php://output"); // fails
I solved it simply: echo file_get_contents() works fine.
P.S. I'm a beginner in English.
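One possible reading of the failure above is that the recipe writes a `PhpWord` object through `IOFactory::createWriter()`, while the HTML was injected through the template processor, whose result only exists in the file it saved. That is an interpretation and is not confirmed in this thread. Under that assumption, streaming the saved file itself could be sketched as follows, with `docxDownloadHeaders` and `sendSavedDocx` as hypothetical helper names.

```php
<?php
// Hypothetical helper: the HTTP headers for a .docx download.
function docxDownloadHeaders(string $downloadName, int $size): array
{
    return [
        'Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'Content-Disposition: attachment; filename="' . basename($downloadName) . '"',
        'Content-Length: ' . $size,
    ];
}

// Stream a file that the template processor has already saved to disk.
function sendSavedDocx(string $path, string $downloadName): void
{
    foreach (docxDownloadHeaders($downloadName, filesize($path)) as $header) {
        header($header);
    }
    // readfile() writes the file straight to the output buffer,
    // like echo file_get_contents($path) but without holding it all in a string.
    readfile($path);
}
```

A call such as `sendSavedDocx($savedPath, 'report.docx')` would go after the template processor has saved the file, with `$savedPath` being wherever it was written.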
Hi everybody,
I've tried to insert a simple, stupid HTML list (<ul><li>) with this code:
$parser = new \HTMLtoOpenXML\Parser();
\PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(false);
$ooXml = $parser->fromHTML($value);
$this->t->setValue($key, $ooXml);
\PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(true);
the $ooXml output is this:
<w:p><w:r><w:t xml:space='preserve'></w:t></w:r></w:p><w:p><w:pPr><w:pStyle w:val='ListParagraph'/><w:numPr><w:ilvl w:val='0'/><w:numId w:val='1'/></w:numPr><w:rPr></w:rPr></w:pPr><w:r><w:rPr></w:rPr><w:t xml:space='preserve'>first text</w:t></w:r></w:p><w:p><w:pPr><w:pStyle w:val='ListParagraph'/><w:numPr><w:ilvl w:val='0'/><w:numId w:val='1'/></w:numPr><w:rPr></w:rPr></w:pPr><w:r><w:rPr></w:rPr><w:t xml:space='preserve'>secone text</w:t></w:r></w:p><w:p><w:r><w:t xml:space='preserve'></w:t></w:r></w:p>
The text is not shown in Word or in LibreOffice.
I've tried for hours - has anybody an idea what's the problem??? :-(((
Thanks, Toby
I have partially solved this in PHPWord 1.0.0 in this way:
// ------------------------------HTML "expose" -------------------------------------
$phpWord = new \PhpOffice\PhpWord\PhpWord();
$section = $phpWord->addSection();
\PhpOffice\PhpWord\Shared\Html::addHtml($section, $data['expose'], false, false);
$elements_ar = $section->getElements();
$count = count($elements_ar); // Number of elements generated by the HTML
$templateProcessor->cloneBlock('BEXPOSE',$count, true, true);
for ($i = 1; $i <= $count; $i++) {
$tag = 'expose#'.$i;
$templateProcessor->setComplexBlock($tag , $elements_ar[$i-1]);
}
An "element" object is created for each paragraph of the HTML, so you have to clone the block in the template that receives the HTML content once per element.
Template:
@fhumanes Thanks for the example. It's a bit cumbersome, but it does the trick!
(I've changed the implementation to setComplexValue on my end for better results.)
It would be nice though, if it could somehow use a template value directly instead of wrapping a template block.
Maybe with a nice little shortcut function in the processor - like $templateProcessor->setHtmlValue($search, $html);
Do you think something like this could be possible?
> $parser = new \HTMLtoOpenXML\Parser();
> \PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(false);
> $ooXml = $parser->fromHTML('<p><strong>Test</strong></p>');
> $phpWord->setValue($variable, $ooXml);
> \PhpOffice\PhpWord\Settings::setOutputEscapingEnabled(true);

Without a doubt, this is the best alternative without having to use those tables and sessions! You are great.
I've been trying to figure out how I can get OOXML from HTML input, to paste this into the TemplateProcessor. So far I haven't found a "direct" method (e.g. htmlToOOXML), but I have been trying to parse the HTML into a section first and then get the OOXML from that section.
With print_r($section->getPhpWord()); I do seem to be getting my HTML in some kind of PHPWord object, but is there a way to get just the XML for this part?