Closed: tmpmachine closed this 5 months ago
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using OfficeIMO.Word;

namespace OfficeIMO.Examples.Word {
    internal static partial class Embed {
        public static void Example_EmbedFileHTML(string folderPath, string templateFolder, bool openWord) {
            Console.WriteLine("[*] Creating standard document with embedded HTML file");
            string filePath = System.IO.Path.Combine(folderPath, "EmbeddedFileHTML.docx");
            string htmlFilePath = System.IO.Path.Combine(templateFolder, "SampleFileHTML.html");
            using (WordDocument document = WordDocument.Create(filePath)) {
                Console.WriteLine("Embedded documents in word: " + document.EmbeddedDocuments.Count);
                Console.WriteLine("Embedded documents in Section 0: " + document.Sections[0].EmbeddedDocuments.Count);
                document.AddParagraph("Add HTML document in DOCX");
                document.AddSection();
                Console.WriteLine("Embedded documents in Section 1: " + document.Sections[1].EmbeddedDocuments.Count);
                document.AddEmbeddedDocument(htmlFilePath);
                document.EmbeddedDocuments[0].Save("C:\\TEMP\\EmbeddedFileHTML.html");
                Console.WriteLine("Embedded documents in word: " + document.EmbeddedDocuments.Count);
                Console.WriteLine("Embedded documents in Section 0: " + document.Sections[0].EmbeddedDocuments.Count);
                Console.WriteLine("Embedded documents in Section 1: " + document.Sections[1].EmbeddedDocuments.Count);
                Console.WriteLine("Content type: " + document.EmbeddedDocuments[0].ContentType);
                document.Save(openWord);
            }
        }
    }
}
This worked for me when I tried it.
Well, yeah, but I was expecting the content to be rendered like the second line here:
Or is it just me? I'm using WPS Office; I don't have MS Word.
I tested it on Word and had this working.
When I run:
This is the HTML it's embedding:
And this is Word:
So it clearly works in Word. Keep in mind that embedding just places the file in a special structure in the XML, and the whole "hard work" is then done by Word when displaying it. Maybe the problem is that WPS Office requires some changes to "trigger" that embedding.
For example, notice that some things require special fixes to open properly.
Maybe you could create a Word document in WPS Office (whatever that is) and compare the differences.
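To make the "special structure in XML" concrete, here is a minimal sketch using the Open XML SDK directly. It assumes the embedding is an altChunk (an `AlternativeFormatImportPart` referenced from the body); the thread does not state how OfficeIMO implements `AddEmbeddedDocument`, so treat that as an assumption. The class name, method name, and relationship id are hypothetical.

```csharp
using System.IO;
using System.Text;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class AltChunkSketch {
    // Writes a .docx whose body holds one altChunk that points at an HTML part.
    // The HTML is stored as-is; the consuming application has to convert it on open.
    public static void EmbedHtml(string docxPath, string html) {
        using (var package = WordprocessingDocument.Create(docxPath, WordprocessingDocumentType.Document)) {
            MainDocumentPart mainPart = package.AddMainDocumentPart();
            mainPart.Document = new Document(new Body());

            const string chunkId = "htmlChunk1"; // hypothetical relationship id
            AlternativeFormatImportPart chunkPart =
                mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, chunkId);

            // Copy the raw HTML bytes into the package part.
            using (var stream = new MemoryStream(Encoding.UTF8.GetBytes(html))) {
                chunkPart.FeedData(stream);
            }

            // The body only carries a reference to the part, not converted paragraphs.
            mainPart.Document.Body.Append(new AltChunk { Id = chunkId });
        }
    }
}
```

Because the body contains only a reference, an application that does not process such a chunk has nothing to render, which would be consistent with the different results reported for Word and WPS Office.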
That must be it. I asked a friend to open the file and it required installing some plugins or something, so I decided not to go with embedding.
There's this library that can convert HTML to OpenXML: html2openxml.
The collection of elements parsed by html2openxml is somewhat connected to DocumentFormat.OpenXml.Wordprocessing.Paragraph.
string filepen = $"C:\\Users\\tmp7\\Desktop\\penman-{Guid.NewGuid().ToString()}.docx";
using (var package = WordprocessingDocument.Create(filepen, WordprocessingDocumentType.Document))
{
    package.AddMainDocumentPart();
    var mainPart = package.MainDocumentPart;
    mainPart.Document = new Document();
    var body = new Body();
    var sectionProp = new SectionProperties();
    var pageSetup = new PageMargin() { Top = 1701, Left = 1134, Right = 1134, Bottom = 850 };
    sectionProp.Append(pageSetup);
    var converter = new HtmlConverter(mainPart);
    var para = converter.Parse(htmlText);
    // Note: runProp and paragProp are built here but never attached to any element.
    var runProp = new RunProperties();
    runProp.Append(new Bold(), new FontSize() { Val = "32" });
    var paragProp = new ParagraphProperties();
    var justif = new Justification() { Val = JustificationValues.Center };
    paragProp.Append(justif);
    foreach (var item in para)
    {
        // <-- item is somewhat connected to DocumentFormat.OpenXml.Wordprocessing.Paragraph
        body.Append(item);
    }
    // Section properties must be the last child of the body, so append them after the content.
    body.Append(sectionProp);
    mainPart.Document.Append(body);
}
Then I found that in WordParagraph.cs, Paragraph is DocumentFormat.OpenXml.Wordprocessing.Paragraph.
...
using OfficeMath = DocumentFormat.OpenXml.Math.OfficeMath;
using Paragraph = DocumentFormat.OpenXml.Wordprocessing.Paragraph;
using ParagraphProperties = DocumentFormat.OpenXml.Wordprocessing.ParagraphProperties;
...
I still can't quite figure out how to get these converted elements into WordParagraph. Is there a way to take the XML and append it directly to the .docx?
I know this library and use it in PSWriteOffice in PowerShell.
We do expose WordprocessingDocument in the document, so you can append things directly to the body if you wish.
In this case Copilot shows a way, but you can just append whatever you create to it.
Awesome! Got it working now, thanks!
I'm still using html2openxml, though link elements are not working in my case. That could be a WPS Office issue only; I will check later.
using (WordDocument doc = WordDocument.Create(outputPath))
{
    ...
    foreach (var item in para)
    {
        doc._document.MainDocumentPart.Document.Body.Append(item);
    }
    ...
}
For future reference, if anyone is looking for a way to append HTML under a list, you can try creating a table, setting the table indent, and putting the parsed result into the table.
using HtmlToOpenXml;
// ....
// # create a single cell table
Table table = new Table();
var tableProperties = new TableProperties(new TableBorders(new TopBorder(), new BottomBorder(), new LeftBorder(), new RightBorder(), new InsideHorizontalBorder(), new InsideVerticalBorder())) {
    TableIndentation = new TableIndentation() {
        // CentimetersToTwips is a helper that is not shown in this snippet
        Width = (int)CentimetersToTwips(1.24), // adjust to the list item indent
    }
};
table.AppendChild(tableProperties);
var row = new TableRow();
var cell = new TableCell();
row.Append(cell);
table.Append(row);
// # append table to document body
doc._document.MainDocumentPart.Document.Body.Append(table);
// # parse html
var converter = new HtmlConverter(doc._document.MainDocumentPart);
foreach (OpenXmlCompositeElement item in converter.Parse(htmlText))
{
    // # append to the table's only cell (the `cell` created above)
    cell.Append(item);
}
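The table snippet calls `CentimetersToTwips`, which is not defined anywhere in the thread. A minimal sketch of such a helper, assuming it is a plain unit conversion (the class name `UnitConversion` is hypothetical), could look like this:

```csharp
using System;

static class UnitConversion {
    // A twip is 1/20 of a point and there are 72 points per inch,
    // so 1 inch = 1440 twips, and 1 inch = 2.54 cm.
    public static double CentimetersToTwips(double centimeters) {
        return centimeters / 2.54 * 1440.0;
    }

    // Rounds to the nearest whole twip instead of truncating.
    public static int CentimetersToWholeTwips(double centimeters) {
        return (int)Math.Round(CentimetersToTwips(centimeters));
    }
}
```

With this helper, 1.24 cm is roughly 702.99 twips, so a plain `(int)` cast yields 702 while rounding yields 703. The difference is one twentieth of a point, so either works for an indent.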
I have a requirement to insert rich text content, and I'm currently trying Html Agility Pack, parsing and traversing the DOM.
Just to confirm, OfficeIMO still doesn't have a feature to insert HTML as Word document elements, right? I've checked embedding, but it seems to insert only the plain text of the .html file.
If so, am I on the right path? Is there any way other than traversing manually? Maybe any known work or solution that's compatible with OfficeIMO?