antonmihaylov / OpenXmlTemplates

Word .docx templating system that is designer-friendly (no scripting tags) and server-friendly (no Word installation required)
GNU Lesser General Public License v3.0

Support text formatting? #22

Open rklec opened 2 years ago

rklec commented 2 years ago

Support for text formatting of the input data, e.g. newlines, bold or italic/underline would be very useful.

The syntax for the input data could be similar to Markdown.

dgwaldo commented 2 years ago

That's interesting, how would this be different say than just applying formatting to the text in the templated control?

rklec commented 2 years ago

> how would this be different say than just applying formatting to the text in the templated control?

What do you mean by that? AFAIK the only way to get text formatting with your project is to run it first, then use the OpenXML SDK to find the replaced text again and format it.

Ahhh wait... I guess I know what you think: You think I want this:

The {{BoldCustomer}} wants to be {{ItalicAttribute}}.

This can of course be done in the e.g. Word file.

However, I actually want the placeholders to support arbitrary (or at least some basic) formatting like:

The {{Customer}} wants to be {{Attribute}}, because: {{Reasons}}

where {{Customer}} should be replaced by a string like **Bold** *Italic* Company (rendered as Bold Italic Company in the final document) and so on. So {{Reasons}} should be converted to this:

  • ...,we like them.
  • ...,we want them to like us.
  • ...,etc.

So it is the replacement value itself that should support text formatting. The use case is that your input data comes from, e.g., a user, and you want to give them a simple way to style the output in arbitrary ways: you don't know beforehand what will be styled, or how.

antonmihaylov commented 2 years ago

I see, so in essence you want the template engine to recognize Markdown-like syntax and apply the corresponding OpenXML formatting. For example, bold markup -> a new Bold() property.

That should be possible if it splits the value and creates a new Text for each separately formatted segment, similar to how it already handles newlines by inserting a Break tag between them.

Bold and italic markers seem simple enough that a regex would catch them. Do you have any more complex formats in mind?
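
The splitting idea above could be sketched roughly as follows. This is only an illustration, assuming `**bold**` and `*italic*` as the sole markers, with no nesting or escaping. `MarkdownRuns`, `ToRuns` and `MakeRun` are hypothetical names, not part of OpenXmlTemplates. The sketch emits one `Run` per formatted segment rather than bare `Text` elements, because OpenXML attaches bold/italic to the run via `RunProperties`.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of OpenXmlTemplates.
public static class MarkdownRuns
{
    // Matches **bold** first, then *italic*. Nesting and escapes are not handled.
    private static readonly Regex Token = new Regex(
        @"\*\*(?<bold>.+?)\*\*|\*(?<italic>.+?)\*",
        RegexOptions.Singleline);

    // Splits the input into runs: plain text between matches, formatted text for matches.
    public static IEnumerable<Run> ToRuns(string input)
    {
        int pos = 0;
        foreach (Match m in Token.Matches(input))
        {
            if (m.Index > pos)
                yield return MakeRun(input.Substring(pos, m.Index - pos), false, false);

            if (m.Groups["bold"].Success)
                yield return MakeRun(m.Groups["bold"].Value, true, false);
            else
                yield return MakeRun(m.Groups["italic"].Value, false, true);

            pos = m.Index + m.Length;
        }

        if (pos < input.Length)
            yield return MakeRun(input.Substring(pos), false, false);
    }

    // Builds a single run; newlines inside the segment become Break elements.
    private static Run MakeRun(string text, bool bold, bool italic)
    {
        var run = new Run();
        if (bold || italic)
        {
            var props = new RunProperties();
            if (bold) props.AppendChild(new Bold());
            if (italic) props.AppendChild(new Italic());
            run.AppendChild(props);
        }

        var lines = text.Split('\n');
        for (int i = 0; i < lines.Length; i++)
        {
            if (i > 0) run.AppendChild(new Break());
            // Preserve leading/trailing spaces so adjacent runs do not collapse together.
            run.AppendChild(new Text(lines[i]) { Space = SpaceProcessingModeValues.Preserve });
        }
        return run;
    }
}
```

The resulting runs would then be appended to the paragraph in place of the single replaced text. Lists and enumerations would need paragraph-level handling (numbering definitions), which this sketch does not attempt.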

rklec commented 2 years ago

Yeah, that's essentially what I want. I guess simple newlines, bold, and italic/underline would be a good start. More difficult, but also very useful, would be lists and enumerations, e.g. * for bullet lists and 1. for numbered ones.

Note that I also found this library, although it is likely not very up to date: https://github.com/danbroooks/MarkdownToOpenXML

tlyau62 commented 1 year ago

Hi, I faced a similar situation, but my input data is HTML. To solve it, I created an HtmlControlReplacer that converts the HTML content into actual OpenXML. I hope the following code or this example repo helps. Thanks for the great library.

using DocumentFormat.OpenXml;
using OpenXMLTemplates;
using OpenXMLTemplates.ControlReplacers;
using OpenXMLTemplates.Documents;
using OpenXMLTemplates.Variables;
using System;
using System.Collections.Generic;
using System.Text;
using System.Linq;
using System.Xml.Linq;
using System.IO;
using DocumentFormat.OpenXml.Wordprocessing;

namespace WordDocVar
{
    public class HtmlControlReplacer : ControlReplacer
    {
        public override string TagName => "html";

        protected override OpenXmlExtensions.ContentControlType ContentControlTypeRestriction => OpenXmlExtensions.ContentControlType.RichText;

        protected override string ProcessControl(string variableIdentifier, IVariableSource variableSource, ContentControl contentControl, List<string> otherParameters)
        {
            return variableSource.GetVariable<string>(variableIdentifier);
        }

        protected override void OnReplaced(ContentControl e)
        {
            // The control's text is read back as the HTML string returned by ProcessControl.
            var html = e.SdtElement.InnerText;
            var oml = ConvertHtmlToOml(html) as Document;

            // Clone the converted body elements, skipping the last one
            // (presumably the trailing section properties).
            var nodes = oml.Body.Elements()
                .SkipLast(1)
                .Select(n => n.CloneNode(true));

            e.SdtElement.RemoveAllChildren();

            foreach (var node in nodes)
            {
                e.SdtElement.AppendChild(node);
            }

            base.OnReplaced(e);
        }

        private OpenXmlElement ToOpenXmlElement(XElement element)
        {
            // Write XElement to MemoryStream.
            using var stream = new MemoryStream();
            element.Save(stream);
            stream.Seek(0, SeekOrigin.Begin);

            // Read OpenXmlElement from MemoryStream.
            using OpenXmlReader reader = OpenXmlReader.Create(stream);
            reader.Read();
            return reader.LoadCurrentElement();
        }

        private OpenXmlElement ConvertHtmlToOml(string html)
        {
            // XElement.Parse requires well-formed XML, so the input must be valid XHTML.
            var xe = XElement.Parse($"<html><body>{html}</body></html>");
            var wml = OpenXmlPowerTools.HtmlToWmlConverter.ConvertHtmlToWml("", "", "", xe, OpenXmlPowerTools.HtmlToWmlConverter.GetDefaultSettings());

            return ToOpenXmlElement(wml.MainDocumentPart);
        }
    }
}

Then register it in the DefaultOpenXmlTemplateEngine.