egonl / SharpDocx

Lightweight template engine for creating Word documents
MIT License

Font changes on a single line #70

Open naasking opened 4 months ago

naasking commented 4 months ago

Very simple test case: a single line with two substitutions separated by a comma and a space. The styling is the same for both. The first substitution comes out correct, but the second ends up in a different font. I've seen it change to Times New Roman in my real project, possibly because that font is used later in the template, and in this test case on my machine it changes to Aptos:

image

See project for reproduction:

WrongFont.zip

Great library aside from this minor issue!

egonl commented 4 months ago

Thanks for the compliments and the ready-to-run reproduction!

The paragraph containing the two code blocks has the Normal style, which uses Aptos. However, you overrode that font with Helvetica. In this case the result is that the first code block (Model.StreetAddress) is rendered using Helvetica, and the second (Model.City) in Aptos. Reason: SharpDocx removes any formatting in code blocks, and then inserts text using the default paragraph style.

Here's an extreme example:

image

Renders as:

image

Hope this helps.

naasking commented 4 months ago

If SharpDocx strips any formatting, then why would the first code block insert with Helvetica at all? Also, is the default paragraph style the only option? What if I want to do substitutions in a footnote or a heading that uses a different style but with a dynamic value, so that a text block as shown in the tutorial just wouldn't cut it?

egonl commented 4 months ago

The first code block shows in Helvetica because that particular run in the paragraph already defined Helvetica. SharpDocx doesn't delete anything before the start of the code block, and inserts text in that run. There are more runs in that paragraph, and some of them get deleted.

If you want formatting, you should use the DocumentFormat.OpenXML library. The Inheritance sample shows how you can do that. SharpDocx is not a formatting library. If you want formatting, have a look at, for example, DocXPlus.

naasking commented 4 months ago

I just need a clear mental model of what's going on so I can predict what the output will look like. Your own samples show different styles for headings, and substitutions work as expected there, so I don't think I need anything fancier than that. I would just assign a style for the headings and footnotes, as long as those are preserved. It's just this specific case that doesn't make sense to me, so I'm missing something.

When you say that the first code block's run already defined Helvetica, I'm just not clear on what that means because nothing is preceding the first code block. And the second code block is all in the same paragraph from what I can see, but you say it's in a different run, an OpenXML run I assume, even though it has some plain text preceding it. Maybe you're looking at the data model of what Word saves, and so you're at the whims of however Word defines runs in this case, is that it? Is there no way to predict how Word translates what I see in the template into runs?

egonl commented 4 months ago

You can't see the runs in Word, but you can with e.g. the Open XML SDK Productivity Tool. Your paragraph looks like this:

  <w:p w:rsidR="000F1065" w:rsidRDefault="004D10B8" w14:paraId="416A7E9F" w14:textId="66DEB7E8" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml">
    <w:r w:rsidRPr="00652312">
      <w:rPr>
        <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica" />
        <w:iCs />
      </w:rPr>
      <w:t>&lt;%=</w:t>
    </w:r>
    <w:proofErr w:type="spellStart" />
    <w:r w:rsidRPr="00652312">
      <w:rPr>
        <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica" />
        <w:iCs />
      </w:rPr>
      <w:t>Model.StreetAddress</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd" />
    <w:r>
      <w:rPr>
        <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica" />
        <w:iCs />
      </w:rPr>
      <w:t xml:space="preserve"> %&gt;, &lt;%= </w:t>
    </w:r>
    <w:proofErr w:type="spellStart" />
    <w:r w:rsidRPr="00652312">
      <w:rPr>
        <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica" />
        <w:iCs />
      </w:rPr>
      <w:t>Model.City</w:t>
    </w:r>
    <w:proofErr w:type="spellEnd" />
    <w:r w:rsidRPr="00652312">
      <w:rPr>
        <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica" />
        <w:iCs />
      </w:rPr>
      <w:t>%&gt;</w:t>
    </w:r>
  </w:p>

And this is what SharpDocx outputs:

<w:p w:rsidRPr="00795CBA" w:rsidR="000F1065" w:rsidP="00795CBA" w:rsidRDefault="00795CBA" w14:paraId="416A7E9F" w14:textId="1ABD1E39" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:pPr>
    <w:rPr>
      <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica" />
    </w:rPr>
  </w:pPr>
  <w:r w:rsidRPr="00795CBA">
    <w:rPr>
      <w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica" />
    </w:rPr>
    <w:t xml:space="preserve" />
    <w:t xml:space="preserve">123 King st</w:t>
  </w:r>
  <w:r w:rsidRPr="00795CBA">
    <w:t xml:space="preserve">, </w:t>
    <w:t xml:space="preserve">Toronto</w:t>
  </w:r>
  <w:r w:rsidRPr="00795CBA">
    <w:t xml:space="preserve" />
  </w:r>
</w:p>
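
For readers without the Productivity Tool, the run boundaries and per-run fonts can also be listed with a few lines of script. The following is only a sketch, not part of SharpDocx: it uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the output paragraph above (rsid attributes and `w:pPr` omitted), and `run_fonts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Trimmed copy of the SharpDocx output paragraph (rsid attributes and w:pPr omitted).
PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:r>
    <w:rPr><w:rFonts w:ascii="Helvetica" w:hAnsi="Helvetica" w:cs="Helvetica" /></w:rPr>
    <w:t xml:space="preserve" />
    <w:t xml:space="preserve">123 King st</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve">, </w:t>
    <w:t xml:space="preserve">Toronto</w:t>
  </w:r>
  <w:r>
    <w:t xml:space="preserve" />
  </w:r>
</w:p>
"""

def run_fonts(paragraph_xml):
    """Return one (text, font) pair per run; font is None when the run sets none."""
    root = ET.fromstring(paragraph_xml)
    result = []
    for run in root.findall("w:r", NS):
        # A run may hold several w:t elements; join them in document order.
        text = "".join(t.text or "" for t in run.findall("w:t", NS))
        fonts = run.find("w:rPr/w:rFonts", NS)
        font = fonts.get(f"{{{W}}}ascii") if fonts is not None else None
        result.append((text, font))
    return result

for text, font in run_fonts(PARAGRAPH):
    print(repr(text), "->", font or "(no font set on this run)")
```

The second run, which holds ", Toronto", reports no font of its own, so Word falls back to the font from the style hierarchy (Aptos in this test case).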

naasking commented 4 months ago

I'll take that to mean that the runs are just defined by Word and there's no simple way to predict it.

In looking through the SharpDocx code, it seems like the main issue is that all elements between start and end markers are removed. In principle, it seems simple enough to be more selective, such as removing only the code element itself and all subsequent elements up to the end tag. The following patch preserves all of the style settings exactly like this within a paragraph. It's a bit crude in how it looks for the code element, but it's otherwise pretty simple. Are there any issues I'm overlooking?

preserve run props.patch

Edit: I know this isn't a final solution, I'm happy to hack on this some more to get something working for all samples, I'm just wondering if this is a reasonable approach or if there's some deeper complexity that won't permit this kind of feature.

naasking commented 4 months ago

Actually this patch seems more robust, the tutorial now seems to mostly generate correctly, and the updated template shows that formatting is preserved. There are still a couple of small issues with the template so some corner cases aren't correctly handled, but I'm surprised how just a few lines got most of the way there.

preserve run props.patch TestTemplate.docx

egonl commented 4 months ago

Maybe I'm missing something, but with the TestTemplate.docx you provided I can't see any difference between the output of SharpDocx 2.5.0 and the output of your patch.

naasking commented 4 months ago

I haven't abandoned this, I'm just in the middle of training some new hires so I haven't had a chance to dig back into this. I will get back to this soon.

naasking commented 4 months ago

Ok, I finally got time to review this. My first patch actually solved the problem but was too crude to work with the tutorial, and my later "fix" to get the tutorial working undid the hack that works. Here's a more robust version of a working hack. I confirmed that it works in my project where I was seeing the styling issue from the test case I submitted, and the tutorial still generates correctly, so nothing obvious breaks. How it works:

SharpDocx performs some kind of traversal over the document element tree to flatten it into an Elements list, and then traverses the list between start/end indices, removing elements and inserting a placeholder text element used for the final substitution. This placeholder ends up getting some default style attached to it because the run properties of the original elements that contain styling information fall between start/end, and so end up getting removed.

Basically, this hack identifies a different marker between start and end, and deletes only text elements up to that marker, then resumes deleting all other elements as usual after the marker. This preserves the run properties of the original template, and so the styling of the inserted element is preserved.
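
To make that mental model concrete, here is a toy sketch in Python. It is not SharpDocx's code and not the attached patch: `Element`, `remove_code_block`, and `font_of_last_text` are hypothetical names, the flattened list is a simplified stand-in for the real element tree, and the marker position is chosen arbitrarily because the description above does not say which element the patch uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    kind: str    # "rPr" (run properties) or "text"
    value: str   # font name for rPr, text content for text

# Hypothetical flattened view of three runs that together hold one code block.
FLAT = [
    Element("rPr", "Helvetica"), Element("text", "<%="),
    Element("rPr", "Helvetica"), Element("text", "Model.StreetAddress"),
    Element("rPr", "Helvetica"), Element("text", " %>, "),
]

def remove_code_block(elements, start, end, marker=None):
    """Remove elements strictly between the start and end indices.

    With marker=None every element in that range is dropped. With a marker
    index, only "text" elements are dropped before the marker, and every
    element is dropped from the marker onward.
    """
    kept = []
    for i, el in enumerate(elements):
        inside = start < i < end
        if not inside:
            kept.append(el)
        elif marker is not None and i < marker and el.kind != "text":
            kept.append(el)
    return kept

def font_of_last_text(elements):
    """Font of the rPr directly preceding the last text element, if any."""
    last = max(i for i, el in enumerate(elements) if el.kind == "text")
    prev = elements[last - 1] if last > 0 else None
    return prev.value if prev is not None and prev.kind == "rPr" else None

start, end = 1, 5  # indices of the text elements holding "<%=" and "%>"

naive = remove_code_block(FLAT, start, end)
selective = remove_code_block(FLAT, start, end, marker=end)  # marker choice is an assumption

print(font_of_last_text(naive))      # the rPr in front of the trailing text was removed
print(font_of_last_text(selective))  # the rPr in front of the trailing text survives
```

In this toy model the naive pass removes the run properties that sat in front of the text holding the end tag, so whatever stays or lands in that run has no font of its own. The selective pass keeps them.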

This is a total hack but works better than expected, and it's what I need to continue my project. I'm not doing anything fancy, I just need the output based on the template to be predictable. Let me know if there's a way to refine this to get it officially accepted, otherwise I'll just fork and use that for my project since this is really the only issue I've run into.

Thanks for your patience!

sharpdocx.patch