elapouya / python-docx-template

Use a docx as a jinja2 template
GNU Lesser General Public License v2.1

Cell background tags do not work in docx files with shorthand tags (i.e., with <w:tcPr /> instead of <w:tcPr>...</w:tcPr>) #559

Open pecanka opened 3 weeks ago

pecanka commented 3 weeks ago

Description of issue

Some writers (e.g., Pandoc) produce valid docx files that use self-closing tags such as <w:tcPr /> instead of the full form <w:tcPr></w:tcPr>. With such files, features that docxtpl implements in its patch_xml() method, such as setting the cell background color via {% cellbg <var> %}, do not work properly.

Below are two examples of document.xml that differ only in having <w:tcPr>\n</w:tcPr> (works) versus <w:tcPr /> (does not work). Both documents contain the tag {% cellbg 444444 %}. docxtpl handles the former correctly but not the latter.

OK (before)

<w:tcPr>
</w:tcPr>

OK (after)

<w:tcPr>
<w:shd w:val="clear" w:color="auto" w:fill="444444 "/>
</w:tcPr>

Fail (before)

<w:tcPr />

or

<w:tcPr></w:tcPr>

Fail (after)

<w:tcPr />
<w:shd w:val="clear" w:color="auto" w:fill="444444 "/>
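The "Fail (after)" output is what one would expect if the shading element were appended right after any tag that starts with <w:tcPr, without checking whether that tag is self-closing. The sketch below only illustrates this failure mode; the function names and the regular expressions are assumptions made for the example, not docxtpl's actual code.

```python
import re

# Hypothetical shading element; the fill value mirrors the example in this issue.
SHD = '<w:shd w:val="clear" w:color="auto" w:fill="444444"/>'


def insert_shading_naive(cell_xml: str) -> str:
    """Append SHD right after the first tag that starts with '<w:tcPr'."""
    # '[^>]*>' also swallows the ' /' of a self-closing tag, so SHD lands
    # after '<w:tcPr />' instead of inside the element.
    return re.sub(r"<w:tcPr[^>]*>", lambda m: m.group(0) + SHD, cell_xml, count=1)


def insert_shading_expanded(cell_xml: str) -> str:
    """Expand a self-closing <w:tcPr/> into an open/close pair, then insert SHD."""
    expanded = re.sub(r"<w:tcPr\s*/>", "<w:tcPr></w:tcPr>", cell_xml)
    return insert_shading_naive(expanded)


print(insert_shading_naive("<w:tcPr />"))
print(insert_shading_expanded("<w:tcPr />"))
```

The first call leaves the shading element as a sibling of <w:tcPr />, as in "Fail (after)"; the second places it between <w:tcPr> and </w:tcPr>, as in "OK (after)".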

New lines

As shown above, a plain pair <w:tcPr></w:tcPr> (i.e., without at least a newline between the tags), which Word normalizes to <w:tcPr /> anyway, does not work either. In a real Word document, which contains no newline characters, a valid example of "OK (before)" would be this:

<w:tcPr><w:tcW w:w="0" w:type="auto"/></w:tcPr>

This is also used in Example A below.

Severity

None of this breaks the document to the degree that Word would fail to open it, but in the failing case the cell has no background color, because Word removes the misplaced line <w:shd w:val="clear" w:color="auto" w:fill="444444 "/>.
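Because Word silently drops the stray element, it can help to check a rendered file before opening it. The following is a minimal sketch that counts w:shd elements sitting directly under w:tc (rather than inside w:tcPr). The function names are hypothetical, and the code assumes the main document part is stored at word/document.xml, which is the usual location but not guaranteed.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace, as declared in the examples in this issue.
NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def count_misplaced_shd(document_xml) -> int:
    """Count <w:shd> elements that are direct children of <w:tc>."""
    root = ET.fromstring(document_xml)
    return len(root.findall(".//w:tc/w:shd", NS))


def count_misplaced_shd_in_docx(docx) -> int:
    """Run the same check on a .docx file (path or file-like object)."""
    with zipfile.ZipFile(docx) as archive:
        # Assumption: the main part lives at word/document.xml.
        return count_misplaced_shd(archive.read("word/document.xml"))
```

For example, a non-zero result from count_misplaced_shd_in_docx("document4.docx.docx") would indicate the symptom described here.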

Reproducible example

Example A: Content of document.xml where docxtpl works

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
    xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex"
    xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex"
    xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex"
    xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex"
    xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex"
    xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex"
    xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex"
    xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex"
    xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink"
    xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:oel="http://schemas.microsoft.com/office/2019/extlst"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
    xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex"
    xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid"
    xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml"
    xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash"
    xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh wp14">
    <w:body>
        <w:tbl>
            <w:tblPr>
                <w:tblW w:w="0" w:type="auto"/>
                <w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/>
            </w:tblPr>
            <w:tblGrid>
                <w:gridCol w:w="2398"/>
            </w:tblGrid>
            <w:tr w:rsidR="003E4BB2" w14:paraId="5EAE5069" w14:textId="77777777" w:rsidTr="003E4BB2">
                <w:tc>
                    <w:tcPr>
                      <w:tcW w:w="0" w:type="auto"/>
                    </w:tcPr>
                    <w:p w14:paraId="3F9F55E0" w14:textId="02A624C4" w:rsidR="003E4BB2" w:rsidRDefault="00771A74">
                        <w:r>
                            <w:t>{%cellbg 444444 %} test</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
            </w:tr>
        </w:tbl>
        <w:p w14:paraId="53BD41B7" w14:textId="77777777" w:rsidR="0076346F" w:rsidRDefault="0076346F"/>
        <w:sectPr w:rsidR="0076346F">
            <w:footerReference w:type="even" r:id="rId6"/>
            <w:footerReference w:type="default" r:id="rId7"/>
            <w:footerReference w:type="first" r:id="rId8"/>
            <w:pgSz w:w="12240" w:h="15840"/>
            <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
            <w:cols w:space="720"/>
            <w:docGrid w:linePitch="360"/>
        </w:sectPr>
    </w:body>
</w:document>

Example B: Content of document.xml where docxtpl does not work

<?xml version='1.0' encoding='UTF-8' standalone='yes'?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
    xmlns:cx="http://schemas.microsoft.com/office/drawing/2014/chartex"
    xmlns:cx1="http://schemas.microsoft.com/office/drawing/2015/9/8/chartex"
    xmlns:cx2="http://schemas.microsoft.com/office/drawing/2015/10/21/chartex"
    xmlns:cx3="http://schemas.microsoft.com/office/drawing/2016/5/9/chartex"
    xmlns:cx4="http://schemas.microsoft.com/office/drawing/2016/5/10/chartex"
    xmlns:cx5="http://schemas.microsoft.com/office/drawing/2016/5/11/chartex"
    xmlns:cx6="http://schemas.microsoft.com/office/drawing/2016/5/12/chartex"
    xmlns:cx7="http://schemas.microsoft.com/office/drawing/2016/5/13/chartex"
    xmlns:cx8="http://schemas.microsoft.com/office/drawing/2016/5/14/chartex"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:aink="http://schemas.microsoft.com/office/drawing/2016/ink"
    xmlns:am3d="http://schemas.microsoft.com/office/drawing/2017/model3d"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:oel="http://schemas.microsoft.com/office/2019/extlst"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:w15="http://schemas.microsoft.com/office/word/2012/wordml"
    xmlns:w16cex="http://schemas.microsoft.com/office/word/2018/wordml/cex"
    xmlns:w16cid="http://schemas.microsoft.com/office/word/2016/wordml/cid"
    xmlns:w16="http://schemas.microsoft.com/office/word/2018/wordml"
    xmlns:w16sdtdh="http://schemas.microsoft.com/office/word/2020/wordml/sdtdatahash"
    xmlns:w16se="http://schemas.microsoft.com/office/word/2015/wordml/symex"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 w15 w16se w16cid w16 w16cex w16sdtdh wp14">
    <w:body>
        <w:tbl>
            <w:tblPr>
                <w:tblW w:w="0" w:type="auto"/>
                <w:tblLook w:val="04A0" w:firstRow="1" w:lastRow="0" w:firstColumn="1" w:lastColumn="0" w:noHBand="0" w:noVBand="1"/>
            </w:tblPr>
            <w:tblGrid>
                <w:gridCol w:w="2398"/>
            </w:tblGrid>
            <w:tr w:rsidR="003E4BB2" w14:paraId="5EAE5069" w14:textId="77777777" w:rsidTr="003E4BB2">
                <w:tc>
                    <w:tcPr />
                    <w:p w14:paraId="3F9F55E0" w14:textId="02A624C4" w:rsidR="003E4BB2" w:rsidRDefault="00771A74">
                        <w:r>
                            <w:t>{%cellbg 444444 %} test</w:t>
                        </w:r>
                    </w:p>
                </w:tc>
            </w:tr>
        </w:tbl>
        <w:p w14:paraId="53BD41B7" w14:textId="77777777" w:rsidR="0076346F" w:rsidRDefault="0076346F"/>
        <w:sectPr w:rsidR="0076346F">
            <w:footerReference w:type="even" r:id="rId6"/>
            <w:footerReference w:type="default" r:id="rId7"/>
            <w:footerReference w:type="first" r:id="rId8"/>
            <w:pgSz w:w="12240" w:h="15840"/>
            <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720" w:footer="720" w:gutter="0"/>
            <w:cols w:space="720"/>
            <w:docGrid w:linePitch="360"/>
        </w:sectPr>
    </w:body>
</w:document>

elapouya commented 3 weeks ago

Could you provide the 2 .docx template files (Example A & B)?

pecanka commented 2 weeks ago

Certainly, and I'll do one better and provide all 4 of the cases I mentioned above. They are the following:

document1.docx      # tags between <w:tcPr> and </w:tcPr>        => works well
document2.docx      # newline between <w:tcPr> and </w:tcPr>     => works well
document3.docx      # nothing between <w:tcPr> and </w:tcPr>     => fails
document4.docx      # shorthand tag <w:tcPr/>                    => fails

And here's the code to run the conversions:

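import jinja2
from docxtpl import DocxTemplate


def run_docxtpl(file):
    """Render the template with an empty context and save the result."""
    doc = DocxTemplate(file)
    doc.render({}, jinja2.Environment(), autoescape=True)
    # The output name is the input name plus ".docx", e.g. document1.docx.docx.
    doc.save(f"{file}.docx")


# Render all four test cases.
for name in ("document1.docx", "document2.docx", "document3.docx", "document4.docx"):
    run_docxtpl(name)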

document1.docx document2.docx document3.docx document4.docx

And here are the output files I get when I run the above code using docxtpl 0.18.0 on Python 3.10.11:

document1.docx.docx document2.docx.docx document3.docx.docx document4.docx.docx

Hope this helps.
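Since the case with tags between <w:tcPr> and </w:tcPr> works, a possible stopgap is to pre-process a template so that every empty tcPr gets the same <w:tcW w:w="0" w:type="auto"/> child used in Example A. This is only a sketch under assumptions: the helper names are hypothetical, the main part is assumed to be a UTF-8 encoded word/document.xml, adding an automatic cell width is assumed to be acceptable for the layout, and it has not been verified against docxtpl itself.

```python
import re
import zipfile

# Matches <w:tcPr/>, <w:tcPr /> and an open/close pair with only whitespace inside.
EMPTY_TCPR = re.compile(r"<w:tcPr\s*/>|<w:tcPr>\s*</w:tcPr>")
# Replacement taken from the working example in this issue.
FILLED_TCPR = '<w:tcPr><w:tcW w:w="0" w:type="auto"/></w:tcPr>'


def fill_empty_tcpr(xml: str) -> str:
    """Give every empty cell-properties element a child element."""
    return EMPTY_TCPR.sub(FILLED_TCPR, xml)


def patch_template(src, dst) -> None:
    """Copy the docx at src to dst, rewriting only word/document.xml."""
    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                # Assumption: the part is UTF-8 encoded.
                data = fill_empty_tcpr(data.decode("utf-8")).encode("utf-8")
            zout.writestr(item, data)
```

A call such as patch_template("document4.docx", "document4_patched.docx") (the second file name is made up for illustration) would write a copy that can then be passed to run_docxtpl.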