Bouke / docx-mailmerge

Mail merge for Office Open XML (docx) files without the need for Microsoft Office Word.
MIT License

get_merge_fields() returns set() #31

Open · retsyo opened this issue 7 years ago

retsyo commented 7 years ago

I've attached the code and the .docx file. I tested with Python 3.4.4 |Anaconda 2.3.0 (64-bit) on 64-bit Windows 7.

bug.zip

Bouke commented 7 years ago

Just for reference, what would you expect the output to be?

retsyo commented 7 years ago

I expect the title, name and ID fields to be filled with the strings "title", "name" and "id".

Bouke commented 7 years ago

The fields in your document are structured differently from the MERGEFIELD fields the package expects. Your field is defined as:

<w:r>
    <w:rPr>
        <w:rFonts w:hint="eastAsia"/>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:hint="eastAsia"/>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:instrText xml:space="preserve">MERGEFIELD &quot;</w:instrText>
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:hint="eastAsia"/>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:instrText>name</w:instrText>
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:hint="eastAsia"/>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:instrText xml:space="preserve">&quot; </w:instrText>
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:hint="eastAsia"/>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:hint="eastAsia"/>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:t>«name»</w:t>
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:hint="eastAsia"/>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:fldChar w:fldCharType="end"/>
</w:r>

Whereas the format docx-mailmerge is looking for is this:

<w:r>
    <w:rPr>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
    <w:rPr>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:instrText xml:space="preserve"/>
</w:r>
<w:r>
    <w:rPr>
        <w:rFonts w:hint="eastAsia"/>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:instrText>MERGEFIELD name2 \* MERGEFORMAT</w:instrText>
</w:r>
<w:r>
    <w:rPr>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:instrText xml:space="preserve"/>
</w:r>
<w:r>
    <w:rPr>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
    <w:rPr>
        <w:noProof/>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:t>«name2»</w:t>
</w:r>
<w:r>
    <w:rPr>
        <w:sz w:val="24"/>
    </w:rPr>
    <w:fldChar w:fldCharType="end"/>
</w:r>

So the difference is that your MERGEFIELD instruction is split across three instrText nodes:

    <w:instrText xml:space="preserve">MERGEFIELD &quot;</w:instrText>
    <w:instrText>name</w:instrText>
    <w:instrText xml:space="preserve">&quot; </w:instrText>

compared to the usual form, which has the whole instruction in a single node:

    <w:instrText>MERGEFIELD name2 \* MERGEFORMAT</w:instrText>
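Once the pieces are joined, the split version reads `MERGEFIELD "name" ` (the XML entity `&quot;` decodes to a plain double quote), so the name is also wrapped in quotes, while the usual form has a bare name followed by a `\* MERGEFORMAT` switch. As a minimal sketch, here is one way a joined instruction could be turned into a field name. `parse_mergefield` is a hypothetical helper, not docx-mailmerge's own parser, and it assumes the name is either a single bare word or a double-quoted string.

```python
import re
from typing import Optional

# Hypothetical helper, not part of docx-mailmerge: pull the field name out of a
# complete MERGEFIELD instruction, whether the name is quoted or bare.
_MERGEFIELD_RE = re.compile(r'\bMERGEFIELD\s+(?:"([^"]*)"|(\S+))')


def parse_mergefield(instruction: str) -> Optional[str]:
    """Return the merge field name, or None if the instruction is not a MERGEFIELD."""
    match = _MERGEFIELD_RE.search(instruction)
    if match is None:
        return None
    # Group 1 is the quoted form, group 2 the bare form; any switches that
    # follow the name (such as \* MERGEFORMAT) are ignored.
    return match.group(1) if match.group(1) is not None else match.group(2)


# The three instrText pieces from the reported document, joined in order.
split_instruction = "".join(['MERGEFIELD "', "name", '" '])
print(parse_mergefield(split_instruction))                     # name
print(parse_mergefield(r"MERGEFIELD name2 \* MERGEFORMAT"))   # name2
```

Trying the quoted alternative first means a quoted name containing spaces, such as `"first name"`, is kept whole.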

I'm not sure why Word decided to write the merge field differently, but that's definitely the problem here. You can see the actual XML by unzipping bug.docx and opening word/document.xml from the extracted folder. To fix this in the package, it should look for <w:fldChar w:fldCharType="begin"/> and concatenate the contents of all <w:instrText>...</w:instrText> nodes up until <w:fldChar w:fldCharType="end"/>. There's already something similar in there that could be extended. If you feel like contributing, I'd welcome a pull request for this! 😀
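A minimal sketch of the approach described above, using only the Python standard library. It is not docx-mailmerge's actual code; `read_document_xml` and `field_instructions` are hypothetical names. It reads word/document.xml straight out of the .docx package (the same file you get by unzipping it) and joins every `w:instrText` between a `begin` and its matching `end` fldChar. A stack is used so that a field nested inside another one is collected separately; this is an assumption about how nesting should be handled, not something the thread specifies.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import BinaryIO, List, Union

# WordprocessingML namespace bound to the w: prefix in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
FLDCHAR = f"{{{W_NS}}}fldChar"
INSTRTEXT = f"{{{W_NS}}}instrText"
FLDCHARTYPE = f"{{{W_NS}}}fldCharType"


def read_document_xml(docx: Union[str, BinaryIO]) -> bytes:
    """Return word/document.xml from a .docx, which is a ZIP package."""
    with zipfile.ZipFile(docx) as package:
        return package.read("word/document.xml")


def field_instructions(document_xml: bytes) -> List[str]:
    """Join the w:instrText pieces of each complex field, from begin to end."""
    root = ET.fromstring(document_xml)
    open_fields: List[List[str]] = []  # stack: a nested field gets its own entry
    instructions: List[str] = []
    for element in root.iter():  # iter() walks the tree in document order
        if element.tag == FLDCHAR:
            kind = element.get(FLDCHARTYPE)
            if kind == "begin":
                open_fields.append([])
            elif kind == "end" and open_fields:
                instructions.append("".join(open_fields.pop()))
        elif element.tag == INSTRTEXT and open_fields:
            # An empty <w:instrText xml:space="preserve"/> has text None.
            open_fields[-1].append(element.text or "")
    return instructions


# Demo: the split field from bug.docx, reduced to the elements that matter.
SPLIT_FIELD = (
    '<w:document xmlns:w="' + W_NS + '"><w:body><w:p>'
    '<w:r><w:fldChar w:fldCharType="begin"/></w:r>'
    '<w:r><w:instrText xml:space="preserve">MERGEFIELD &quot;</w:instrText></w:r>'
    '<w:r><w:instrText>name</w:instrText></w:r>'
    '<w:r><w:instrText xml:space="preserve">&quot; </w:instrText></w:r>'
    '<w:r><w:fldChar w:fldCharType="separate"/></w:r>'
    '<w:r><w:t>«name»</w:t></w:r>'
    '<w:r><w:fldChar w:fldCharType="end"/></w:r>'
    '</w:p></w:body></w:document>'
).encode("utf-8")

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("word/document.xml", SPLIT_FIELD)
buffer.seek(0)
print(field_instructions(read_document_xml(buffer)))  # ['MERGEFIELD "name" ']
```

Each joined string can then be matched against MERGEFIELD to extract the field name; note that the split form still carries the quotes around `name`, which a parser would need to strip.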

DavidAStevenson commented 5 years ago

Just for anyone's reference: I encountered this myself, but then realized the problem was simply that I needed to save my Word template in true .docx format, not the legacy Microsoft Word 2003 format. Thanks.
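If you are unsure which format a template was actually saved in, the file's content is more reliable than its extension: a .docx is an Office Open XML ZIP package, whereas the legacy binary .doc format is an OLE2 compound file with a fixed 8-byte signature. The sketch below is a hypothetical helper (`detect_word_format` is not part of docx-mailmerge) and assumes the main document part is stored at word/document.xml, as Word does.

```python
import io
import zipfile
from typing import BinaryIO

# Signature of the OLE2 compound file used by legacy binary .doc files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# A .docx package normally starts with a ZIP local file header.
ZIP_MAGIC = b"PK\x03\x04"


def detect_word_format(stream: BinaryIO) -> str:
    """Classify a Word file by its content rather than its extension."""
    header = stream.read(8)
    stream.seek(0)
    if header == OLE2_MAGIC:
        return "legacy-doc"
    if header.startswith(ZIP_MAGIC):
        try:
            with zipfile.ZipFile(stream) as package:
                names = package.namelist()
        except zipfile.BadZipFile:
            return "unknown"
        # Word stores the main document part at word/document.xml.
        return "docx" if "word/document.xml" in names else "unknown"
    # Anything else (e.g. Word 2003 XML or RTF) is not an Office Open XML package.
    return "unknown"


print(detect_word_format(io.BytesIO(b"{\\rtf1\\ansi ...}")))  # unknown
```

To check a file on disk, open it in binary mode (`open(path, "rb")`) and pass the file object in.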