4teamwork / docxcompose

Append/Concatenate .docx documents
MIT License

Adding core properties does not work and corrupts file #101

Open manuelbastuck opened 10 months ago

manuelbastuck commented 10 months ago

I tried to add core properties (such as category and version) to a document with docxcompose. However, when the resulting file was opened in Microsoft Word, it appeared to be "corrupted": Word complained about "unreadable content" and offered to "recover" the file (the wording may not be exact; it is translated from the German warning message). The recovery works and the file is displayed, but the core properties are missing, both before and after recovery.

After some digging, my tentative explanation is that the namespace prefix for custom properties should be "op" (according to here) instead of "cp", which is what docxcompose uses. In utils.py, "cp" definitely overwrites the python-docx namespace prefix for "core properties". After I replaced every occurrence of "cp" with "op" in utils.py and properties.py, everything seems to work as expected for me.
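
To make the suspected mechanism concrete, here is a toy illustration of a prefix collision. It is not docxcompose's or python-docx's actual code: `shared_nsmap` and `qualified_name` are hypothetical stand-ins for a prefix table shared between two libraries. Only the two namespace URIs are real (core properties from the Open Packaging Conventions, custom properties from Office Open XML).

```python
# Real namespace URIs for the two kinds of document properties.
CORE_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
CUSTOM_NS = "http://schemas.openxmlformats.org/officeDocument/2006/custom-properties"


def qualified_name(nsmap, tag):
    """Expand 'prefix:local' into Clark notation '{uri}local'."""
    prefix, local = tag.split(":")
    return "{%s}%s" % (nsmap[prefix], local)


# Hypothetical shared prefix table in which "cp" means "core properties".
shared_nsmap = {"cp": CORE_NS}
before = qualified_name(shared_nsmap, "cp:version")

# A second library registers its own meaning for the same prefix.
shared_nsmap.update({"cp": CUSTOM_NS})
after = qualified_name(shared_nsmap, "cp:version")

# The same tag string now resolves to an element in a different namespace.
print(before)
print(after)
```

If something like this happens, any code that builds a core-properties element from the "cp" prefix after the overwrite would place it in the custom-properties namespace, which would fit the symptoms described above.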

MWE:

from docx import Document
from docxcompose.composer import Composer

composer = Composer(Document())

# adding any of these (at least) results in a "corrupted" file
composer.doc.core_properties.version = "0"
# composer.doc.core_properties.keywords = "keyword"
# composer.doc.core_properties.category = "category"

composer.save("test.docx")
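
One way to check the namespace explanation is to look at docProps/core.xml inside the saved package. The following is a minimal sketch using only the standard library. The helper names `core_xml_prefixes` and `core_xml_child_tags` are made up for this example, and the in-memory sample is a hand-written stand-in for a real .docx, not output from docxcompose.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
CORE_PART = "docProps/core.xml"


def core_xml_prefixes(docx_file):
    """Return {prefix: uri} for the namespaces declared in docProps/core.xml."""
    with zipfile.ZipFile(docx_file) as package:
        xml_bytes = package.read(CORE_PART)
    prefixes = {}
    # "start-ns" events yield (prefix, uri) pairs as declarations are parsed.
    for _event, (prefix, uri) in ET.iterparse(io.BytesIO(xml_bytes), events=("start-ns",)):
        prefixes[prefix] = uri
    return prefixes


def core_xml_child_tags(docx_file):
    """Return the property element tags in Clark notation ('{uri}local')."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read(CORE_PART))
    return [child.tag for child in root]


# Hand-written sample package (an assumption for demonstration only).
sample_xml = (
    '<cp:coreProperties xmlns:cp="%s">'
    "<cp:version>0</cp:version>"
    "</cp:coreProperties>" % CORE_NS
)
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as package:
    package.writestr(CORE_PART, sample_xml)

prefixes = core_xml_prefixes(sample)
tags = core_xml_child_tags(sample)
print(prefixes)
print(tags)
```

For the file produced by the MWE, the same helpers can be called with the path, e.g. `core_xml_child_tags("test.docx")`. A property element whose namespace is not the core-properties URI would support the prefix explanation.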

MiquelBarceloG commented 5 months ago

Yes! I was going crazy over this.

Funnily enough, this generates valid docx files with all properties set correctly...

from docx import Document

doc = Document()

doc.core_properties.version = "0"
doc.core_properties.keywords = "keyword"
doc.core_properties.category = "category"

from docxcompose.composer import Composer

composer = Composer(doc)
composer.save("test.docx")