Open davidverweij opened 4 years ago
Worth noting that we replaced merge_pages
with merge_templates
as it's recommended for python 3+. I suspect that this issue was related to encoding, which is notorious for being a pain on python 2, especially UTF-8 characters as above.
To confirm: this issue no longer occurs. I updated the example csv locally to include names with Dävid
(yes probs not a real name) and UTF-8 names (유진, سارا) and they work as expected.
We should cover these edge cases in unit tests that utilise our example csv. I'll open a separate issue for unit testing.
Just found this note on the merge_templates
method in the README:
When using this feature, make sure you don't use comments, footnotes, bookmarks, etc. This is because these elements have an id attribute, which must be unique. This library does not handle this, resulting in invalid documents.
Worth checking if this causes issues in our current use case for merging fields for a single document, and future use case to merge everything in one big document (#32)
In the original code, I implemented try-except to check if the merge_pages was executed successfully. This was removed at #4.
The docx.merge_pages() will fail if the .csv contains characters such as
ä
. See more details in the XML documentation. We should test if this still occurs - and then handle this error gracefully (e.g. try-except).If this error still occurs, I'd suggest to continue conversion (csv2docx) and return a list of problematic rows in the .csv (perhaps with text output). In the module, this could be a list that convert() would return - which the CLI can print to the monitor.