Test - and catch - .csv formatting (XML allowed chars)

davidverweij commented 4 years ago

In the original code, I implemented try-except to check if the merge_pages was executed successfully. This was removed at #4.

try:
    docx.merge_pages([single_document])
except ValueError:
    print ("ValueError in document number " + str(counter) + ". Please check the .csv for valid characters. Continuing with the rest...")
except:
    print("Uknown Error")
    raise
else:
     docx.write(str(counter) + ".docx")

The docx.merge_pages() will fail if the .csv contains characters such as ä. See more details in the XML documentation. We should test if this still occurs - and then handle this error gracefully (e.g. try-except).

If this error still occurs, I'd suggest to continue conversion (csv2docx) and return a list of problematic rows in the .csv (perhaps with text output). In the module, this could be a list that convert() would return - which the CLI can print to the monitor.

jawrainey commented 4 years ago

Worth noting that we replaced merge_pages with merge_templates as it's recommended for python 3+. I suspect that this issue was related to encoding, which is notorious for being a pain on python 2, especially UTF-8 characters as above.

To confirm: this issue no longer occurs. I updated the example csv locally to include names with Dävid (yes probs not a real name) and UTF-8 names (유진, سارا) and they work as expected.

We should cover these edge cases in unit tests that utilise our example csv. I'll open a separate issue for unit testing.

davidverweij commented 4 years ago

Just found this note on the merge_templates method in the README:

When using this feature, make sure you don't use comments, footnotes, bookmarks, etc. This is because these elements have an id attribute, which must be unique. This library does not handle this, resulting in invalid documents.

Worth checking if this causes issues in our current use case for merging fields for a single document, and future use case to merge everything in one big document (#32)

davidverweij / csv2docx

Test - and catch - .csv formatting (XML allowed chars) #17