Closed niconoe closed 7 years ago
After discussion, it appears that, the metadata being an XML file, the encoding should be specified with an XML declaration at the start of the file, or should default to UTF-8 if nothing is specified.
I added two test cases (one with explicit windows-1252 encoding, and one for implicit UTF-8) and slightly changed to code so the XML parser manage this by itself.
Fixed confirmed to work on Windows by @DimEvil. Closing.
(issue reported by @DimEvil, problematic archive for tests: dwca-modirisk-monitoring-2-v3.5.zip)
When reading EML.xml, the file encoding is currently not specified, so Python use a system-dependent codec. This is generally utf-8 on *nix, but it is CP1252 on Windows (Python 3, installed from Anaconda) which triggers:
Next steps to fix:
open