eea / odfpy

API for OpenDocument in Python
GNU General Public License v2.0

odfpy happily inserts illegal xml characters into documents #71

Open risicle opened 6 years ago

risicle commented 6 years ago

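Beyond escaping, certain ranges of characters are not allowed in XML documents at all, even when written as numeric character references. They are not allowed in CDATA sections either, and the permitted range is narrower still because XML 1.0 is used rather than 1.1. For example, most control characters below U+0020 (everything except tab, line feed and carriage return) are illegal in XML 1.0.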

See https://en.wikipedia.org/wiki/Valid_characters_in_XML

I'm unsure what the right thing for odfpy to do would be when it encounters one of these characters. Silently removing them seems like bad behaviour. But maybe it should at least raise a ValueError rather than generate an invalid document, leaving the responsibility for stripping or replacing them to the application.

Edit: Though, thinking about this further, it does feel like something that should be quite transparent to an odfpy user. The fact that ODF is XML at all is an implementation detail, and users shouldn't have to do XML-specific sanitising of the text they are trying to write to an ODF document.
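
To make the two options weighed above concrete, here is a minimal sketch in plain Python. It is not odfpy code: the names `check_xml10_text` and `strip_xml10_illegal` are hypothetical helpers invented for illustration, and the character class is taken from the `Char` production of the XML 1.0 specification. An application (or, in principle, the library) could call one of these before handing text over to be written.

```python
import re

# Matches any character outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML10 = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def check_xml10_text(text):
    """Return text unchanged, or raise ValueError on the first illegal character.

    Hypothetical helper: the "fail loudly" option.
    """
    match = _ILLEGAL_XML10.search(text)
    if match:
        raise ValueError(
            "character U+%04X at index %d is not allowed in XML 1.0"
            % (ord(match.group()), match.start())
        )
    return text


def strip_xml10_illegal(text, replacement=""):
    """Return text with every illegal character replaced.

    Hypothetical helper: the "silently clean up" option. Passing
    replacement="\ufffd" keeps a visible marker instead of deleting.
    """
    return _ILLEGAL_XML10.sub(replacement, text)
```

With these, `check_xml10_text("a\x0bb")` raises a ValueError naming U+000B, while `strip_xml10_illegal("a\x0bb")` returns `"ab"`. Which behaviour is appropriate as a default is exactly the open question in this issue.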