m8pple closed this issue 2 years ago
This is a deceptively big deal. At the heart of all this is Lex, which has been in Generics for probably 30 years. Lex accepts c/r as a valid alternative to '"' for closing a quoted string, and I am loath to change stuff in Generics because the repercussions will ripple out in strange ways. What we would do is put (yet another) switch into the Lex interface that leaves the behaviour as is by default and allows us a special POETS behaviour that insists on double quotes on either side. This will let us include '\n' as a character inside a string, which I think is the behaviour you want. (But why?)
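To make the proposed switch concrete, here is a minimal sketch in Python. It is not the Lex code from Generics: the function name `scan_quoted`, the `strict_quotes` flag, and the choice to leave the line ending unconsumed in legacy mode are all assumptions made for illustration.

```python
def scan_quoted(text, start, strict_quotes=False):
    """Scan a double-quoted string that begins at text[start].

    Returns (value, index_just_after_the_string).
    Hypothetical sketch only; names and details are not taken from Lex.
    """
    if text[start] != '"':
        raise ValueError("expected an opening double quote")
    chars = []
    i = start + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':
            # A closing double quote always ends the string.
            return "".join(chars), i + 1
        if ch in "\r\n" and not strict_quotes:
            # Legacy default: a line ending also closes the string.
            # Assumption: the line ending itself is left unconsumed.
            return "".join(chars), i
        # In strict mode, line endings are ordinary string characters.
        chars.append(ch)
        i += 1
    raise ValueError("unterminated string")


legacy_value, _ = scan_quoted('"a\nb" x', 0)
strict_value, _ = scan_quoted('"a\nb" x', 0, strict_quotes=True)
```

With the default, the string ends at the line break and `legacy_value` is `a`. With `strict_quotes=True` the line break is kept and `strict_value` is the three-character string `a`, newline, `b`. Keeping the flag off by default is what would leave existing Generics users unaffected.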
This issue doesn't really matter at a practical level. XML compliance is not a goal, and XML generated by other XML languages and generators can be normalised (e.g. collapsing whitespace to spaces) if needed.
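As a rough illustration of that kind of normalisation, the sketch below rewrites tabs and line endings inside quoted attribute values as spaces before the file reaches the parser. It is an assumed workaround, not part of the orchestrator: the function name `flatten_attribute_whitespace` and the `<M>` element in the example are hypothetical, and the scanner deliberately ignores comments, CDATA sections and processing instructions.

```python
def flatten_attribute_whitespace(xml_text):
    """Replace tab, CR and LF inside quoted attribute values with a space.

    Hypothetical pre-processing sketch. Assumes no comments, CDATA
    sections or processing instructions contain '<', '>' or quotes.
    """
    out = []
    in_tag = False   # True between '<' and the matching '>'
    quote = None     # The open quote character, or None outside a value
    for ch in xml_text:
        if quote is not None:
            if ch == quote:
                quote = None          # End of the attribute value
            elif ch in "\t\r\n":
                ch = " "              # Flatten whitespace inside the value
        elif in_tag:
            if ch in "\"'":
                quote = ch            # Start of an attribute value
            elif ch == ">":
                in_tag = False
        elif ch == "<":
            in_tag = True
        out.append(ch)
    return "".join(out)


source = '<M key="a"\n   value="1\n2\t3"/>\n'
flattened = flatten_attribute_whitespace(source)
```

Here `flattened` keeps the line break between the two attributes and the one after the element, while the value becomes `1 2 3`. One difference from a conforming XML parser is that a CR LF pair inside a value turns into two spaces here rather than one.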
This is maybe a bit obscure, but it was in one of the test cases developed for v4 parsing. It's also a bit related to the constraints created by the requirement that Metadata become a key-value pair, both parts of which must be in attributes (an alternative proposed was that the value could be in the content, but this was rejected as too complicated).
The test case anticipates trying to put a large value in a metadata attribute, which is fairly likely (as some metadata values are quite big in existing v3 files):
https://github.com/POETSII/poets_improvement_proposals/blob/2a355282f3aa3912bdacf4ec50bf25199bdf310d/proposed/PIP-0020/xml/ic/tests/valid/L3-compilation/tiniest-plus-metadata3.xml#L3-L9
This is valid XML, though the parser is allowed to normalise the whitespace to spaces. However, it fails in the orchestrator parser:
I can't post the exact version of the orchestrator right now, as I have pending changes to fix #222, #221, and #220 which I don't want to push yet. However, the problem with multi-line attributes must exist in b5615c33a5476c4ed1d1170cbc1abaa4201ad007, as that is the last version I merged, and I haven't touched the XML parser code (only the grammar).
This is probably not that high priority, as it can be worked around, but it means that some of the existing test cases won't pass.