Closed pryder-fleetaero closed 1 year ago
So, this is actually expected/intended behavior. See both https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined_entities_in_XML and https://stackoverflow.com/questions/1328538/how-do-i-escape-ampersands-in-xml-so-they-are-rendered-as-entities-in-html.
When the &quot; characters are read by the XML parser, they're decoded and from then on treated as plain " characters. If XML content is being passed in that has nested content, that nested content should either go in a CDATA section or be sanitized/escaped first.
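As a rough sketch of the two options mentioned above (escaping versus a CDATA section), something like the following could be used before handing a decoded value to a second parser. The helper names `escapeXmlAttr` and `wrapInCdata` are hypothetical and not part of SWXMLHash or any other library in this thread. Note that CDATA only applies to element content, not to attribute values.

```javascript
// Hypothetical helper (an assumption for illustration, not a library API).
// Escapes the five predefined XML entities so a decoded value can be placed
// back inside a quoted attribute or element text.
function escapeXmlAttr(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Hypothetical helper: wraps element text in a CDATA section, splitting any
// embedded "]]>" so the section is not terminated early.
function wrapInCdata(text) {
  return '<![CDATA[' + String(text).replace(/\]\]>/g, ']]]]><![CDATA[>') + ']]>';
}

const decoded = 'a"b'; // what the parser hands back after decoding &quot;
console.log('<test badAttribute="' + escapeXmlAttr(decoded) + '"/>');
// prints: <test badAttribute="a&quot;b"/>
```

Escaping `&` first matters: doing it last would turn an already-produced `&quot;` into `&amp;quot;`.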
You can see that JavaScript XML parsing behaves the same way:
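const xmlStr = '<root><test badAttribute=\"a&quot;b\"/></root>';
const parser = new DOMParser();
const doc = parser.parseFromString(xmlStr, "application/xml");
// The parser decodes the entity, so the attribute value comes back as a"b
doc.querySelector('test').getAttribute('badAttribute');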
Ah, interesting. I was under the incorrect assumption, then, that any property etc. that returned an "XML" type representation would keep its XML entities, so that it would still be valid XML when parsed again. The underlying problem is that we take the child element we home in on using SWXMLHash and pass it to an XMLMapper struct, which uses NSXMLParser under the hood; that throws an error on the <test badAttribute="a"b"></test> representation (it wants the embedded " escaped as &quot;). Essentially we're using SWXMLHash like the .selectSingleNode(xPath) found in many DOM parsers, just to cut to the element of interest, and then parsing its XML standalone.
Happy that this behaviour is by design. In our case we've worked around the issue with an XMLElement/XMLAttribute extension that produces a representation with the XML entities still included in the attribute values, which is then valid when passed to NSXMLParser (/XMLMapper).
One thing I did note with your example, though, is that if you query the .innerHTML property (I assume analogous to the .innerXML of the library), it does include the XML entity for the escaped quote.
Interesting, thanks for sharing! I wonder if that is an HTML vs XML difference. To be honest, I wasn't aware of this prior to your submission... I would have guessed text within quotes wouldn't be parsed either, but it does seem consistent. That was being handled by the underlying NSXMLParser itself, though.
I'm currently working on a project which uses SWXMLHash to "shred" the XML response returned from a web service to get to the useful, deeply embedded content, at which point it passes that XML string (i.e. the deeply nested XML element of the original response) to another XML parser (i.e. XMLMapper) to do the actual mapping to various structs etc.
The issue we're encountering is that if the original has an attribute with an embedded quote " in the correct XML-escaped format of &quot;, then SWXMLHash's XMLElement.innerXML and String(describing: <XMLElement>) both return XML where the embedded attribute quote isn't correctly escaped.
To Reproduce
Steps to reproduce the behavior:
Or:
Expected behavior
Environment: