Closed tkalmar closed 4 years ago
Looks like it might not be hex-encoding the values.
No, it is hex-encoding the value, but the 🩳 in its encoded form (��) is not valid for XML (see the linked spec). I'm not sure whether it is legal in unencoded form in UTF-8 encoded XML files.
The emoji is encoded in the Java String as two `char`s because its code point is above U+FFFF and does not fit in a single 16-bit `char`, hence the strange encoded form. This is called a surrogate pair. The fix would be to loop over the code points instead of the characters of the string. I'll make a PR in a bit.
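The difference between looping over `char`s and looping over code points can be sketched as follows. This is only an illustration, not the library's actual code: the class `XmlEncodeSketch` and the method `xmlEncode` are hypothetical names, and the choice to pass printable ASCII through and hex-encode everything else is an assumption made for the example.

```java
final class XmlEncodeSketch {

    // Hypothetical helper (not the library's real method): encodes a string
    // by iterating over code points, so a surrogate pair becomes ONE entity.
    static String xmlEncode(String input) {
        StringBuilder out = new StringBuilder();
        // String.codePoints() joins each valid surrogate pair into a single int.
        input.codePoints().forEach(cp -> {
            boolean printableAscii = cp >= 0x20 && cp < 0x7F;
            boolean special = cp == '<' || cp == '>' || cp == '&' || cp == '"' || cp == '\'';
            if (printableAscii && !special) {
                out.appendCodePoint(cp);
            } else {
                // Assumption: everything else is written as a hex character reference.
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String shorts = "\uD83E\uDE73"; // the emoji from this issue, written as its surrogate pair
        System.out.println(shorts.length());                           // number of chars
        System.out.println(shorts.codePointCount(0, shorts.length())); // number of code points
        System.out.println(xmlEncode(shorts));
    }
}
```

The sketch does not check whether a code point is allowed in XML at all; it only shows that iterating by code point yields a single reference for the emoji instead of two surrogate references.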
`StringRenderer` `xml-encode` leads to invalid XML
For example, the String 🩳 leads to the XML entity �� which, according to the XML spec (https://www.w3.org/TR/REC-xml/#NT-Char), is not in the range of allowed chars for XML (note that the ranges differ between XML 1.0 and XML 1.1).
This should either be:
I'm not sure whether, for a UTF-8 encoded XML document, the encoding of
`<`, `>`, `"` and `'` would be sufficient, with all other characters passed through.
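To make the question concrete, here is a rough sketch of the "escape only the markup characters and pass everything else through" approach, combined with a check against the `Char` production of XML 1.0. The names `XmlCharCheck`, `isXml10Char` and `escapeMinimal` are hypothetical. Two things are assumptions of the sketch and not part of the original report: it also escapes `&`, which XML requires in content, and it throws on code points outside the allowed range.

```java
final class XmlCharCheck {

    // XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical minimal escaper: only markup characters are replaced,
    // every other valid XML character is passed through unchanged.
    static String escapeMinimal(String input) {
        StringBuilder out = new StringBuilder();
        input.codePoints().forEach(cp -> {
            switch (cp) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break; // assumption: not listed in the issue
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (!isXml10Char(cp)) {
                        // Assumption: reject instead of silently dropping.
                        throw new IllegalArgumentException(
                            "Not an XML 1.0 character: U+" + Integer.toHexString(cp).toUpperCase());
                    }
                    out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }
}
```

Under this check the emoji's code point U+1FA73 is an allowed character, while each half of its surrogate pair (U+D83E and U+DE73) falls in the excluded range U+D800 to U+DFFF, which matches the invalid output described in the report.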