Closed GuntherRademacher closed 1 year ago
My first attempt to fix this neglected the fact that the encoding of `IOContent` can be different from UTF-8, as in `FetchModuleTest.binaryDoc`. Also, for some reason I failed to use `InputSource.setEncoding`.
I have now replaced it by a variant in which `IOContent`, in case it was constructed from a `String`, relies on `InputSource.setEncoding`.
…merged (with just some minor changes).
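For readers unfamiliar with `InputSource.setEncoding`, here is a minimal standalone sketch of the idea, not the code that was merged. The class and method names (`SetEncodingSketch`, `rootText`) are hypothetical, and it assumes the JDK's built-in parser, which as far as I can tell treats an encoding set on the `InputSource` as taking precedence over the one named in the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class SetEncodingSketch {
    // Hypothetical helper: parses XML held in a Java String and returns
    // the text content of the root element.
    static String rootText(String xml) throws Exception {
        // A Java String has no byte encoding of its own; here it is
        // serialized as UTF-8, whatever its XML declaration claims.
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        // Tell the parser which encoding the bytes really use.
        // Assumption: the parser prefers this over the declared encoding.
        source.setEncoding("UTF-8");
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(source)
            .getDocumentElement()
            .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The declaration names ISO-8859-1, but the bytes are UTF-8.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u00e4</a>";
        System.out.println(rootText(xml).equals("\u00e4"));
    }
}
```

Unlike a hard-wired UTF-8 `Reader`, this only makes sense when the byte encoding is actually known, which is why it is restricted to content constructed from a `String`.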
Some time ago I was using a Java tool (can't remember what it was) that generated XML in a Java string starting with an XML declaration of
When passing that via `IO.get` to the `DBNode` constructor, it failed with a parsing error: the string is internally encoded in UTF-8, and the resulting byte stream is then passed to the XML parser, which decodes it per the encoding from the XML declaration. At the time I fixed this in the application by omitting the XML declaration, but later I realized that the same is reproducible by a query like
This PR is a proposal for fixing this by making `IOContent` supply a `Reader` that decodes the byte array as UTF-8, so that the XML parser ignores the encoding presented in the XML declaration.
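The `Reader` approach can be illustrated outside of BaseX with plain JAXP. This is only a sketch under assumptions: `ReaderSketch` and its methods are hypothetical names, and the ISO-8859-1 declaration is an arbitrary example, not the one from the original report. It relies on the documented SAX behavior that a parser given a character stream reads it directly and disregards any encoding declaration in the text.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class ReaderSketch {
    private static DocumentBuilder builder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Byte stream: the parser decodes per the XML declaration,
    // which may not match how the bytes were actually produced.
    static String textViaBytes(byte[] utf8) throws Exception {
        InputSource source = new InputSource(new ByteArrayInputStream(utf8));
        return builder().parse(source).getDocumentElement().getTextContent();
    }

    // Character stream: the bytes are decoded as UTF-8 up front,
    // so the declared encoding no longer influences decoding.
    static String textViaReader(byte[] utf8) throws Exception {
        Reader reader = new InputStreamReader(
            new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        return builder().parse(new InputSource(reader))
            .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u00e4</a>";
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes:  " + textViaBytes(utf8));
        System.out.println("reader: " + textViaReader(utf8));
    }
}
```

With a single-byte declared encoding as in this sketch, the byte-stream variant does not necessarily fail; it may silently return mis-decoded text instead, which is arguably worse than a parsing error.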