FasterXML / jackson-dataformat-xml

Extension for Jackson JSON processor that adds support for serializing POJOs as XML (and deserializing from XML) as an alternative to JSON
Apache License 2.0
573 stars 222 forks

Support `CharacterEscapes` using Stax (Woodstox/Aalto) functionality #75

Open cowtowncoder opened 11 years ago

cowtowncoder commented 11 years ago

Currently the CharacterEscapes system does not work for the XML module, mostly because the module has no direct control over output escaping. However, the Stax2 extension that Woodstox (and Aalto, I think?) implements does have similar functionality, using properties:

so it would be great to use that functionality to support customized escapes, if at all possible. It is hard to say for sure whether that would work, but it should be easy enough to check.

cowtowncoder commented 8 years ago

Ok. So two approaches I think, depending on how fancy this should be:

  1. Simply allow registration of a Stax2 EscapingWriterFactory, which lets the user implement the escaping logic. More work for users, not much for Jackson.
  2. Implement an EscapingWriterFactory that uses CharacterEscapes to decide what to escape. Much nicer from the user's perspective, but more work and possibly a bit more overhead, since two different interfaces need to be adapted to each other and there may be an impedance mismatch between them.

dabulashvili-zz commented 6 years ago

Is there any example of configuring EscapingWriterFactory?

cowtowncoder commented 6 years ago

Unfortunately I can't find anything right now, except for the actual unit test from Woodstox:

src/test/java/org/codehaus/stax/test/wstream/CharacterEscapingTest.java

which should show the idea. I should write a blog post one of these days, as I have written something about Woodstox in general (even if it's about 10 years since I actively worked with XML :) )
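For readers looking for a starting point, here is a minimal sketch of the wiring. It assumes Woodstox is the Stax implementation on the classpath, and that the Stax2 properties in question are `XMLOutputFactory2.P_TEXT_ESCAPER` and `XMLOutputFactory2.P_ATTR_VALUE_ESCAPER`. The function name `xmlMapperWithEscaper` is made up for illustration; the `escaper` argument can be any `EscapingWriterFactory` implementation.

```kotlin
import com.ctc.wstx.stax.WstxInputFactory
import com.ctc.wstx.stax.WstxOutputFactory
import com.fasterxml.jackson.dataformat.xml.XmlFactory
import com.fasterxml.jackson.dataformat.xml.XmlMapper
import org.codehaus.stax2.XMLOutputFactory2
import org.codehaus.stax2.io.EscapingWriterFactory

// Hypothetical helper: builds an XmlMapper whose Woodstox output factory
// delegates escaping of text content to the given EscapingWriterFactory.
fun xmlMapperWithEscaper(escaper: EscapingWriterFactory): XmlMapper {
    val outputFactory = WstxOutputFactory()

    // Escaper used for element text content.
    outputFactory.setProperty(XMLOutputFactory2.P_TEXT_ESCAPER, escaper)

    // Attribute values have a separate property. Only reuse the same escaper
    // here if it escapes the quote character used to delimit attributes.
    outputFactory.setProperty(XMLOutputFactory2.P_ATTR_VALUE_ESCAPER, escaper)

    // Pass the customized Stax factories to Jackson's XML module.
    return XmlMapper(XmlFactory(WstxInputFactory(), outputFactory))
}
```

A mapper built this way is then used like any other `XmlMapper`, for example `xmlMapperWithEscaper(myFactory).writeValueAsString(pojo)`, where `myFactory` stands for your own implementation.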

mvysny commented 4 years ago

This is ridiculous, ' and " should have been escaped automatically by default. Workaround is to implement EscapingWriterFactory:

https://stackoverflow.com/questions/56799368/escaping-quotes-using-jackson-dataformat-xml

cowtowncoder commented 4 years ago

@mvysny uh? I do not appreciate the tone of the comment, especially given that you give no context on WHERE single/double-quotes are not escaped where they should be. As far as I know they are properly escaped in XML content as per the XML specification. I am also not sure how this relates to the issue at hand, which is about whether it would be possible to map the Jackson feature onto the native Woodstox mechanism (which indeed could be the mechanism you reference).

mvysny commented 4 years ago

@cowtowncoder I apologize for the tone of my comment. I was mad about things not going as I expected, but of course that is no excuse. Thank you for your replies and for your hard work, I really appreciate it :+1:

When a POJO with text contents is serialized to XML with the default settings using Jackson's XmlMapper, the " and ' are not escaped to &quot; and &apos;. I thought the escaping was mandated by the XML spec, but I was wrong - it's not mandated. Still, I thought there would be a simple setting to always escape the five characters (<>&'") somewhere in XmlMapper; I was surprised to find that one needs to write EscapingWriterFactory to have those characters escaped.

Implementing EscapingWriterFactory is not an easy feat - it would be great to either have documentation for that, or to have some simpler way of specifying the set of chars which need to be escaped.
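To illustrate what "always escape the five characters" amounts to, here is a rough, dependency-free sketch of a `Writer` that replaces `<`, `>`, `&`, `"` and `'` with the predefined XML entities. The names `FiveCharEscapingWriter` and `escapeXml` are hypothetical and not part of Jackson, Woodstox or Stax2. To use it with Woodstox, it would still have to be returned from an `EscapingWriterFactory`.

```kotlin
import java.io.StringWriter
import java.io.Writer

/** Wraps [out] and replaces the five XML special characters with entities. */
class FiveCharEscapingWriter(private val out: Writer) : Writer() {
    override fun write(cbuf: CharArray, off: Int, len: Int) {
        var start = off          // start of the pending run of unescaped chars
        val end = off + len
        for (i in off until end) {
            val entity = when (cbuf[i]) {
                '<' -> "&lt;"
                '>' -> "&gt;"
                '&' -> "&amp;"
                '"' -> "&quot;"
                '\'' -> "&apos;"
                else -> null
            }
            if (entity != null) {
                out.write(cbuf, start, i - start) // flush the run before this char
                out.write(entity)
                start = i + 1
            }
        }
        out.write(cbuf, start, end - start)       // flush whatever is left
    }

    override fun flush() = out.flush()
    override fun close() = out.close()
}

/** Convenience wrapper for trying the writer out on a plain string. */
fun escapeXml(text: String): String {
    val sink = StringWriter()
    FiveCharEscapingWriter(sink).use { it.write(text) }
    return sink.toString()
}
```

Writing unescaped runs in bulk rather than char by char keeps the overhead low for typical text, where most characters need no escaping.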

cowtowncoder commented 4 years ago

Ah no problem. I know the feeling. :-)

But yes, it would be great to connect the functionality via Jackson API. And you are right, implementing EscapingWriterFactory is not super easy; the best (only?) example I know of is at:

src/test/java/org/codehaus/stax/test/wstream/CharacterEscapingTest.java

of woodstox-core. I think I was hoping to write something more as part of

https://medium.com/@cowtowncoder/configuring-woodstox-xml-parser-stax2-properties-c80ef5a32ef1

but did not end up doing that, since I haven't had a need to actually use it myself. ... and apparently others haven't either, for what that's worth (based on the lack of Google hits).

mvysny commented 4 years ago

I've found one which worked for me here: https://stackoverflow.com/questions/56799368/escaping-quotes-using-jackson-dataformat-xml

I've trimmed it down a bit and converted it to Kotlin (uses commons-lang3); it should help others too:

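import java.io.OutputStream
import java.io.Writer
import org.apache.commons.lang3.StringEscapeUtils
import org.codehaus.stax2.io.EscapingWriterFactory

// Escapes XML special characters (including ' and ") in everything Woodstox writes through it.
class CustomXmlEscapingWriterFactory : EscapingWriterFactory {
    override fun createEscapingWriterFor(out: Writer, enc: String?): Writer = object : Writer() {
        override fun write(cbuf: CharArray, off: Int, len: Int) {
            // ESCAPE_XML is deprecated in newer commons-lang3 releases (ESCAPE_XML10/ESCAPE_XML11 replace it).
            StringEscapeUtils.ESCAPE_XML.translate(String(cbuf, off, len), out)
        }
        override fun flush() = out.flush()
        override fun close() = out.close()
    }

    // Only the Writer-based variant is implemented in this trimmed-down version.
    override fun createEscapingWriterFor(out: OutputStream?, enc: String?): Writer =
            throw IllegalArgumentException("not supported")
}

cowtowncoder commented 4 years ago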

Excellent! Thank you for sharing this.