FasterXML / jackson-dataformat-xml

Extension for Jackson JSON processor that adds support for serializing POJOs as XML (and deserializing from XML) as an alternative to JSON
Apache License 2.0

Deserialization requires knowledge of fixed namespace; prevents robust/future-proofing coding #532

Closed transentia closed 2 years ago

transentia commented 2 years ago

I THINK this is a Request for Enhancement, rather than a true BUG report...

I am doing deserialization currently via:

@JsonIgnoreProperties(ignoreUnknown = true)
@JacksonXmlRootElement(localName = "cus:CustomerData")
public record CustomerData(
        @JsonProperty("cus:HouseNumber") String houseNumber,
        @JsonProperty("cus:StreetName") String streetName,
        ... etc.

and within a StdDeserializer, I use stuff like:

jsonNode.get("cus:PhoneNumber")

both of these assume that:

Neither of these is true: the 'downstream' dev team may change things up at any time, so 'cus:XXX' may suddenly be sent to me as 'customerStuff:XXX' or the toolset vendor may change things around so that it becomes 'ns990875:XXX'. For whatever reason, this is all beyond my control.

It's hard to write robust deserializing code...

SO: I want to ignore the prefix.

I have a work-around for my StdDeserializer:

    private static JsonNode fieldEndingWith(JsonNode v, String suffix) {
        Iterator<Map.Entry<String, JsonNode>> fields = v.fields();
        while (fields.hasNext()) {
            Map.Entry<String, JsonNode> next = fields.next();
            if (next.getKey().endsWith(suffix))
                return next.getValue();
        }

        // should never get here...assuming a perfect world;-) but need to do something...
        // this isn't STRICTLY correct, but it works out and keeps all the rest of the code simple
        return v;
    }

I can replace jsonNode.get("cus:PhoneNumber") with fieldEndingWith(jsonNode, "PhoneNumber")
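One caveat with a plain endsWith check is that "PhoneNumber" also matches a key such as "cus:MobilePhoneNumber". A stricter variant compares only the part after the last colon. The sketch below is an illustration, not code from this thread: it works on a plain Map<String, V> so that it stays dependency-free, and the names LocalNameLookup, localName and findByLocalName are hypothetical. The same loop could be applied to the entries returned by JsonNode.fields().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class LocalNameLookup {

    /** Returns the part of a key after the last ':' (the whole key if there is no colon). */
    public static String localName(String key) {
        return key.substring(key.lastIndexOf(':') + 1);
    }

    /** Finds the first entry whose local name equals the requested name exactly. */
    public static <V> Optional<V> findByLocalName(Map<String, V> fields, String name) {
        for (Map.Entry<String, V> entry : fields.entrySet()) {
            if (localName(entry.getKey()).equals(name)) {
                return Optional.ofNullable(entry.getValue());
            }
        }
        // Nothing matched: let the caller decide what "missing" means.
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("cus:MobilePhoneNumber", "0400 000 000");
        fields.put("ns990875:PhoneNumber", "07 0000 0000");
        // An endsWith("PhoneNumber") check would pick the mobile number here.
        System.out.println(findByLocalName(fields, "PhoneNumber").orElse("<missing>"));
    }
}
```

Returning an Optional instead of the parent node makes a missing field explicit, at the cost of a little more handling at each call site.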

I can't do anything to 'tune' JsonProperty, etc.

So my RFE is: add an "ignorePrefix=true|false" option to JsonProperty and possibly JacksonXmlRootElement/JsonRootName.

I can see that this might cause problems with code like:

<x>
  <a:y />
  <b:y />
</x>

but one would presumably not be wanting to ignore namespacing in this situation (although evolving 'stuff' remains problematic).

cowtowncoder commented 2 years ago

Quick note: use of "cus:CustomerData" for localName is invalid and should fail: local names cannot contain colons, since the colon is what separates an optional namespace prefix from the local name.

Also note that JsonNode does not support storing, changing or writing namespace information. It only deals with XML local names in "empty" namespace (for attributes meaning no prefix; for elements no prefix and default namespace not bound to any namespace URI).

So you cannot really manipulate namespaced XML content with JsonNode: this is not supported at all.

I hope to figure out how to error on attempts to use invalid local names; colon is just one invalid character. Underlying Stax implementation (like Woodstox) should catch this on write but perhaps it does not under some conditions.

transentia commented 2 years ago

Thanks for the heads-up on illegal use of ':'. Elsewhere I was using ':ElementName' but there's no requirement for me to use it, I just thought it looked clearer. I can/will remove it.

Also note that I have ONLY been deserializing, so I'm not giving woodstox a chance to complain.

IF I read what you say correctly, is not the trouble that: while JsonNode (and @JsonProperty, etc.) may be treating the namespace as 'empty' (as distinct from ""), it then treats the LocalName as 'cus:CustomerData', rather than 'CustomerData'?

cowtowncoder commented 2 years ago

Right, nothing at the Jackson level handles colon-separated names specifically: that is for the XML library to do. Unfortunately databind is mostly namespace-agnostic; the XML module does add a bit of support to make sure that writing XML produces proper namespace bindings (if scoped names, with annotations, are used), but the deserializer will happily just ignore namespace bindings at this point. This is something that could perhaps be solved in Jackson 3.0 (and 2.x has some underlying support for keeping/passing possibly namespace-containing PropertyNames (instead of plain Strings) for databind-level constructs). But for now, namespace support is rather weak.

transentia commented 2 years ago

So I've now adopted a hack/work-around: strip the prefixes from the xml before deserialization. Works for my specific situation, which has simple documents. Not a general solution, though.
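The thread does not show how the prefixes were stripped. One possible way, sketched here under assumptions, is to copy the document through the JDK's StAX API and write each element and attribute under its local name only. The class name PrefixStripper and the urn:example:customer namespace URI are made up for illustration. The sketch assumes well-formed input whose prefixes are declared (a namespace-aware parser rejects undeclared ones) and simple documents in which no two sibling names differ only by namespace. Comments and processing instructions are dropped.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

public class PrefixStripper {

    /** Rewrites the XML so that every element and attribute keeps only its local name. */
    public static String stripPrefixes(String xml) {
        try {
            XMLStreamReader in = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            while (in.hasNext()) {
                switch (in.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        // Namespace declarations are not reported as attributes, so they vanish here.
                        writer.writeStartElement(in.getLocalName());
                        for (int i = 0; i < in.getAttributeCount(); i++) {
                            writer.writeAttribute(in.getAttributeLocalName(i), in.getAttributeValue(i));
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        writer.writeCharacters(in.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        writer.writeEndElement();
                        break;
                    default:
                        break; // comments, processing instructions, etc. are not copied
                }
            }
            writer.flush();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not process XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<cus:CustomerData xmlns:cus=\"urn:example:customer\">"
                + "<cus:HouseNumber>12</cus:HouseNumber></cus:CustomerData>";
        System.out.println(stripPrefixes(xml));
    }
}
```

The result can then be handed to the XML mapper with unprefixed property names such as "HouseNumber". As noted above, this is not a general solution: it discards exactly the information that distinguishes same-named elements from different namespaces.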

cowtowncoder commented 2 years ago

Ok, at least you have a work-around.

But I am curious: XML parsers typically do not / should not expose prefixed names, but rather something like QName, in which namespace URI + local-name are the key part, and prefix is included as FYI. So I am kind of wondering why prefix:localName was exposed. I guess one can run (some) parsers in "namespace unaware" mode in which local name is left as-is.
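To illustrate the QName point with the JDK's own namespace-aware StAX reader (not Jackson): two documents that use different prefixes for the same namespace URI yield equal QNames, because javax.xml.namespace.QName compares only the namespace URI and the local part. The class name RootQName and the urn:example:customer URI are assumptions made for this sketch.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class RootQName {

    /** Returns the QName of the root element as seen by a namespace-aware StAX reader. */
    public static QName rootName(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getName();
                }
            }
            throw new IllegalArgumentException("No root element found");
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        QName a = rootName("<cus:CustomerData xmlns:cus='urn:example:customer'/>");
        QName b = rootName("<ns990875:CustomerData xmlns:ns990875='urn:example:customer'/>");
        System.out.println(a.getLocalPart()); // CustomerData
        System.out.println(a.getPrefix());    // cus
        System.out.println(a.equals(b));      // true: the prefix is not part of equality
    }
}
```

Code that matches on namespace URI plus local name is therefore unaffected when a sender renames its prefix, provided the URI itself stays stable.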

cowtowncoder commented 2 years ago

Closing this as "won't fix" only because the description is a bit specific to no-namespace usage, and since I think there are other issues covering problems with namespace usage (namely that the XML module essentially ignores the namespace URI when matching elements and attributes, relying only on XML local names).