FasterXML / jackson-dataformat-xml

Extension for Jackson JSON processor that adds support for serializing POJOs as XML (and deserializing from XML) as an alternative to JSON
Apache License 2.0

XML attributes are not correctly populated when converting from XML (ObjectNode) to String using writeValueAsString (XML structure) #613

Open forceporquillo opened 1 year ago

forceporquillo commented 1 year ago

Hi @cowtowncoder,

We are planning to migrate our old deserialization logic, which extracts fragmented substrings between <open> and </closing> tags when deserializing an XML String to POJOs, to Jackson's custom StdDeserializers/JsonDeserializers, starting from version 2.15.2.

I have a few questions about whether the feature I need already exists.

My team and I are trying to deserialize a very complex XML structure without creating multiple beans, because the content of some of our XML subtrees changes dynamically with minimal fixed structure (we have ~10 XML formats to support, with or without attributes and namespaces) while still being valid XML. We mostly used custom deserializers (StdDeserializer and JsonDeserializer) to map ObjectNode into a plain Map (internally, this is already a Map; thanks for your suggestion in #574) and later converted it back to a String.

Until now, we kept deserializing our XML subtree via substring. We later planned to switch to JsonNode to retrieve segments of XML and convert them back to the original XML as a String using writeValueAsString (backed by SegmentedStringWriter), because we wanted to leverage javax.xml.stream.* without retrieving the full string token and to fully adopt Jackson's Streaming API interoperability. We do not want to convert it into a plain POJO; we would rather use java.util.Map to keep it generic and to store every JsonNode retrieved from the ObjectNode entries (Entry#getValue()).

Sample Scenario

Suppose I have XML elements that carry the attributes name and age.

<animal>
  <dog name="tucker" age="2">woof woof!</dog>
  <cat name="cody" age="2">meowwwwwww!</cat>
  <chicken name="browny" age="3">cluck cluck!</chicken>
  <pig name="porky" age="4">oink oink!</pig>
</animal>

XML Attribute

The output I would expect should be the same as the original XML, but in my case, it's not:

<animal>
  <dog>
    <name>tucker</name>
    <age>2</age>
    <>woof woof!</>
  </dog>
  <cat>
    <name>cody</name>
    <age>2</age>
    <>meowwwwwww!</>
  </cat>
  <chicken>
    <name>browny</name>
    <age>3</age>
    <>cluck cluck cluck cluck!</>
  </chicken>
  <pig>
    <name>porky</name>
    <age>4</age>
    <>oink oink!</>
  </pig>
</animal>

The attributes became sub-elements of the original element, and the element's text content is wrapped in a tag with an empty name: <>...</>

XML Namespace

The same happens with namespaces. Suppose the animal element declares a namespace: <ns1:animal xmlns:ns1="urn:this:is:namespace:for:animal">

Output:

<animal>
  <xmlns:ns1>urn:this:is:namespace:for:animal</xmlns:ns1>
  <ns1:dog>
    <name>tucker</name>
    <age>2</age>
    <>woof woof!</>
  </ns1:dog>
  <!-- ... -->
</animal>

The namespace becomes a new element.

My assumption is that this only works with a plain XML structure (without attributes) or with bean-defined POJOs whose properties are annotated as XML attributes.

Sample Driver Code

I tried using a custom XmlMapper config:

private static XMLInputFactory xmlInputFactory() {
  final XMLInputFactory factory = XMLInputFactory.newInstance();
  factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, false);
  return factory;
}

// customJacksonXmlModule() is our own helper (not shown here)
XmlMapper mapper = new XmlMapper(new XmlFactory(xmlInputFactory()), customJacksonXmlModule());
mapper.configure(ToXmlGenerator.Feature.WRITE_XML_DECLARATION, true);
mapper.enable(SerializationFeature.INDENT_OUTPUT);
OBJECT_WRITER = mapper.writer();

// Parse the sample document into a tree (JsonNode)
JsonNode node = mapper.readTree(
 "<animal>\n"
 + "  <dog name=\"tucker\" age=\"2\">woof woof!</dog>\n"
 + "  <cat name=\"cody\" age=\"2\">meowwwwwww!</cat>\n"
 + "  <chicken name=\"browny\" age=\"3\">cluck cluck!</chicken>\n"
 + "  <pig name=\"porky\" age=\"4\">oink oink!</pig>\n"
 + " </animal>");

// Serialize the tree back to XML under the root name "animal"
System.out.println(mapper.writer().withDefaultPrettyPrinter().withRootName("animal").writeValueAsString(node));

and with a custom deserializer for the Animal bean.

@JsonDeserialize(using = AnimalDeserializer.class)
class Animal {
 String xmlContent; // <-- JsonNode to String: <dog name="tucker" age="2">woof woof!</dog>...
}

AnimalDeserializer.java

@Override
public T deserialize(JsonParser parser, DeserializationContext context) throws IOException {
  final ObjectCodec codec = parser.getCodec();
  final ObjectNode objectNode = codec.readTree(parser);

  final T object = codec.readValue(objectNode.traverse(), getValueType(context));
  final Iterator<Entry<String, JsonNode>> nodeIterator = objectNode.fields();

  while (nodeIterator.hasNext()) {
    Entry<String, JsonNode> nodeEntry = nodeIterator.next();
    // we manually set the value for fields not annotated with @JsonProperty; Jackson takes care of the annotated ones
    if (!attributes().contains(nodeEntry.getKey())) {
      if (nodeEntry.getValue() instanceof ObjectNode) {
        object.setContent(readObjectNodeAsString(nodeEntry));
      } 
      ...
    }
  }

  return object;
}

private String readObjectNodeAsString(Entry<String, JsonNode> nodeEntry) {
  try {
    return OBJECT_WRITER
        .withRootName(nodeEntry.getKey())
        .writeValueAsString(nodeEntry.getValue());
  }
  ...
}

Update:

When using new TypeReference<Map<String, String>>() {}, it prints the expected XML structure. However, the attributes are still missing.

Map<String, String> node = XML_MAPPER.readValue("...", new TypeReference<Map<String, String>>() {});

Output:

<?xml version='1.0' encoding='UTF-8'?>
<animal>
  <dog>woof woof!</dog>
  <cat>meowwwwwww!</cat>
  ...

We would really appreciate your feedback and suggestions on this, @cowtowncoder.

neshant commented 9 months ago

I am also observing the same behavior: all my attributes are converted to sub-elements.

neshant commented 9 months ago

https://github.com/FasterXML/jackson-module-jaxb-annotations/issues/27 This should help, if you have not tried it already.

https://stackoverflow.com/questions/22433679/how-to-use-jacksons-jaxbannotationintrospector-correctly

@forceporquillo

cowtowncoder commented 9 months ago

Some quick notes:

  1. "Attribute-ness" is not retained by JsonNode (there is no place for it; JsonNode is not XML-aware)
  2. Namespace information is not retained by JsonNode (though theoretically it perhaps could be)
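
To make the first point concrete, here is a rough sketch using only the JDK's StAX API. It is not how Jackson is implemented, and FlattenSketch and flatten are hypothetical names chosen for illustration. It assumes a single root element whose children, if any, contain only text. The parser still knows which values were attributes, but once everything is stored in a generic name-to-value map, an attribute and a child element of the same name are indistinguishable, and the element text ends up under an empty key (which matches the <>...</> seen in the reported output).

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of Jackson.
final class FlattenSketch {

    // Collects the root element's attributes, text-only children and own text
    // into one name -> value map, the way a format-agnostic tree would.
    static Map<String, String> flatten(String xml) {
        Map<String, String> fields = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            reader.nextTag(); // advance to the root START_ELEMENT

            // StAX reports attributes separately, but the map has no slot to record that.
            for (int i = 0; i < reader.getAttributeCount(); i++) {
                fields.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
            }
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    fields.put(name, reader.getElementText()); // assumes text-only children
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    break; // end of the root element
                }
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        String trimmed = text.toString().trim();
        if (!trimmed.isEmpty()) {
            fields.put("", trimmed); // element text lands under the empty key
        }
        return fields;
    }

    public static void main(String[] args) {
        String withAttributes = "<dog name=\"tucker\" age=\"2\">woof woof!</dog>";
        String withElements = "<dog><name>tucker</name><age>2</age>woof woof!</dog>";
        // Both documents flatten to the same map, so the writer cannot tell them apart.
        System.out.println(flatten(withAttributes).equals(flatten(withElements))); // true
    }
}
```

Since both inputs produce equal maps, nothing downstream can decide whether name should be written back as an attribute or as a child element.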

It almost feels like an XML-specific subset of JsonNodes should be created, and that could allow solving the problems. But as things are, there are some big limitations.

There are other possible feature additions that could help. For example, a naming convention or wrappers could allow something like prefixing the names of attributes with @ when reading from XML into Java objects (including JsonNode), and then removing the prefix on serialization, using it as the attribute indicator. Exactly how that would work would obviously require some design; there are no concrete plans to do it, but it seems doable -- something similar was added for handling of xsi:type auto-detection (see #324 / #634).
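
As a rough illustration of the prefix convention, here is a sketch built on the JDK's StAX API rather than on Jackson. No such feature exists in Jackson today; AttributePrefixSketch, read, write, ATTR_PREFIX and TEXT_KEY are all hypothetical names. It assumes a single element with attributes and text-only content. Attribute names get an @ prefix on reading, and the writer strips the prefix and emits those entries as attributes again.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch of the "@" naming convention; not an existing Jackson feature.
final class AttributePrefixSketch {
    static final String ATTR_PREFIX = "@"; // marks a map key as an XML attribute
    static final String TEXT_KEY = "";     // key used for the element's text content

    // Reads one element with text-only content; attribute names get the "@" prefix.
    static Map<String, String> read(String xml) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            reader.nextTag(); // advance to the root START_ELEMENT
            for (int i = 0; i < reader.getAttributeCount(); i++) {
                fields.put(ATTR_PREFIX + reader.getAttributeLocalName(i),
                        reader.getAttributeValue(i));
            }
            fields.put(TEXT_KEY, reader.getElementText());
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return fields;
    }

    // Writes the map back: "@" keys become attributes, "" becomes text, others child elements.
    static String write(String elementName, Map<String, String> fields) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement(elementName);
            // Attributes must be written before any element content.
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (e.getKey().startsWith(ATTR_PREFIX)) {
                    writer.writeAttribute(
                            e.getKey().substring(ATTR_PREFIX.length()), e.getValue());
                }
            }
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (e.getKey().startsWith(ATTR_PREFIX)) {
                    continue; // already written as an attribute
                }
                if (e.getKey().equals(TEXT_KEY)) {
                    writer.writeCharacters(e.getValue());
                } else {
                    writer.writeStartElement(e.getKey());
                    writer.writeCharacters(e.getValue());
                    writer.writeEndElement();
                }
            }
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dog = read("<dog name=\"tucker\" age=\"2\">woof woof!</dog>");
        String xml = write("dog", dog);
        System.out.println(xml);
        System.out.println(read(xml).equals(dog)); // true: attributes survive the round trip
    }
}
```

The prefix keeps the attribute marker inside the key itself, so it would survive any intermediate structure that only stores names and values, at the cost of reserving @ as a leading character in property names.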