FasterXML / jackson-dataformat-xml

Extension for Jackson JSON processor that adds support for serializing POJOs as XML (and deserializing from XML) as an alternative to JSON
Apache License 2.0
567 stars 221 forks source link

Empty list incorrectly deserialized when `ACCEPT_SINGLE_VALUE_AS_ARRAY` is enabled #513

Closed yawkat closed 2 years ago

yawkat commented 2 years ago

When ACCEPT_SINGLE_VALUE_AS_ARRAY is enabled, an empty List<String> is incorrectly deserialized as a list containing a single empty string (List.of("")). Test case:

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.dataformat.xml.XmlMapper;
import org.junit.Assert;
import org.junit.Test;

import java.util.List;

public class EmptyElementTest {
    @Test
    public void test() throws JsonProcessingException {
        XmlMapper mapper = new XmlMapper();
        mapper.enable(DeserializationFeature.ACCEPT_SINGLE_VALUE_AS_ARRAY);
        List<String> list = mapper.readValue("<values/>", new TypeReference<List<String>>() {});
        Assert.assertTrue(list.isEmpty());
    }
}

This is technically correct behavior, because <values/> can be deserialized as an empty string, which can then be wrapped in a list. However imo this should still be fixed.

Looking at the code, the issue is in databind StringCollectionDeserializer. For the test case, isExpectedStartArrayToken returns false, which triggers the handleNonArray logic. This logic checks for the ACCEPT_SINGLE_VALUE_AS_ARRAY first, before trying to coerce the empty string to a list, which would succeed here.

I see two approaches to fix this. Either change FromXmlParser.isExpectedStartArrayToken to return true for an empty string token, or change StringCollectionDeserializer to attempt a coercion from empty string before wrapping the value. imo the former should work fine.

I can work on a fix for this once my corp CLA is cleared, which will be soon(tm).

cowtowncoder commented 2 years ago

Ok, I don't have a strong opinions here but...

I do not think FromXmlParser.isExpectedStartArrayToken should return true for an empty String token -- this is not a general purpose coercion that should be exposed. It would probably break many other things; and ACCEPT_SINGLE_VALUE_AS_ARRAY specifically should not expect to see Array container -- if that exists, this feature should be ignored. So it would be (IMO) semantically Wrong thing to do, all around.

As to StringCollectionDeserializer, I am not sure that approach would work any better: it cannot know anything about special cases of XML handling, unless XML module overrides it (and other similar deserializers).

Beyond this, I think a key question here is this: does

<values></values>

(which is semantically same XML as <values /> -- handling should be identical)

correctly bind to String.class, as ""? If so, it seems it really should become a List with one element, ""; coercion from a single value into List/Collection/array with that element.

Anyway; I am open to making the change as I can see how/why users would prefer this (especially since I think serialization would produce such output), but bit concerned about feasibility of making it work.

yawkat commented 2 years ago

Yes, <values/> binds as "" when you bind it to String.class.

Long comment on my thoughts on this issue:

imo removing an element from an xml list should always bind to one less element in the deserialized form as well, to be consistent. So basically:

This is after all also the behavior when ACCEPT_SINGLE_VALUE_AS_ARRAY is off.

However it would be a bit weird when getting to this from the other side:

So basically the question comes down to: with ACCEPT_SINGLE_VALUE_AS_ARRAY, do we prioritize the behavior of the single-element code path, or the behavior of the multi-element code path? Whichever code path we prioritize, the other will have an odd inconsistency.

Now, there are a few arguments in favor of prioritizing the multi-element code path:

LMK what you think.


About the isExpectedStartArrayToken approach to changing this: imo <values></values> is a valid empty array, so it should return true when the deserializer expects an array. This is not without precedent: isExpectedNumberIntToken for example returns true if the string value is a valid number, even though the value might also be parsed as a string.

However I can see how changing isExpectedStartArrayToken could have lots of unintended consequences outside of this specific bug, so it may not be safe to do. Also CollectionDeserializer seems to be doing fine without a change to isExpectedStartArrayToken, which I hadn't considered when writing this bug report. So changing StringCollectionDeserializer to better follow the behavior of CollectionDeserializer seems like the safest option.

cowtowncoder commented 2 years ago

Ok, I wish I had time to properly focus on this one... but I'll add some notes at any rate, hoping they can help. So, on isExpectedStartArrayToken: I don't think this would really help with ACCEPT_SINGLE_VALUE_AS_ARRAY since it "converts" element to "[ ]", and if that happens this feature would not come into play at all, unless I miss something. I am NOT against elements getting coerced per se: if caller calls this method it does expect an array. And so I am surprised if that did not already actually get coerced.

I guess I am open to this change, if you want to try a PR? Obviously would need to go in 2.14 due to compatibility concerns. Semantically, yes, it's bit of an ambiguous case, but I do agree with you that "empty String" (or even all-whitespace) can/should most of the time be considered missing/null: unless explicitly bound as String. So I think I am +1 for your originally expected end result. :)

One other random note: this is for "root value"; I wonder if handling is different (... probably is) for regular property values. I would probably argue that root-level handling (root values) are something that are particularly tricky to get right.