Closed yawkat closed 2 years ago
Ok, I don't have a strong opinions here but...
I do not think FromXmlParser.isExpectedStartArrayToken
should return true
for an empty String token -- this is not a general purpose coercion that should be exposed. It would probably break many other things; and ACCEPT_SINGLE_VALUE_AS_ARRAY
specifically should not expect to see Array container -- if that exists, this feature should be ignored.
So it would be (IMO) semantically Wrong thing to do, all around.
As to StringCollectionDeserializer
, I am not sure that approach would work any better: it cannot know anything about special cases of XML handling, unless XML module overrides it (and other similar deserializers).
Beyond this, I think a key question here is this: does
<values></values>
(which is semantically same XML as <values />
-- handling should be identical)
correctly bind to String.class
, as ""? If so, it seems it really should become a List
with one element, ""; coercion from a single value into List/Collection/array with that element.
Anyway; I am open to making the change as I can see how/why users would prefer this (especially since I think serialization would produce such output), but bit concerned about feasibility of making it work.
Yes, <values/>
binds as ""
when you bind it to String.class
.
Long comment on my thoughts on this issue:
imo removing an element from an xml list should always bind to one less element in the deserialized form as well, to be consistent. So basically:
<values><values>x</values><values>y</values></values>
-> ["x", "y"]
<values><values>x</values></values>
-> ["x"]
<values></values>
-> []
<-- with my proposed change.This is after all also the behavior when ACCEPT_SINGLE_VALUE_AS_ARRAY
is off.
However it would be a bit weird when getting to this from the other side:
<values>fo</values>
-> ["fo"]
<values>f</values>
-> ["f"]
<values></values>
-> []
<-- with my proposed change, this would be inconsistent with the previous two deserializations.So basically the question comes down to: with ACCEPT_SINGLE_VALUE_AS_ARRAY
, do we prioritize the behavior of the single-element code path, or the behavior of the multi-element code path? Whichever code path we prioritize, the other will have an odd inconsistency.
Now, there are a few arguments in favor of prioritizing the multi-element code path:
[""]
will serialize to <values><values></values></values>
, and []
will serialize to <values></values>
, so by prioritizing the multi-element code path, we will always get the same value back that we serialized. However this is only true if UNWRAP_SINGLE_VALUE_ARRAYS
is off – it is off by default for us with the micronaut framework, but I don't know how it is for other jackson users.ACCEPT_SINGLE_VALUE_AS_ARRAY
is enabled. However even with my proposed change, <values><values></values></values>
would still be a valid way to deserialize to [""]
.String
with StringBuilder
in my test case – suddenly it passes, the input deserializes as []
. Imo StringCollectionDeserializer
should have the same behavior as the normal CollectionDeserializer
.LMK what you think.
About the isExpectedStartArrayToken
approach to changing this: imo <values></values>
is a valid empty array, so it should return true
when the deserializer expects an array. This is not without precedent: isExpectedNumberIntToken
for example returns true if the string value is a valid number, even though the value might also be parsed as a string.
However I can see how changing isExpectedStartArrayToken
could have lots of unintended consequences outside of this specific bug, so it may not be safe to do. Also CollectionDeserializer
seems to be doing fine without a change to isExpectedStartArrayToken
, which I hadn't considered when writing this bug report. So changing StringCollectionDeserializer
to better follow the behavior of CollectionDeserializer
seems like the safest option.
Ok, I wish I had time to properly focus on this one... but I'll add some notes at any rate, hoping they can help.
So, on isExpectedStartArrayToken
: I don't think this would really help with ACCEPT_SINGLE_VALUE_AS_ARRAY since it "converts" element to "[ ]", and if that happens this feature would not come into play at all, unless I miss something.
I am NOT against elements getting coerced per se: if caller calls this method it does expect an array. And so I am surprised if that did not already actually get coerced.
I guess I am open to this change, if you want to try a PR? Obviously would need to go in 2.14 due to compatibility concerns.
Semantically, yes, it's bit of an ambiguous case, but I do agree with you that "empty String" (or even all-whitespace) can/should most of the time be considered missing/null: unless explicitly bound as String
.
So I think I am +1 for your originally expected end result. :)
One other random note: this is for "root value"; I wonder if handling is different (... probably is) for regular property values. I would probably argue that root-level handling (root values) are something that are particularly tricky to get right.
When
ACCEPT_SINGLE_VALUE_AS_ARRAY
is enabled, an emptyList<String>
is incorrectly deserialized as a list containing a single empty string (List.of("")
). Test case:This is technically correct behavior, because
<values/>
can be deserialized as an empty string, which can then be wrapped in a list. However imo this should still be fixed.Looking at the code, the issue is in databind
StringCollectionDeserializer
. For the test case,isExpectedStartArrayToken
returns false, which triggers thehandleNonArray
logic. This logic checks for theACCEPT_SINGLE_VALUE_AS_ARRAY
first, before trying to coerce the empty string to a list, which would succeed here.I see two approaches to fix this. Either change
FromXmlParser.isExpectedStartArrayToken
to returntrue
for an empty string token, or changeStringCollectionDeserializer
to attempt a coercion from empty string before wrapping the value. imo the former should work fine.I can work on a fix for this once my corp CLA is cleared, which will be soon(tm).