FasterXML / jackson-databind

General data-binding package for Jackson (2.x): works on streaming API (core) implementation(s)
Apache License 2.0
3.52k stars 1.38k forks source link

Polymorphic subtype deduction behaves differently based on order of fields #3183

Open thanaParis opened 3 years ago

thanaParis commented 3 years ago

Describe the bug I'm using @JsonTypeInfo(use = Id.DEDUCTION) and have created a scenario where the mapper behaves differently depending on how I order my fields, which I think should be undesirable behavior.

Version information 2.12.3

To Reproduce JUnit 5 test:

import static org.junit.jupiter.api.Assertions.assertTrue;

import org.junit.jupiter.api.Test;

import com.fasterxml.jackson.annotation.JsonSubTypes;
import com.fasterxml.jackson.annotation.JsonTypeInfo;
import com.fasterxml.jackson.annotation.JsonTypeInfo.Id;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.json.JsonMapper;

public class DeductionBasedPolymorphismFieldOrderTest {

    @Test
    public void test1() throws JsonMappingException, JsonProcessingException {
        String json = "{\"kingdom\":\"animal\",\"woof\": \"woof.wav\",\"meow\": \"meow.wav\",\"arf\": \"arf.wav\",\"bark\": \"bark.wav\"}";
        Animal animal = new JsonMapper().readValue(json, Animal.class); // Succeeds!
        assertTrue(animal.getClass().getName().contains("CatDog"));
    }

    @Test
    public void test2() throws JsonMappingException, JsonProcessingException {
        String json = "{\"kingdom\":\"animal\",\"meow\": \"meow.wav\",\"woof\": \"woof.wav\",\"arf\": \"arf.wav\",\"bark\": \"bark.wav\"}";
        Animal animal = new JsonMapper().readValue(json, Animal.class);
        // Throws: UnrecognizedPropertyException: Unrecognized field "woof" (class
        // DeductionBasedPolymorphismFieldOrderTest$Cat)
        assertTrue(animal.getClass().getName().contains("CatDog"));
    }

    @Test
    public void test3() throws JsonMappingException, JsonProcessingException {
        String json = "{\"kingdom\":\"animal\",\"arf\": \"arf.wav\",\"meow\": \"meow.wav\",\"woof\": \"woof.wav\",\"bark\": \"bark.wav\"}";
        Animal animal = new JsonMapper().readValue(json, Animal.class);
        // Throws: UnrecognizedPropertyException: Unrecognized field "meow" (class
        // DeductionBasedPolymorphismFieldOrderTest$Dog2)
        assertTrue(animal.getClass().getName().contains("CatDog"));
    }

    @JsonTypeInfo(use = Id.DEDUCTION, defaultImpl = CatDog.class)
    @JsonSubTypes({ @JsonSubTypes.Type(value = Cat.class), @JsonSubTypes.Type(value = Dog1.class),
            @JsonSubTypes.Type(value = Dog2.class) })
    public static class Animal {
        public String kingdom;
    }

    public static class Cat extends Animal {
        public String meow;
    }

    public static class Dog1 extends Animal {
        public String woof;
        public String bark;
    }

    public static class Dog2 extends Animal {
        public String woof;
        public String arf;
    }

    public static class CatDog extends Animal {
        public String meow;
        public String woof;
        public String bark;
        public String arf;
    }
}

Expected behavior This situation should always trigger the use of defaultImpl as it does in test 1, or, failing that, throw an InvalidTypeIdException referencing "Cannot deduce unique subtype" (but the behavior should be the same across examples).

Additional context The order of fields should not be what decides subtyping, as it is unintuitive and cannot always be controlled for.

cowtowncoder commented 3 years ago

/cc @drekbour

drekbour commented 3 years ago

The order of fields should not be what decides subtyping

I agree it feels oddly inconsistent ... Of note, the above is deliberately ambiguous input and what we're discussing is the fallback behaviour not healthy operation which would give the correct result from any field-order.

Deduction process is both simple and very optimistic/trusting. It assumes positive matches are intentional and the deduction completes the moment it has thinned candidates to a single option. It doesn't, as expected by test 2 & 3, continue to pre-process the entire message to see if subsequent fields may invalidate that candidate. It's optimistic that "meow" means you have a Cat and not a chimera because that's what you told it. This algorithm is not going to hold up well if input isn't trusted to match the model.

To invoke defaultImpl for tests 2 & 3 we must pre-process the whole message every time to prove a single candidate. This penalises normal behaviour where, in well-formed ecosystems, messages are all uniquely deducible. It also doesn't cover the myriad of other unmodelled cases where deduction was incorrect.

So, I see two valid but conflicting points here.

thanaParis commented 3 years ago

I understand that the intention is for Id.Deduction to be used exclusively for subtypes that contain unique fields, and that is being followed here with Cat, Dog1, and Dog2.

However, the defaultImpl option was added. Wanting defaultImpl be a superset rather than a subset of the unique subtypes is not, in my opinion, being deliberately ambiguous. I think it is a reasonable assumption of how even simple deduction would work.

To elaborate, in test2 and test3, it makes sense to me that Jackson presumed a final typing of Cat and Dog2 respectively once it hit their unique fields. It does not make sense to me that the top level exception upon hitting an unrecognized property while trying to deserialize Cat and Dog2 was UnrecognizedPropertyException instead of InvalidTypeIdException. I would expect that, if it's possible for the message to have been correctly sent to defaultImpl as in test1, it would do so after hitting unrecognized properties.

If this is a unsolvable limitation of the way Jackson currently works--i.e., once final typing is determined, there is no way to loop back and try again after deserialization fails--a flag to preprocess the entire message would be greatly appreciated.

I think part of the issue here is the language of JsonTypeInfo applying awkwardly to Deduction, which might imply it should be its own annotation. To me, "InvalidTypeIdException" is not descriptive of "Object is a subset of possible subtypes," not even with the explanatory "Cannot deduce unique subtype."

If the intention is to interpret unique fields as class identifiers, allowing them to be set in JsonSubTypes.Type(name) would work as well. As in: JsonSubTypes.Type(class = MyClass.class, name= "fieldName"), where MyClass.fieldName is defined and can be accessed through reflection. This would limit the max number of field checks during deduction to the number of declared subtypes, making it simple to deduce if a subtype object is superset of the declared subtypes. It would also make explicit what frameworks do not work with Id.Deduction--if no such field exists to be named, simple Deduction isn't the correct path.

cowtowncoder commented 3 years ago

Not sure if this helps, but there is somewhat strict separation, in Jackson, between 2 steps:

  1. Identifying concrete subtype, if (and only if) polymorphic deserialization is in effect (based on type)
  2. Deserialization into specific target type

Deduction operates strictly on (1) and may need buffering, but has no knowledge of (no access) to further information on what actual types would look like (beyond whatever it has gathered during construction). Use of defaultImpl, if needed, occurs as part of (1) Detection of unknown properties, however, occurs at (2); further, this actual deserializer has no knowledge of the process that may have been done for (1).

On note of Deduction-vs-JsonTypeInfo awkwardness: yes, very true; the original design is/was limiting and did not envision anything like field-based deduction.

Now: as to alternate that we could call "Explicit Deduction"... that could be another addition, and if so, would probably need a new type, leaving "simple" deduction mostly as-is. My main concern here would probably just be that of ever-increasing requirements as it can be difficult to express various combinations in declarative way.

Or, the way I originally thought this could be tackled, would have been to somehow extend interfaces to actually just pass JsonNode of everything, let handler figure it out.

drekbour commented 3 years ago

Focusing back on behaviour of defaultImpl in DEDUCTION mode: I wonder if fingerprinting defaultImpl might be factored into a requirement somehow. Currently it is not inspected in any way.

Failing that we have to decide if deduction is fast-and-optimistic or slow-and-complete.

cowtowncoder commented 3 years ago

Would it be possible to simply apply defaultImpl if deduction could not be performed either for no match or for multiple matches? Or alternatively only for no-match case but not ambiguous ones. I am not sure if there could be differing expectations for the two cases; if not, seems more straightforward (unless detection of problems is in place where defaultImpl not available?). But adding configurability for this would get bit tricky as it seems like a very specific setting (which is bad for XxxFeature settings).

mjustin commented 1 year ago

I ran across this today and it also ran contrary to my expectations on how this feature should work.

In my case, I have a REST API with a polymorphic type. In addition to using Jackson to deserialize requests, I also use it to confirm that the requests to the application are valid. I was playing around and passed in a JSON value that overlapped my two types, expecting to get back a deserialization error. Instead, it behaved as described in this ticket. Further, the fact that logically identical syntactically valid JSON was treated differently based on key ordering was also an unexpected surprise.

My expectation going into it was that the mapping would fail when there were multiple possible matching subtypes.

lbourdages commented 1 year ago

I have the same exact problem as mjustin. If deserialize a string with two properties from two different implementations, it will deserialize as one of them.

I would expect it to fail, because two different types match. The current behaviour makes it that I can't validate whether a payload is valid solely based on the deserialization being successful - and the worst part is that once it is deserialized I don't have access to the original properties anymore and thus can't check it manually.

anmoljande commented 4 months ago

@lbourdages , were you able to find any workaround for the issue where string containing two properties from two different implementations deserialize to the one whose property comes first in json string.

lbourdages commented 4 months ago

@lbourdages , were you able to find any workaround for the issue where string containing two properties from two different implementations deserialize to the one whose property comes first in json string.

I resorted to writing my own deserializer (i.e. implementing com.fasterxml.jackson.databind.JsonDeserializer) along with an @JsonDeserialize annotation on the class. It's definitely not the solution I would have preferred, but it gives full control over deserialization.

In the deserialize method, I can check that there aren't more than 1 subtype properties by using something like this:

    private void throwIfMultipleSubtypePropertiesArePresent(JsonParser jsonParser, ObjectNode objectNode)
            throws JsonMappingException
    {
        if (Stream.of(objectNode.has(PROPERTY_A),
                      objectNode.has(PROPERTY_B),
                      objectNode.has(PROPERTY_C))
                  .filter(Boolean::booleanValue)
                  .count() > 1) {
            throw new JsonMappingException(jsonParser,
                                           "Problem deserializing object: unable to map to a single implementation as multiple conflicting properties were found.");
        }
    }
drekbour commented 4 months ago

I wouldn't have introduced the overhead of Stream :) but yes DEDUCTION is not for situations where we cannot trust the input to cleanly match one or none of the expected types. I do appreciate that "input we cannot control the format of" is likely one of the most common reasons for considering DEDUCTION and this issue remains open.