FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
310 stars 133 forks source link

Fix getNullValue of IonValueDeserializer. #319

Closed atokuzAmzn closed 2 years ago

atokuzAmzn commented 2 years ago

This PR aims to resolve the issues https://github.com/FasterXML/jackson-dataformats-binary/issues/317 and https://github.com/FasterXML/jackson-dataformats-binary/issues/318

Changes:

  1. Do not call getEmbeddedObject when current token is END_OBJECT
  2. Make sure java nulls are deserialized correctly
  3. Update unit test case with nulls
atokuzAmzn commented 2 years ago

@cowtowncoder Can you add @popematt as reviewer?

atokuzAmzn commented 2 years ago

This is technically a backwards-incompatible change that you're introducing, which means that it can't be merged into in any 2.x branch.

Actually what I am proposing is to fix backward compatibility. After https://github.com/FasterXML/jackson-dataformats-binary/pull/296, 2.13 already became incompatible.

Just take this simple test case below. This test succeeds in 2.12 but right now it fails in 2.13. If we had a test case like this in 2.12 this would have been caught.

    @Test
    public void shouldBeAbleToDeserializeJavaNullValue() throws Exception {
        IonValueData source = new IonValueData();
        source.put("c", null);

        IonValue data = ION_VALUE_MAPPER.writeValueAsIonValue(source);
        IonValueData result = ION_VALUE_MAPPER.readValue(data, IonValueData.class);

        assertEquals(source, result);
    }

The problem defined in issue https://github.com/FasterXML/jackson-dataformats-binary/issues/295 was about null.structs not being deserialized correctly. Definitely this was a miss and should have been fixed but while fixing the issue we shouldn't have broken the java null deserialization. With this proposal both null.structs (etc) and java nulls will be deserialized correctly.

popematt commented 2 years ago

@atokuzAmzn Thank you for explaining that; this makes sense to me now.

@cowtowncoder, should this be merged into the 2.13 branch as well as 2.14?

atokuzAmzn commented 2 years ago

@popematt I have added an extra test case

cowtowncoder commented 2 years ago

Hmmmh. Ok, so the previous change went into 2.13.0, sort of establishing 2.13 behavior. I would lean towards 2.14(.0) inclusion since we already have 2 patches for 2.13.

But if everyone else feels strongly I would not block inclusion in 2.13 instead.

atokuzAmzn commented 2 years ago

I believe we need to fix the current behaviour in 2.13 since it digressed from 2.12.

cowtowncoder commented 2 years ago

As I said, if others (@popematt ) agree, I am ok changing it to 2.13. I am just saying that for users who might have started with 2.13.0, change in behavior in a patch might be surprising. But I do not have full context or understanding to know if it might be that 2.13.x behavior so far is so broken there's no legitimate use case -- if so, yeah, patch it up.

Just let me know either way.

Also noted: besides myself, @mcliedtke has commit access. And I can give more access to Amazon Ion team members if anyone is interested. While I like being kept in the loop I also want to delegate access rights where it can help speed up things.

tgregg commented 2 years ago

@atokuzAmzn The Ion specification allows null to have annotations, just like any other value. E.g. 'com.foo.Bar'::null. Under this proposal, do we lose all annotations on untyped nulls? (Note: this should be added to the unit tests.) If so, there needs to be an alternate behavior capable of preserving the annotations, which are part of the Ion data model. Should we return IonNull when reading annotated untyped nulls and Java null when reading un-annotated untyped nulls? Or should there be configuration to allow users to receive Java null for all Ion nulls, allowing them to opt-in to the possibility of data loss?

atokuzAmzn commented 2 years ago

@tgregg I have added another check to implement your first option and also added a new unit test case while expanding the existing null deserialization test case with annotations.

mcliedtke commented 2 years ago

To me, the 2.13 behavior seems like the right handling for Ion nulls. An IonValue represents the actual Ion domain data object and in Ion, the null's are types and have an IonValue representation. An untyped Ion null should result in non-(java)-null IonValue which is of type IonNull. On top of that, the divergence of handling of typed nulls, untyped null, and annotated nulls in the latest revision seem really inconsistent / confusing. Why should the unannotated, untyped null be "special" and result in a Java null?

I realize this is different from 2.12, but the 2.13 just seems more correct. I'd potentially be open to a feature flag that allows a user to specify the legacy behavior if there was a simple way.

atokuzAmzn commented 2 years ago

This is moving into a deadlock:)

And I think it was my mistake trying to solve two issues (https://github.com/FasterXML/jackson-dataformats-binary/issues/317 and https://github.com/FasterXML/jackson-dataformats-binary/issues/318) with one PR. Because as far as I can understand most of the discussion revolves around https://github.com/FasterXML/jackson-dataformats-binary/issues/318, but I could actually create a separate PR for https://github.com/FasterXML/jackson-dataformats-binary/issues/317 to solve that separately.

So back to https://github.com/FasterXML/jackson-dataformats-binary/issues/318. I accept that the last commit made it confusing. But as far as I understand there can never be a full solution to this problem. So we have to make a trade off.

Either, we will deserialize java nulls to IonNull or, we will deserialize IonNulls(not null.struct etc they are NOT IonNulls) to java null.

Whatever we choose, we will still make a wrong deserialization. There is no escape from this. So we just need to decide which wrong we want to do by default.

Even if we make this configurable we will still need a default value. So the question is, what should be the default behaviour?

And I believe the answer to this question lies in which use case occurs more often than the other. I am not an expert at all in this but I believe serializing java nulls probably occurs much more often than serializing IonNulls. And just to add as confirmation, in the internal implementation which we were using for a long time, the choice was to return java nulls for both of them.

mcliedtke commented 2 years ago

In my last comment I glossed over https://github.com/FasterXML/jackson-dataformats-binary/issues/317, I agree that https://github.com/FasterXML/jackson-dataformats-binary/issues/317 should be fixed.

For https://github.com/FasterXML/jackson-dataformats-binary/issues/318, I think the current 2.13 behavior is the most correct when it comes to the intent of what an IonValue is.

we will deserialize java nulls to IonNull

I agree there is ambiguity with respect to the following case:

class MyBean {
    @JsonProperty("value")
    public IonValue value;
}

MyBean bean = new MyBean();
bean.value = null;

MyBean bean2 = new MyBean();
bean.value = ionSystem.newNull();

With a default IonObjectMapper with the IonValueModule installed, both serialize to the same Ion data representation - { value: null }). But in reading the data back, the more correct option should be to deserialize as an IonNull.

If a user wanted to remove the ambiguity of Java vs. Ion nulls, I would likely direct them to serialize Java-null as an absent value:

class MyBean {
    @JsonProperty("value")
    @JsonInclude(Include.NON_NULL)
    public IonValue value;
}

MyBean bean = new MyBean();
bean.value = null; //Now serialized as {}

MyBean bean2 = new MyBean();
bean.value = ionSystem.newNull(); //Still serialized as {value:null}

This would remove all ambiguity around what the desired effect was. The serialized data can be deserialized without data loss.

I believe serializing java nulls probably occurs much more often than serializing IonNulls.

Ideally I wouldn't want frequency to determine the behavior, rather what would be most intuitive or correct for the data specification. Given the intent of IonValue is to represent the full range of the Ion domain types, I don't see why we would special case null's.

atokuzAmzn commented 2 years ago

Well, I don't have anything else to add on this issue, if this is the decision, then it is.

I have published a separate PR https://github.com/FasterXML/jackson-dataformats-binary/pull/320 for https://github.com/FasterXML/jackson-dataformats-binary/issues/317, which is much more straightforward. Please review when you have time.

cowtowncoder commented 2 years ago

And just to make it clear: I am happy to merge whatever is agreed upon by maintainers, but it may be simplest for @mcliedtke to merge. The only (?) caveat is that whoever merges PRs also needs to ensure they get forward-merged to later branches as appropriate (2.13 -> 2.14 -> master).

cowtowncoder commented 2 years ago

Now that #317 is resolved, what would be the way forward here, if any? No particular hurry but wanted to add an update if there might be progress. At minimum if PR is to stay open would probably need to merge/rebase from 2.14 as there's currently conflict (for overlapping work I presume).

tgregg commented 2 years ago

I agree with @mcliedtke 's assessment and suggested workaround (using JsonInclude(Include.NON_NULL)) when Java null is required instead of IonNull. Accordingly, it looks like this can be closed without changes.

atokuzAmzn commented 2 years ago

In the end, we needed to make a choice so I am also OK to close this issue.