FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
316 stars 136 forks source link

ION deserialization type change from Double to Float in 2.17.0 #490

Closed fhussonnois closed 6 months ago

fhussonnois commented 6 months ago

Hello, on my current projet we have tried to upgrade our dependencies from Jackson 2.16.2 to 2.17.0. But we're noticing a difference in the way the data is deserialized with ION.

Previsouly, the deserialization of either a float (e.g., 42F) or a double (e.g., 42D) was giving a double for both. This behavior changed and we now get float for both.

Here is a simple test for reproducing:

    @Test
    void test() throws IOException {
        // Configure IonSystem
        final var catalog = new SimpleCatalog();
        final var textWriterBuilder = IonTextWriterBuilder.standard()
            .withCatalog(catalog)
            .withCharsetAscii()
            .withWriteTopLevelValuesOnNewLines(true); // write line separators on new lines instead of spaces
        final var binaryWriterBuilder = IonBinaryWriterBuilder.standard()
            .withCatalog(catalog)
            .withInitialSymbolTable(_Private_Utils.systemSymtab(1));
        final var readerBuilder = IonReaderBuilder.standard()
            .withCatalog(catalog);

        IonSystem ionSystem = newLiteSystem(textWriterBuilder, (_Private_IonBinaryWriterBuilder) binaryWriterBuilder, readerBuilder);

        // Create ObjectMapper
        final ObjectMapper om = new IonObjectMapper(new IonFactory(ionSystem))
            .setSerializationInclusion(JsonInclude.Include.ALWAYS)
            .registerModule(new IonModule());

        // Jackson: 2.16.2 {double:42e0,float:42e0,long:9223372036854775807,int:2147483647}
        // Jackson: 2.17.0 {double:42e0,float:42e0,long:9223372036854775807,int:2147483647}
        byte[] serialized = om.writeValueAsBytes(Map.of(
            "float", 42F,
            "double", 42D,
            "int", Integer.MAX_VALUE,
            "long", Long.MAX_VALUE)
        );

        Map deserialized = om.readValue(serialized, Map.class);
        Assertions.assertEquals(42D, deserialized.get("float")); // FAILED with Jackson 2.17.0
        Assertions.assertEquals(42D, deserialized.get("double"));  // FAILED with Jackson 2.17.0
        Assertions.assertEquals(Integer.MAX_VALUE, deserialized.get("int"));
        Assertions.assertEquals(Long.MAX_VALUE, deserialized.get("long"));
    }

Deserialization results:

Jackson 2.16.2 jackson-2 16 2-ion

Jackson 2.17.0 jackson-2 17 0-ion

Do you know where this change might come from?

Thank you for your help.

cowtowncoder commented 6 months ago

Quick question: is this

        Assertions.assertEquals(42D, deserialized.get("float")); // FAILED with Jackson 2.17.0
        Assertions.assertEquals(42D, deserialized.get("double"));  // FAILED with Jackson 2.17.0

intentional? Shouldn't first one compare to 42F not 42D?

And yes, type of numbers should be retained: numbers written as floats should be decoded as floats (unless target type is double in which case coercion is to occur), and same for double.

As to the root cause, it looks like a regression: probably due to changes intended (ironically enough) retain type across those backends capable of distinguishing difference (textual formats like JSON cannot, binary formats like Ion typically do).

I'll see if I find time to create a unit test and see what is happening


EDIT: no, I get it now; test is as intended due to "Double-ness" of Ion API.

cowtowncoder commented 6 months ago

Ohhhh. Now I remember: one oddity of IonParser is that underlying data format reader (IonReader) does not appear to indicate 32-bit float type at all -- instead IonType.FLOAT refers to 64-bit double (double precision IEEE-754). As such, true type cannot be -- it seems -- preserved.

Similarly, IonWriter.writeFloat(double) does not have float-taking counterpart... so I guess Ion only does double.

At any rate, precision should never be lost so getting float back for something written as double seems wrong regardless.

cowtowncoder commented 6 months ago

Ok. Yeah, I was right to be suspicious -- looks like this from 2.17 branch IonParser:

    @Override // since 2.17
    public NumberTypeFP getNumberTypeFP() throws IOException
    {
        if (_currToken == JsonToken.VALUE_NUMBER_FLOAT) {
            final IonType type = _reader.getType();
            if (type == IonType.FLOAT) {
                // 06-Jan-2024, tatu: Existing code maps Ion `FLOAT` into Java
                //    `float`. But code in `IonReader` suggests `Double` might
                //    be more accurate mapping... odd.
                return NumberTypeFP.FLOAT32;
            }
            if (type == IonType.DECIMAL) {
                // 06-Jan-2024, tatu: Seems like `DECIMAL` is expected to map
                //    to `BigDecimal`, as per existing code so:
                return NumberTypeFP.BIG_DECIMAL;
            }
        }
        return NumberTypeFP.UNKNOWN;
    }

so I think that's what should be changed. But I'll need more time to investigate this...

cowtowncoder commented 6 months ago

Ugh. And documentation is both precise and slightly confusing:

https://amazon-ion.github.io/ion-docs/docs/float.html

claims that both 32- and 64-bit FP values are supported.

yet then goes to say...

" In the data model, all floating point values are treated as though they are binary64 "

Hmmmh.

Ok.

cowtowncoder commented 6 months ago

Fixed in 2.17 branch for upcoming 2.17.1 release

fhussonnois commented 6 months ago

Thank you so much for your help!

cowtowncoder commented 6 months ago

FWTW, Jackson 2.17.1 was just released, with ~20 fixes this included:

https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.17.1