FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0

Incorrect serialization for LogicalType.Decimal #308

Open sheinbergon opened 2 years ago

sheinbergon commented 2 years ago

Your NonBSGenericDatumWriter extension of GenericDatumWriter lacks proper support for Avro's LogicalType.Decimal when dealing with BigDecimal values (it never calls Conversions.DecimalConversion).

A possible fix (inside NonBSGenericDatumWriter) might be:

        case BYTES:
            if (datum instanceof byte[]) {
                super.writeWithoutConversion(schema, ByteBuffer.wrap((byte[]) datum), out);
                return;
            }
            // Try to convert decimal types, if the logical type indicates it
            LogicalType logicalType = schema.getLogicalType();
            if (logicalType instanceof LogicalTypes.Decimal) {
                Conversion<BigDecimal> conversion = getData().getConversionByClass(BigDecimal.class, logicalType);
                // GenericDatumWriter.convert() applies the registered Conversions.DecimalConversion,
                // yielding a ByteBuffer holding the two's-complement unscaled value
                ByteBuffer decimal = (ByteBuffer) convert(schema, logicalType, conversion, datum);
                super.writeWithoutConversion(schema, decimal, out);
                return;
            }
            break;
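
For illustration, this is the kind of schema that would exercise that branch (the precision/scale here are arbitrary example values, not taken from the issue):

    Schema schema = new Schema.Parser().parse(
        "{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":10,\"scale\":2}");
    // schema.getLogicalType() returns a LogicalTypes.Decimal instance here,
    // so the branch above routes a BigDecimal datum through DecimalConversion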

I'm willing to contribute a PR (including tests), but I'd like to apply it as a fix to the 2.13 branch.

@cowtowncoder Let me know what you think

sheinbergon commented 2 years ago

I believe #221 refers to the same issue

cowtowncoder commented 2 years ago

Sounds good -- fix against 2.13 would be appreciated.

sheinbergon commented 2 years ago

@cowtowncoder While the write (serialize) path was rather easy to fix, the read path seems a bit harder: deserialization happens in a mostly generic Jackson fashion, unaware that the JsonParser implementation providing the tokens is an Avro-related one.

I can register a custom JsonDeserializer for BigDecimal, just like you did for date/time types in #283 (maybe we can bind this to a mapper feature), but I require schema information in order to set the scale properly.

To adhere to the Avro decimal serialization behavior described in Conversions.DecimalConversion, the scale is not serialized; it is expected to be available from the schema when deserializing (the number is saved as an unscaled BigInteger).
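
A minimal sketch of what such a deserializer might look like, assuming the scale could somehow be supplied from the Avro schema (the class name and constructor are hypothetical; obtaining that scale is exactly the open question):

    import java.io.IOException;
    import java.math.BigDecimal;
    import java.math.BigInteger;

    import com.fasterxml.jackson.core.JsonParser;
    import com.fasterxml.jackson.databind.DeserializationContext;
    import com.fasterxml.jackson.databind.JsonDeserializer;

    // Hypothetical: rebuilds a BigDecimal from Avro's unscaled bytes, given a scale
    // that would have to be obtained from the Avro schema somehow.
    public class AvroDecimalDeserializer extends JsonDeserializer<BigDecimal> {
        private final int scale;

        public AvroDecimalDeserializer(int scale) {
            this.scale = scale;
        }

        @Override
        public BigDecimal deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
            // Avro encodes the decimal as a two's-complement unscaled value
            byte[] unscaled = p.getBinaryValue();
            return new BigDecimal(new BigInteger(unscaled), scale);
        }
    }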

Any ideas on how to make the relevant schema information available to such a deserializer?

Thanks!

cowtowncoder commented 2 years ago

I would have thought that reading/deserialization should also work fine, as long as the low-level decoder uses the schema information appropriately? Although I guess if this is not the case, then various workarounds may be needed.

sheinbergon commented 2 years ago

@cowtowncoder Well, it's not. Any ideas on how to get the schema information there?

It seems like the correct approach would be to make JacksonAvroParserImpl and its counterpart avroContext (RecordReader$Std) able to return the correct schema reference/location. Any tips or ideas on how best to perform this?

cowtowncoder commented 2 years ago

Schema is carried along when constructing readers/writers, so that'd be the place to store the information. It has unfortunately been a while since I looked into this, so I do not remember the exact flow; but basically only the information deemed necessary is retained for readers/writers, not the schema itself, ideally to avoid various by-name lookups.
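
To make that concrete, a purely hypothetical sketch of what "retain only the necessary information" could mean for decimals; none of these names exist in the codebase:

    import java.math.BigDecimal;
    import java.math.BigInteger;

    import org.apache.avro.LogicalType;
    import org.apache.avro.LogicalTypes;
    import org.apache.avro.Schema;

    // At construction time, extract just the scale from the schema's decimal
    // logical type; the Schema itself is not retained.
    class DecimalValueReader {
        private final int scale;

        DecimalValueReader(Schema schema) {
            LogicalType lt = schema.getLogicalType();
            this.scale = (lt instanceof LogicalTypes.Decimal)
                    ? ((LogicalTypes.Decimal) lt).getScale()
                    : 0;
        }

        BigDecimal read(byte[] unscaled) {
            return new BigDecimal(new BigInteger(unscaled), scale);
        }
    }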

cowtowncoder commented 2 years ago

One possible way forward here would be a (unit) test that shows the incorrect handling; possibly contrasting an Avro-backed decoder (which would expect a properly encoded/serialized value) with the Jackson one (which works with whatever serialization is currently used) to show the issue?

I could probably find time to work on this in the near future, but would need help reproducing the issue first (as well as then verifying the fix).
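
For reference, a rough sketch of what such a contrast test might look like (names are illustrative; the write side uses Avro's own GenericDatumWriter with Conversions.DecimalConversion as the reference encoding, and the read side goes through jackson-dataformat-avro's AvroMapper, where this issue says the scale is lost):

    import java.io.ByteArrayOutputStream;
    import java.math.BigDecimal;
    import java.util.Map;

    import org.apache.avro.Conversions;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    import com.fasterxml.jackson.dataformat.avro.AvroMapper;
    import com.fasterxml.jackson.dataformat.avro.AvroSchema;

    public class DecimalContrastSketch {
        public static void main(String[] args) throws Exception {
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Money\",\"fields\":["
                + "{\"name\":\"amount\",\"type\":{\"type\":\"bytes\","
                + "\"logicalType\":\"decimal\",\"precision\":10,\"scale\":2}}]}");

            // Reference payload produced by Avro itself, with DecimalConversion applied
            GenericData data = new GenericData();
            data.addLogicalTypeConversion(new Conversions.DecimalConversion());
            GenericRecord record = new GenericData.Record(schema);
            record.put("amount", new BigDecimal("123.45"));

            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(bytes, null);
            new GenericDatumWriter<GenericRecord>(schema, data).write(record, encoder);
            encoder.flush();

            // Read the same payload back through Jackson; per this issue the decimal
            // logical type is ignored, so we do not get 123.45 back as a BigDecimal.
            AvroMapper mapper = new AvroMapper();
            Map<?, ?> decoded = mapper.readerFor(Map.class)
                    .with(new AvroSchema(schema))
                    .readValue(bytes.toByteArray());
            System.out.println(decoded.get("amount")); // raw unscaled bytes, not 123.45
        }
    }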