FasterXML / jackson-dataformats-binary

Uber-project for standard Jackson binary format backends: avro, cbor, ion, protobuf, smile
Apache License 2.0
315 stars 136 forks source link

Floats are encodedwith sign extension while doubles are encoded without #300

Open sfackler opened 3 years ago

sfackler commented 3 years ago

I'm working on a Smile implementation in Rust, and am trying to exactly match Jackson's behavior to be able to test against it. While testing floats and doubles, I've noticed an inconsistency in how Jackson's serializer behaves. Specifically, the float and double serialization logic differ in what ends up in the unused high bits of the MSB of the encoded representation.

The float implementation uses arithmetic shifts, so a negative floating point value will end up with the unused high bits set to 1 and a positive floating point value will end up with the unused high bits set to 0. In contrast, the double implementation uses a logical shift in one place, so the unused high bits are always 0.

As an example, here's an encoding of (float) -0 into Smile using 2.13.0:

00000000: 3a29 0a00 2878 0000 0000                 :)..(x....

The first byte of the encoded float is 0x78, aka 0b01111000.

And here's the encoding of (double) -0:

00000000: 3a29 0a00 2901 0000 0000 0000 0000 00    :)..)..........

The first byte of the encoded double is 0x01, aka 0b00000001.

The contents of the unused bits shouldn't matter in practice, but it'd probably be good to unify and explicitly specify the desired behavior here.

cowtowncoder commented 3 years ago

@sfackler I agree that it would make sense to both define what is expected (required), and then, if necessary, what remains implementation dependent. It sounds handling by Java implementation is inconsistent across float and double; this is unfortunate.

In this particular case it would seem to me that the unused bits must be ignored by decoder (given that there is discrepancy), but that encoders should probably be recommended to use one approach for both cases; probably that of leaving them as 0s. Java codec would then be deviating from this as of Jackson 2.13.0.

Does above make sense?

I'll file an issue at:

https://github.com/FasterXML/smile-format-specification/issues

linking to this one, so that ideally specification would clarify this behavior.

sfackler commented 3 years ago

Yep, I think that makes sense.

cowtowncoder commented 2 years ago

I finally went ahead and updated Smile spec, as per:

https://github.com/FasterXML/smile-format-specification/issues/17

Please let me know if this helps. I hope to tackle the encoding itself in (near-ish?) future, probably for 2.14.0 since change may be slight compatibility concern: it is possible some decoders could rely on sign extension. Not sure how to alleviate that concern: maybe add a SmileParser.Feature to allow old handling.

sfackler commented 2 years ago

Thanks!