Open sfackler opened 3 years ago
@sfackler I agree that it would make sense to define both what is expected (required) and, if necessary, what remains implementation-dependent. It sounds like the Java implementation handles float and double inconsistently; this is unfortunate.
In this particular case it seems to me that the unused bits must be ignored by decoders (given that there is a discrepancy), but that encoders should probably be recommended to use one approach for both cases; probably that of leaving them as 0s.
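To illustrate what "ignored by decoders" could look like, here is a minimal sketch, not Jackson's actual parser code. It assumes the layout implied elsewhere in this thread: 7 data bits per byte, most significant byte first, 5 bytes for a float and 10 for a double. The class and method names are hypothetical.

```java
// Hypothetical sketch: decode Smile-style 7-bit-per-byte floats and doubles
// while ignoring whatever the encoder left in the unused high bits.
final class SmileFloatDecodeSketch {

    // 5 bytes x 7 bits = 35 bits; only the low 4 bits of the first byte are data.
    static float decodeFloat(byte[] in) {
        int bits = in[0] & 0x0F; // drop the 3 unused bits (and the top bit)
        for (int i = 1; i < 5; i++) {
            bits = (bits << 7) | (in[i] & 0x7F);
        }
        return Float.intBitsToFloat(bits);
    }

    // 10 bytes x 7 bits = 70 bits; only the lowest bit of the first byte is data.
    static double decodeDouble(byte[] in) {
        long bits = in[0] & 0x01; // drop the 6 unused bits (and the top bit)
        for (int i = 1; i < 10; i++) {
            bits = (bits << 7) | (in[i] & 0x7F);
        }
        return Double.longBitsToDouble(bits);
    }
}
```

With this masking, a first byte of 0x78 and a first byte of 0x08 both decode to the same float, so a decoder written this way accepts output from either encoder style.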
The Java codec would then be deviating from this as of Jackson 2.13.0.
Does the above make sense?
I'll file an issue at:
https://github.com/FasterXML/smile-format-specification/issues
linking to this one, so that ideally the specification will clarify this behavior.
Yep, I think that makes sense.
I finally went ahead and updated the Smile spec, as per:
https://github.com/FasterXML/smile-format-specification/issues/17
Please let me know if this helps. I hope to tackle the encoding itself in the (near-ish?) future, probably for 2.14.0, since the change may be a slight compatibility concern: it is possible that some decoders rely on sign extension.
Not sure how to alleviate that concern: maybe add a SmileParser.Feature to allow the old handling.
Thanks!
I'm working on a Smile implementation in Rust, and am trying to exactly match Jackson's behavior to be able to test against it. While testing floats and doubles, I've noticed an inconsistency in how Jackson's serializer behaves. Specifically, the float and double serialization logic differ in what ends up in the unused high bits of the MSB of the encoded representation.
The float implementation uses arithmetic shifts, so a negative floating point value will end up with the unused high bits set to 1 and a positive floating point value will end up with the unused high bits set to 0. In contrast, the double implementation uses a logical shift in one place, so the unused high bits are always 0.
As an example, here's an encoding of (float) -0 into Smile using 2.13.0:
The first byte of the encoded float is 0x78, aka 0b01111000.
And here's the encoding of (double) -0:
The first byte of the encoded double is 0x01, aka 0b00000001.
The contents of the unused bits shouldn't matter in practice, but it'd probably be good to unify and explicitly specify the desired behavior here.
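The difference between the two shift styles can be reproduced in isolation. The following is a simplified sketch rather than Jackson's serializer code: it computes only the first encoded byte, collapses the repeated 7-bit shifts into a single shift, and assumes 7 data bits per byte with the most significant byte first. The class and method names are hypothetical.

```java
// Hypothetical sketch: first encoded byte of a float/double under
// arithmetic (>>) versus logical (>>>) right shifts.
final class SmileFirstByteSketch {

    // Arithmetic shift: the sign bit is copied into the vacated high bits.
    static int floatFirstByteArithmetic(float f) {
        int bits = Float.floatToRawIntBits(f);
        return (bits >> 28) & 0x7F;
    }

    // Logical shift: the vacated high bits are always zero.
    static int floatFirstByteLogical(float f) {
        int bits = Float.floatToRawIntBits(f);
        return (bits >>> 28) & 0x7F;
    }

    static int doubleFirstByteArithmetic(double d) {
        long bits = Double.doubleToRawLongBits(d);
        return (int) (bits >> 63) & 0x7F;
    }

    static int doubleFirstByteLogical(double d) {
        long bits = Double.doubleToRawLongBits(d);
        return (int) (bits >>> 63) & 0x7F;
    }

    public static void main(String[] args) {
        System.out.printf("float  -0 arithmetic: 0x%02X%n", floatFirstByteArithmetic(-0.0f)); // 0x78
        System.out.printf("float  -0 logical:    0x%02X%n", floatFirstByteLogical(-0.0f));    // 0x08
        System.out.printf("double -0 arithmetic: 0x%02X%n", doubleFirstByteArithmetic(-0.0)); // 0x7F
        System.out.printf("double -0 logical:    0x%02X%n", doubleFirstByteLogical(-0.0));    // 0x01
    }
}
```

The arithmetic float result (0x78) and the logical double result (0x01) match the first bytes reported above. For positive values the two shift styles produce the same byte, so the discrepancy only appears when the sign bit is set.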