Open SheliakLyr opened 7 years ago
@SheliakLyr First of all, thank you for reporting this problem. It sounds like a very important thing to resolve.
I will have to double-check this to make absolutely sure; both with respect to discrepancy between implementation and spec, and to verify whether there is difference between different parts of implementation (BigDecimal
, BigInteger
and byte[]
).
My first thought is that the explanation in spec is ambiguous with respect to alignment across bytes (in which byte is padding) vs alignment within byte (which bits contain padding), and that implementation works as intended, but explanation is not good. If so, it should clearly state that within padded byte, most-significant bit is not used, but that padding is in Least-Significant BIts.
I hope this is the case (and as I said, I need to read it carefully). If there is real disagreement we have a more significant issue and if so must consider backwards compatibility.
@SheliakLyr your finding is correct and the specification incorrectly claims padding is in Least-Significant Bits, but the implementation pads MSB instead. I will need to update specification to correct this mistake.
Ok I updated:
https://github.com/FasterXML/smile-format-specification/blob/master/smile-specification.md
with what should be more correct description. I am not entirely happy with the description so any PRs there would be appreciated.
Still, the behavior of implementation, as documented in spec v1.0.5 is the correct one due to practical reasons (obviously having had incorrect specification for years is not ideal).
SMILE specification contains the following statements:
Let's consider a simple array: Array(0x01). After encoding it to 7LSB it will use 2 bytes. According to the specification "any padding of the last byte is in its rightmost, least-significant, bits", so it should look like this:
However, the jackson library behaves differently. The following code (Scala) encodes a BigInteger (1).
Output:
After removing header/type token/content length we are left with
0001
! I've discovered that padding is located in the last byte but in its LEFT-MOST, most-significant bits. So an array(0x01) is encoded to:_0000000 pppppp_1
Am I right? Is the specification wrong or is it a bug in implementation?