go-restruct / restruct

Rich binary (de)serialization library for Golang
https://restruct.io/
ISC License
360 stars 17 forks source link

Fix bitpacking edge case behaviors. #47

Closed jchv closed 3 years ago

jchv commented 3 years ago

The current bitpacking code does very silly things when dealing with little endian and signed numbers. This PR aims to fix both and add a few tests to help ensure ongoing correctness of behaviors.

This change is somewhat breaking, but the old behavior is so broken, that it is pretty doubtful anyone was actually relying on it. Apologies if this change breaks you. I did a cursory search of restruct usages on GitHub and did not find many users of bitpacking, especially with little endian.

Fixes #46.

Overview: Bitpacking and Endianness

Bitpacking and endianness is a little ugly. Firstly, there are two possible meanings of "endianness" in terms of bitpacking: you can either pack bits and then byteswap, or byteswap and then bitpack. Because restruct works on individual fields that are encoded to streams of bits individually, it can't really do the former easily. The latter arguably doesn't make sense, but nonetheless, it is the only sensical way to interpret endianness from restruct's PoV. So, if you specify little endian on a bitpacked integer, the bitpacked integer itself will be byteswapped, and then packed.

If you want to handle a more complex encoding, you will need to use a custom type.

Because the Go encoding/binary ByteOrder is strictly dealing with byte-wise constructs, in order to know where significant bits go we need to inspect whether or not we are using binary.LittleEndian. This slight hack is viewed as necessary as it is the only way for data to be preserved roundtrip; prior to now, the bit encoding function actually cropped off the wrong part of the buffer, leading to data loss.

Overview: Bitpacking and Signed Integers

Up until now, signed integers were simply not sign-extended when decoded with bitpacking. In essence, this should basically break signed integers any time they are bitpacked, because their sign bit will just be ignored.

In this PR, sign extension is added to all bitwise signed integer reads. Doing so requires some extra machinery in the decoding process, but thankfully not the encoding process; the encoding process just needs to truncate the extra bits, which is trivially done by the code that already exists.

jchv commented 3 years ago

Great! I'll go ahead and merge. Thanks for checking.

blakeklasing commented 2 years ago

This is a pretty huge fix! I would love to be able to use a valid release version to get this fix instead of calling out a specific commit hash. Any plans for this getting in a release sometime soon?