go-restruct / restruct

Rich binary (de)serialization library for Golang
https://restruct.io/
ISC License
360 stars 17 forks source link

Issue with List Unpacking with values over byte boundary #33

Closed umeat closed 5 years ago

umeat commented 5 years ago

I'm seeing some strange behavior when I have lists which don't fit nicely on byte boundaries. I wrote the following as a test while I was trying to figure out what the issue was.

package restruct

import (
    "fmt"
    "testing"
    "encoding/binary"
)

type Type struct {
    Count uint8 `struct:"uint8,sizeof=List"`
    List []struct {
        A bool `struct:"uint8:1,variantbool"`
        B uint8 `struct:"uint8:4"`
        C uint8 `struct:"uint8:4"`
    }
}

func TestList(t *testing.T) {
    out := Type{}
    data := []byte{0x08, 0x0F, 0x0F, 0x0F, 0x0F, 0x0F, 0x0F, 0x0F, 0x0F, 0x0F}

    fmt.Println(Unpack(data, binary.BigEndian, &out)) // check for error
    fmt.Printf("%+v\n\n", out)

    fmt.Println(data) // input data 
    fmt.Println(Pack(binary.BigEndian, &out)) // packed struct
}

This gives the following output:

<nil>
{Count:8 List:[{A:false B:1 C:7} {A:false B:3 C:3} {A:false B:7 C:1} {A:false B:15 C:0} {A:true B:7 C:1} {A:true B:3 C:3} {A:true B:1 C:7} {A:true B:0 C:15}]}

[8 15 15 15 15 15 15 15 15 15]
[8 11 140 206 47 11 140 206 47 15 0 0 0 0 0 0 0] <nil>

I converted the input byte array to binary so I could compare it to the generated struct (Count element is first byte, each following 9 bits is a list element):

00001000
0 0001 1110
0 0011 1100
0 0111 1000
0 1111 0000
1 1110 0001
1 1100 0011
1 1000 0111
1 0000 1111

If you compare that to the generated struct, it's correct - however Packing the struct again gives the unusual output of:

[8 11 140 206 47 11 140 206 47 15 0 0 0 0 0 0 0]

When we expect:

[8 15 15 15 15 15 15 15 15 15]

The first issue appears in the first list element, however there is an additional issue of the seven trailing 0 bytes, which is presumably related.

jchv commented 5 years ago

Ah, yeah, the bit packing stuff doesn't really work across struct boundaries atm :( This is really a result of the fact that Restruct works on the byte level and bit level access is implemented as a layer on top of that. SizeOf returns a length in bytes, particularly.

I think this can be fixed, but it will require a new API or two. I'll try to check this out ASAP.

jchv commented 5 years ago

Something is wrong with the decoding too.

0 0001 1110

This would be:

A: false
B: 1
C: 14

For some reason, C is shifted wrong, so both the encoding and decoding break.

I've been working on this bug and I've come to determine that the current bit-level code is too complicated. Not only that, but it's also probably not that optimal. So, for bit level cases, I'm going to drop down to reading and writing bit-by-bit, for the moment. Though optimizing this would be very useful, it needs to be done carefully, in a way that is easier to reason about. In order to offset some of the cost, I'm going to add a fast-path for byte aligned cases, which should make those cases faster than they are today (since they are not currently special-cased.)

jchv commented 5 years ago

Alright, my tests are all passing, including two new tests for these new bit level cases, so I've just landed my fix in master. If you experience performance regressions for byte-misaligned fields that are unacceptable, let me know and I'll try to come up with a better optimization for the bit level reading.