fxamacker / cbor

CBOR codec (RFC 8949) with CBOR tags, Go struct tags (toarray, keyasint, omitempty), float64/32/16, big.Int, and fuzz tested billions of execs.
MIT License

Use cases of UnmarshalFirst? #483

Closed MastaP closed 8 months ago

MastaP commented 8 months ago

Hello,

I'm new to CBOR and trying to wrap my head around the functionality that UnmarshalFirst() provides. The documentation says:

// UnmarshalFirst decodes first CBOR data item and returns remaining bytes.

It seems that with this lib, the top-level data item is always a CBOR array or map. Thus, UnmarshalFirst() always returns the whole item, that is, the array or map with all of its contents.

The only way I was able to utilize UnmarshalFirst() was to create a malformed CBOR by concatenating two (or more) encoded structures into a single byte array.

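// Import paths below are assumed; the original snippet did not show them.
import (
    "bytes"
    "fmt"
    "testing"

    "github.com/fxamacker/cbor/v2"
    "github.com/stretchr/testify/require"
)

func TestCBOR_concat_UnmarshalFirst(t *testing.T) {
    type MySubStruct struct {
        B byte
    }
    data := []*MySubStruct{{B: 1}, {B: 2}}

    // Encode each struct back to back into a single buffer.
    buf := &bytes.Buffer{}
    enc := cbor.NewEncoder(buf)
    for _, d := range data {
        require.NoError(t, enc.Encode(d))
    }
    rest := buf.Bytes() // named rest so it does not shadow the bytes package
    //require.NoError(t, cbor.Wellformed(rest)) // fails
    fmt.Printf("%X\n", rest)

    // Decode one data item at a time until no bytes remain.
    result := make([]*MySubStruct, 0)
    for len(rest) > 0 {
        df, r, err := cbor.DiagnoseFirst(rest)
        require.NoError(t, err)
        fmt.Printf("first: %s\n", df)
        fmt.Printf("rest: %X\n", r)
        d := &MySubStruct{}
        rest, err = cbor.UnmarshalFirst(rest, d)
        require.NoError(t, err)
        result = append(result, d)
    }
    require.Equal(t, data, result)
}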

Is it possible to construct a well-formed CBOR encoding and still iterate over multiple data items at the top level?

Thanks.

p.s. in my use-case I'd like to version my data structures and be able to use UnmarshalFirst() to first read the version and then decode the rest of the payload accordingly.

fxamacker commented 8 months ago

Hi @MastaP :wave:

This library supports both CBOR (RFC 8949) and CBOR Sequences (RFC 8742). A CBOR Sequence is simply a concatenation of zero or more CBOR data items.
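To make "a concatenation of zero or more CBOR data items" concrete, here is a minimal sketch that builds a CBOR Sequence of unsigned integers by hand, using only the Go standard library. The helpers `appendUint` and `buildSequence` are hypothetical names for illustration and are not part of fxamacker/cbor; the sketch covers only major type 0 (unsigned integers) as encoded in RFC 8949.

```go
package main

import "fmt"

// appendUint appends the RFC 8949 encoding of an unsigned integer
// (major type 0) to dst, using the shortest head that fits.
// Hypothetical helper for illustration; not part of fxamacker/cbor.
func appendUint(dst []byte, v uint64) []byte {
	switch {
	case v < 24:
		// Values 0..23 fit directly in the initial byte.
		return append(dst, byte(v))
	case v <= 0xFF:
		return append(dst, 0x18, byte(v))
	case v <= 0xFFFF:
		return append(dst, 0x19, byte(v>>8), byte(v))
	case v <= 0xFFFFFFFF:
		return append(dst, 0x1A, byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
	default:
		return append(dst, 0x1B,
			byte(v>>56), byte(v>>48), byte(v>>40), byte(v>>32),
			byte(v>>24), byte(v>>16), byte(v>>8), byte(v))
	}
}

// buildSequence concatenates the encodings of all values.
// There is no enclosing array and no separator between items;
// zero values yields an empty (and still valid) sequence.
func buildSequence(values ...uint64) []byte {
	var seq []byte
	for _, v := range values {
		seq = appendUint(seq, v)
	}
	return seq
}

func main() {
	seq := buildSequence(1, 500)
	fmt.Printf("%X\n", seq) // 011901F4: item 0x01, then item 0x19 0x01 0xF4
}
```

Each item is self-delimiting, which is why a decoder can find where the first item ends without any framing between items.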

Unmarshal() requires exactly one CBOR data item with no trailing bytes. Otherwise, it rejects the data as malformed.

UnmarshalFirst() allows trailing bytes, so it supports more use-cases than Unmarshal().

Use-cases for UnmarshalFirst() include:

UnmarshalFirst() checks the first CBOR data item for well-formedness and validity but allows trailing bytes. For performance, it does not try to decode or check any of the trailing bytes.
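The "decode the first item, hand back the rest untouched" behavior can be illustrated with a small standard-library sketch. This is not the library's implementation: `firstUint` is a hypothetical helper that understands only unsigned integers (major type 0), whereas UnmarshalFirst() handles every CBOR data item. The sample input assumes a small integer version number followed by a payload.

```go
package main

import (
	"errors"
	"fmt"
)

// firstUint decodes one unsigned integer (major type 0) from the front
// of data and returns the remaining bytes without inspecting them.
// Hypothetical helper for illustration; not part of fxamacker/cbor.
func firstUint(data []byte) (uint64, []byte, error) {
	if len(data) == 0 {
		return 0, nil, errors.New("empty input")
	}
	head := data[0]
	if head>>5 != 0 {
		return 0, nil, errors.New("first item is not an unsigned integer")
	}
	info := head & 0x1F
	if info < 24 {
		// The value is stored directly in the initial byte.
		return uint64(info), data[1:], nil
	}
	if info > 27 {
		return 0, nil, errors.New("malformed head for an unsigned integer")
	}
	n := 1 << (info - 24) // 1, 2, 4, or 8 argument bytes follow
	if len(data) < 1+n {
		return 0, nil, errors.New("truncated item")
	}
	var v uint64
	for _, b := range data[1 : 1+n] {
		v = v<<8 | uint64(b) // big-endian argument
	}
	return v, data[1+n:], nil
}

func main() {
	// A two-item sequence: version 2, then the map {1: 2} as payload.
	seq := []byte{0x02, 0xA1, 0x01, 0x02}
	version, rest, err := firstUint(seq)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("version=%d rest=%X\n", version, rest) // version=2 rest=A10102
}
```

The payload bytes in `rest` are never examined here, so a malformed payload would only be detected when those bytes are decoded in a later step.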

> p.s. in my use-case I'd like to version my data structures and be able to use UnmarshalFirst() to first read the version and then decode the rest of the payload accordingly.

For your use-case, you can encode to a CBOR Sequence:

At a glance, your code snippet looks like it is already using a CBOR Sequence.

Both Unmarshal() and UnmarshalFirst() check for well-formedness and validity (with some differences) so you don't need to manually check before decoding.

> It seems that with this lib, the top-level data item is always a CBOR array or a map.

Actually, this library supports any CBOR data item as the top-level data item, such as a CBOR integer, bool, array, map, etc.

For your use-case, these RFCs may be of interest:

MastaP commented 8 months ago

Thanks a lot for the comprehensive reply, @fxamacker .