Use cases of UnmarshalFirst?

MastaP commented 8 months ago

Hello,

I'm new to CBOR and trying to wrap my head around the functionality that UnmarshalFirst() provides. The documentation says:

// UnmarshalFirst decodes first CBOR data item and returns remaining bytes.

It seems that with this lib, the top-level data item is always a CBOR array or a map. Thus, UnmarshalFirst() always returns the whole item, that is, the array or map with all of its contents.

The only way I was able to make use of UnmarshalFirst() was to create malformed CBOR by concatenating two (or more) encoded structures into a single byte slice:

package cbor_test

import (
    "bytes"
    "fmt"
    "testing"

    "github.com/fxamacker/cbor/v2"
    "github.com/stretchr/testify/require"
)

func TestCBOR_concat_UnmarshalFirst(t *testing.T) {
    type MySubStruct struct {
        B byte
    }
    data := []*MySubStruct{{B: 1}, {B: 2}}
    // Encode each struct back to back into a single buffer.
    buf := &bytes.Buffer{}
    enc := cbor.NewEncoder(buf)
    for _, d := range data {
        require.NoError(t, enc.Encode(d))
    }
    encoded := buf.Bytes() // named to avoid shadowing the bytes package
    //require.NoError(t, cbor.Wellformed(encoded)) // fails
    fmt.Printf("%X\n", encoded)
    result := make([]*MySubStruct, 0)
    // Decode one data item per iteration until no bytes remain.
    for len(encoded) > 0 {
        df, r, err := cbor.DiagnoseFirst(encoded)
        require.NoError(t, err)
        fmt.Printf("first: %s\n", df)
        fmt.Printf("rest: %X\n", r)
        d := &MySubStruct{}
        encoded, err = cbor.UnmarshalFirst(encoded, d)
        require.NoError(t, err)
        result = append(result, d)
    }
    require.Equal(t, data, result)
}

Is it possible to construct a well-formed CBOR encoding and still iterate over multiple data items at the top level?

Thanks.

p.s. in my use-case I'd like to version my data structures and be able to use UnmarshalFirst() to first read the version and then decode the rest of the payload accordingly.

fxamacker commented 8 months ago

Hi @MastaP :wave:

This library supports both CBOR (RFC 8949) and CBOR Sequences (RFC 8742). A CBOR Sequence is simply a concatenation of zero or more CBOR data items.

Unmarshal() requires one CBOR data item and it must not have any trailing bytes. Otherwise, the data is rejected for being malformed.

UnmarshalFirst() allows trailing bytes, so it supports more use-cases than Unmarshal().

Use-cases for UnmarshalFirst() include:

- decoding the first CBOR data item (RFC 8949) in a CBOR Sequence (RFC 8742)
- decoding the first CBOR data item in a mixed encoding (e.g. a CBOR data item as header, followed by a non-CBOR payload)

UnmarshalFirst() checks the first CBOR data item for well-formedness and validity but allows trailing bytes. For performance, it does not try to decode or check any of the trailing bytes.

> p.s. in my use-case I'd like to version my data structures and be able to use UnmarshalFirst() to first read the version and then decode the rest of the payload accordingly.

For your use-case, you can encode to a CBOR Sequence:

- first CBOR data item for metadata such as version, count of top-level data items, etc.
- remaining CBOR data item(s) for the data.

At a glance, your code snippet looks like it is using CBOR Sequence.

Both Unmarshal() and UnmarshalFirst() check for well-formedness and validity (with some differences) so you don't need to manually check before decoding.

> It seems that with this lib, the top-level data item is always a CBOR array or a map.

Actually, this library supports any CBOR data item as the top-level data item, including integers, booleans, arrays, and maps.

For your use-case, these RFCs may be of interest:

- RFC 8949, CBOR (defines CBOR data item)
- RFC 8742, CBOR Sequences (concatenated CBOR data items)
- RFC 7049, CBOR (obsoleted by RFC 8949, referenced by RFC 8742, RFC 8610, etc.)
- Maybe also RFC 8610, CDDL (if you need to specify very complex CBOR-based data formats)

MastaP commented 8 months ago

Thanks a lot for the comprehensive reply, @fxamacker.

fxamacker / cbor

Use cases of UnmarshalFirst? #483