fxamacker / cbor

CBOR codec (RFC 8949) with CBOR tags, Go struct tags (toarray, keyasint, omitempty), float64/32/16, big.Int, and fuzz tested billions of execs.
MIT License
748 stars 61 forks source link

bug: Marshalling skips public (but implicit) field in struct #454

Closed mitjat closed 11 months ago

mitjat commented 11 months ago

What version of fxamacker/cbor are you using?

2.5.0

Does this issue reproduce with the latest release?

yes

What did you expect to see?

The following struct has a public field Int:

type myBigInt struct {
    *big.Int
}

Though it's named implicitly, it can be used to construct values such as myBigInt{Int: ...}. So I expected myBigInt to be serialized as a map with one key (Int), in accordance with the generic struct serialization rules.

What did you see instead?

All values of type myBigInt get marshaled into 0xa0 (empty map).

Steps to reproduce

Here is a minimal playground example: https://go.dev/play/p/SMkjfzxdBHr

Note that if you tweak the example so the field is named explicitly:

type myBigInt struct {
    Int *big.Int
}

then serialization works as expected.

fxamacker commented 11 months ago

@mitjat Thanks for reporting this! I'll look into it.

fxamacker commented 11 months ago

@mitjat Thanks again for reporting this!

This initially looked like a bug but after looking into this further, it doesn't appear to be a bug because fxamacker/cbor behaves as expected (documented) and follows the same encoding rules as encoding/json for "struct values" and "anonymous struct fields".

Given this and 4-5 workarounds being available (see below), I'll close this issue for now. Please feel free to reopen if I missed anything. :pray:

:warning: The example you provided allows encoding/json to work as-is because big.Int implements json.Marshaler interface. However, encoding/json won't encode any additional fields added to myBigInt definition, so there are pitfalls to avoid (regardless of codec) when using embedded struct fields.

Structs are serialized as CBOR maps, and exported struct fields are serialized as map elements. Fields in embedded struct are encoded as if they are in the outer struct.

For example,

type myBigInt1 struct {
    Int *big.Int
}

type myBigInt2 struct {
    *big.Int
}

Docs for `cbor.Marshal() says,

// Struct values encode as CBOR maps (type 5).  Each exported struct field
// becomes a pair with field name encoded as CBOR text string (type 3) and
// field value encoded based on its type.  
// ...
// Anonymous struct fields are marshaled as if their exported fields
// were fields in the outer struct.  Marshal follows the same struct fields
// visibility rules used by JSON encoding package.

For this use case, a simple workaround is to make big.Int a regular struct field as you pointed out.

Otherwise, there are 4 approaches to encode embedded big.Int:

  1. CBOR map with one field (provide a field name such as "Foo")

    type myBigInt struct {
    *big.Int `cbor:"Foo"`
    }
  2. CBOR array with one element

    type myBigInt struct {
    _        struct{} `cbor:",toarray"`
    *big.Int `cbor:"Foo"`
    }
  3. CBOR bignum (by implementing cbor.Marshaler)

    
    type myBigInt struct {
    *big.Int
    }

func (b *myBigInt) MarshalCBOR() ([]byte, error) { return cbor.Marshal(b.Int) }

func (b myBigInt) UnmarshalCBOR(data []byte) error { var bigInt big.Int err := cbor.Unmarshal(data, &bigInt) if err != nil { return err } b.Int = bigInt return nil }


4. CBOR bytes (by implementing `encoding.BinaryMarshaler`)
```Go
type myBigInt struct {
    *big.Int
}

func (b *myBigInt) MarshalBinary() ([]byte, error) {
    // TODO: serialize b.Int in custom format and resulting data is encoded as CBOR bytes automatically
}

func (b *myBigInt) UnmarshalBinary(data []byte) error {
    // TODO: deserialize data in custom format 
}

Given this works as expected and has 4-5+ workarounds, we don't need to special case big.Int for embedded struct.

I am going to close this issue as not a bug for now. We can always reopen this if needed.

Please feel free to comment in this issue.

mitjat commented 11 months ago

Thank you for such a detailed explanation! For me, the gotcha was that when I read the documentation, I noticed the first paragraph of the docstring you quoted above, but not the last one (which is the more relevant one):

// Struct values encode as CBOR maps (type 5).  Each exported struct field
// becomes a pair with field name encoded as CBOR text string (type 3) and
// field value encoded based on its type.  
// ... [snip 3 paragraphs] ...
// Anonymous struct fields are marshaled as if their exported fields
// were fields in the outer struct.  Marshal follows the same struct fields
// visibility rules used by JSON encoding package.

I agree there's nothing to fix; the behavior is non-obvious but matches the documentation (and encoding/json, which matters). Sorry for the false report, and thanks again for looking into it.