NordicSemiconductor / zcbor

Low footprint C/C++ CBOR library and Python tool providing code generation from CDDL descriptions.
Apache License 2.0
105 stars 34 forks source link

Question about union types #371

Closed mkschreder closed 4 months ago

mkschreder commented 7 months ago

When looking at the generated decoder for union type it looks like this:

|| (zcbor_union_elem_code(state) && (((decode_MyType(state, (&(*result)._MyType)))) && (((*result).union_choice = _union__MyType), true)))

The problem is that there is no field in the data that actually tells it what union type it is in the message. This means that as long as one type can be decoded in full then it will be accepted. I wonder if this is intentional or simply a bug. I would expect that if I set union_choice in the message I pass to the encoder, it will be the same value that will be decoded on the decoder side even if another structure can be successfully decoded before reaching the check for the original type.

I also think that current decoding of unions is a bit strange. It means that you do a lot of duplicate work trying to decode messages that are not really decodable because they are wrong type and only giving up when the decoding fails.

Is there a way to sidestep this behavior?

PS. what is the correct meaning of mytype = Type1 / Type2? Is it "try to parse Type1 and then if that fails try to parse Type2" or should it be "Try to parse a union type that can only be Type1 or Type2 and match the key in the data".

oyvindronningstad commented 7 months ago

Unions are part of the CDDL spec, but not part of CBOR. This means that the union doesn't appear in the data as anything more than one of its members.

PS. what is the correct meaning of mytype = Type1 / Type2? Is it "try to parse Type1 and then if that fails try to parse Type2"

Your first guess here is the correct one. This is for better or worse, and is impacted by your design of the data description (CDDL). If your union contains members that are ambiguous with each other from the start of the data, the parser will spend more time attempting to parse the wrong thing before failing and trying the next.

If you are concerned about this kind of performance in union parsing, you can essentially do 2 things when designing your CDDL description:

  1. Put something simple and unique (like a uint with a definite value) at the start of each member. In fact, if you put a unique int at the start of each member, zcbor will make a small optimization by decoding that int directly into the _choice enum value for the union.
  2. Put the most common member (the one that will most often be encountered) as the first member in the CDDL.
mkschreder commented 7 months ago

I actually realized that "groups of maps" does in fact seem work:

MyType = {
  (1: Type1) /
  (2: Type2)
}

It translates to a union of structs and the encoder does place a union_choice that the decoder can then use to decode the type.