Discussion: Simpler Encoding of attributes

xipki commented 1 year ago

In the current draft version (v7), there are 2 different methods to encode the extensions field:

With only one non-critical extension keyUsage
Others

For the name, it is even more complicated than extensions. 4 encoding methods may apply depending on the content:

1. With only one attribute commonName, and the value consists only of lower-case hex digits;
2. With only one attribute commonName, and the value is in the form "HH-HH-HH-HH-HH-HH-HH-HH";
3. With only one attribute commonName, but not covered by case 1 or 2;
4. All other cases.
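As a rough illustration of the branching this implies, the four cases can be sketched as a classifier. This is only a sketch based on the description in this comment, not the draft's normative text: the function name `name_case`, the `(type_name, value)` pair representation, the even-length requirement for the hex case, and the upper-case requirement for the "HH" form are all assumptions made for the example.

```python
import re

# Assumption: the hex case needs an even number of lower-case hex digits,
# so that the value maps onto whole bytes.
HEX_CN = re.compile(r"(?:[0-9a-f]{2})+")
# Assumption: "HH" means two upper-case hex digits, eight groups in total.
EUI64_CN = re.compile(r"[0-9A-F]{2}(?:-[0-9A-F]{2}){7}")


def name_case(attributes):
    """Return which of the four cases (1-4) applies to a name.

    `attributes` is a hypothetical representation: a list of
    (type_name, value) pairs, e.g. [("commonName", "6c4a")].
    """
    if len(attributes) == 1 and attributes[0][0] == "commonName":
        value = attributes[0][1]
        if HEX_CN.fullmatch(value):
            return 1  # hex-only common name
        if EUI64_CN.fullmatch(value):
            return 2  # EUI-64 style common name
        return 3      # any other single common name
    return 4          # everything else


case_hex = name_case([("commonName", "6c4a")])
case_eui = name_case([("commonName", "01-23-45-FF-FE-67-89-AB")])
case_text = name_case([("commonName", "example.com")])
case_other = name_case([("commonName", "a"), ("serialNumber", "123")])
```

An encoder and a decoder both need this kind of dispatch, which is the complexity being questioned here.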

To me, this is too complicated and unnecessary, and it is not in line with the idea of C509 (simplicity). Note that it saves only a few bytes here (e.g. 3 bytes in extensions) but makes the logic much more complicated. What is the background for this?

gselander commented 12 months ago

One idea behind this work is to be compliant with X.509 as used in practice, and at the same time optimized for X.509 as used in constrained environments, e.g. the TLS/DTLS profile for IoT (RFC 7925). In many constrained settings, the issuer/subject is just a common name, hence the reason for optimizing this case. For constrained devices this common name is typically a device identifier, an EUI-64/48 identifier or a byte string / IP address, for which a CBOR bstr encoding becomes much more compact. Similarly for these settings key usage may be the single extension used.

For non-constrained environments, the few extra bytes do not matter, but for constrained radio technologies optimization at the byte level makes sense. The overall message overhead for transmitting as well as receiving has an impact on power consumption and battery life. Performance can be significantly improved when messages fit into radio frame sizes. Recently we discussed this with people from the aviation industry in the context of drones, who confirmed that every byte counts.

We have been adding these exceptions one at a time. If there is a perceived complexity here we can discuss the encoding. Currently it does not seem to me that it complicates processing significantly; do you think it does?

If this is not sufficiently well motivated in the draft we should add explanatory text.

xipki commented 12 months ago

Thank you for the detailed explanation.

What about another solution direction for the encoding of name:

Add new attribute type for Hex-encoded CN (no prefix 0x00 is required)
Add new attribute type for EUI-64/48 (no prefix 0x01 is required)
If the name contains only one attribute, then the CBOR Array for attributes is ignored (to save 1 byte). For example, SerialNumber=123 is not encoded as 81:82:03:63:31:32:33, but as 82:03:63:31:32:33

gselander commented 11 months ago

Add new attribute type for Hex-encoded CN (no prefix 0x00 is required)

As you noted, this is a trade-off between structure and overhead savings.

For CN = 0x6c4a the encoding in -07 is h'006c4a', which copied to cbor.me renders as 4 bytes:

    43        # bytes(3)
       006C4A # "\u0000lJ"

If we add a new attribute type, say 15, which is 1-byte CBOR, then for CN = 0x6c4a the encoding would be [15, h'6c4a'], which copied to cbor.me renders as 5 bytes:

    82         # array(2)
       0F      # unsigned(15)
       42      # bytes(2)
          6C4A # "lJ"

I can agree this structure is nicer, but the first alternative is OK IMHO, and smaller. What do others say?

Add new attribute type for EUI-64/48 (no prefix 0x01 is required)

Same thing here. The encoding in version -07, h'010123456789AB', is 8 bytes. Encoding as [16, h'0123456789AB'] is 9 bytes. (Again, the number 16 is just another 1-byte CBOR example, different from 15.)
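The byte counts above can be reproduced without cbor.me using a minimal CBOR encoder. This is a sketch only: it handles just non-negative integers, byte strings, text strings and arrays, and the helper names `head` and `enc` are made up for this example. The attribute types 15 and 16 are the placeholder values used in this comment, not assigned values.

```python
def head(major, n):
    """Initial byte(s) for a CBOR major type with argument n (n < 2**16)."""
    if n < 0:
        raise ValueError("negative arguments are not handled in this sketch")
    if n < 24:
        return bytes([(major << 5) | n])          # argument fits in the initial byte
    if n < 256:
        return bytes([(major << 5) | 24, n])      # one extra byte
    return bytes([(major << 5) | 25]) + n.to_bytes(2, "big")  # two extra bytes


def enc(item):
    """Encode a non-negative int, bytes, str or list as CBOR."""
    if isinstance(item, int):
        return head(0, item)
    if isinstance(item, bytes):
        return head(2, len(item)) + item
    if isinstance(item, str):
        raw = item.encode("utf-8")
        return head(3, len(raw)) + raw
    if isinstance(item, list):
        return head(4, len(item)) + b"".join(enc(x) for x in item)
    raise TypeError(f"unsupported type: {type(item).__name__}")


# Hex-encoded CN: -07 prefix form versus a hypothetical new attribute type 15.
hex_prefix = enc(bytes.fromhex("006c4a"))
hex_typed = enc([15, bytes.fromhex("6c4a")])

# EUI case: -07 prefix form versus a hypothetical new attribute type 16.
eui_prefix = enc(bytes.fromhex("010123456789AB"))
eui_typed = enc([16, bytes.fromhex("0123456789AB")])

for label, data in [("hex, prefix", hex_prefix), ("hex, typed", hex_typed),
                    ("EUI, prefix", eui_prefix), ("EUI, typed", eui_typed)]:
    print(f"{label}: {len(data)} bytes  {data.hex()}")
```

In both cases the prefix form is one byte shorter, because it avoids the array head and the attribute type and pays only for the one-byte prefix.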

If the name contains only one attribute, then the CBOR Array for attributes is ignored (to save 1 byte). For example, SerialNumber=123 is not encoded as 81:82:03:63:31:32:33, but as 82:03:63:31:32:33

The CDDL for Name is in Section 3.1:

Name = [ * RelativeDistinguishedName ] / text / bytes

RelativeDistinguishedName = Attribute / [ 2* Attribute ]

Attribute = ( attributeType: int, attributeValue: text ) // ( attributeType: ~oid, attributeValue: bytes )

Note that parentheses () denote a CDDL group, whose members are placed directly in the enclosing array rather than wrapped in an array of their own, while square brackets [] indicate a CBOR array. Hence a single attribute is encoded as [3, "123"], which copied to cbor.me renders as 6 bytes:

    82           # array(2)
       03        # unsigned(3)
       63        # text(3)
          313233 # "123"

just as you wanted! For an example with multiple attributes, see Section A.3.1.

Does it make sense?
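To see the difference between the two byte strings from the question, they can be decoded with a small CBOR decoder. This is a sketch under stated assumptions: the function `decode` is written for this example only and handles just small unsigned integers, byte strings, text strings and arrays whose argument is below 24.

```python
def decode(data, pos=0):
    """Decode one CBOR item starting at pos; return (value, next_pos)."""
    initial = data[pos]
    major, arg = initial >> 5, initial & 0x1F
    if arg >= 24:
        raise ValueError("only arguments below 24 are handled in this sketch")
    pos += 1
    if major == 0:                      # unsigned integer
        return arg, pos
    if major == 2:                      # byte string of length arg
        return data[pos:pos + arg], pos + arg
    if major == 3:                      # text string of length arg
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:                      # array with arg items
        items = []
        for _ in range(arg):
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos
    raise ValueError(f"unsupported major type {major}")


# 81:82:03:63:31:32:33 has an extra one-element array around the attribute.
wrapped, wrapped_len = decode(bytes.fromhex("81820363313233"))
# 82:03:63:31:32:33 is the attribute placed directly in the Name array.
flat, flat_len = decode(bytes.fromhex("820363313233"))

print(wrapped, wrapped_len)
print(flat, flat_len)
```

The 6-byte form decodes to the attribute type and value sitting directly in the outer array, which is what the CDDL above produces for a single attribute.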

emanjon commented 11 months ago

I think Göran's calculations above are correct. It is good to mention that they are encodings of Name:

Name = [ * RelativeDistinguishedName ] / text / bytes

If we add a new attribute type, say 15 which is 1-byte CBOR, then for CN = 0x6c4a the encoding would
[15, h'6c4a']
which copied to cbor.me renders as 5 bytes:

I think this might give a somewhat too optimistic view of the overhead. We might run out of 1-byte encodings, which would mean 2-byte encodings.

For completeness, the suggested change would be

Name = [ * RelativeDistinguishedName ] 

RelativeDistinguishedName = Attribute / [ 2* Attribute ]

Attribute = ( attributeType: int, attributeValue: text ) //
            ( attributeType: ~oid, attributeValue: bytes )

xipki commented 11 months ago

Since almost all types up to 23 have been assigned, a new type for the hex-encoded or EUI-64/48 case would need 2 more bytes. In terms of size, I agree the -07 solution is better; however, it makes the encoding more complex.
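The "2 more bytes" follows from how CBOR encodes unsigned integers: values up to 23 fit in the initial byte, while 24 and above need at least one extra byte. A small size calculation, with hypothetical helper names and 24 used only as an example of the first type value that no longer fits in one byte:

```python
def cbor_uint_len(n):
    """Encoded size in bytes of a CBOR unsigned integer (0 <= n < 2**64)."""
    if n < 0 or n >= 2**64:
        raise ValueError("out of range for a CBOR unsigned integer")
    if n < 24:
        return 1
    if n < 2**8:
        return 2
    if n < 2**16:
        return 3
    if n < 2**32:
        return 5
    return 9


def typed_size(attribute_type, payload_len):
    """Size of [attribute_type, bstr] for a payload shorter than 24 bytes."""
    return 1 + cbor_uint_len(attribute_type) + 1 + payload_len


def prefixed_size(payload_len):
    """Size of a bstr carrying a 1-byte prefix plus a payload (total < 24 bytes)."""
    return 1 + 1 + payload_len


size_prefix = prefixed_size(2)        # h'006c4a'
size_type_15 = typed_size(15, 2)      # [15, h'6c4a']
size_type_24 = typed_size(24, 2)      # [24, h'6c4a']
print(size_prefix, size_type_15, size_type_24)
```

With a 2-byte payload the three forms come to 4, 5 and 6 bytes, so a type value of 24 or above costs 2 bytes more than the -07 prefix form.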

If the authors are happy with the current encoding, please close this issue. It is fine for me. Could you please add some background text to the draft for better understanding of other readers?

gselander commented 11 months ago

Could you please add some background text to the draft for better understanding of other readers?

I added some text in 5adc8678.

xipki commented 11 months ago

Thanks!

cose-wg / CBOR-certificates

Discussion: Simpler Encoding of attributes #145