WICG / webpackage

Web packaging format
Other
1.23k stars 118 forks source link

Consider switching to DER-encoded ASN.1 #47

Closed jyasskin closed 5 years ago

jyasskin commented 7 years ago

I received feedback from some of our security people that DER is easier to parse securely than CBOR. I'm not 100% sure that's the right choice yet, so here are some pros and cons.

DER Pro's:

  1. Much older, with security-hardened implementations in many languages. e.g. BoringSSL includes a library for memory-safe DER parsing. CBOR has many implementations, but they aren't hardened.
  2. Prefixing with the length in bytes makes it easier to skip unneeded items. CBOR can work around this by embedding things into bytestrings.
  3. Generic parsing has only 2 cases: Primitive, where the interpretation of the bytes depends on the tag, and Constructed, where the content bytes encode a sequence of items. CBOR has 8 cases: integer, string, array, map, tag, simple value, simple value in next byte, and float. This is primarily used to skip unknown fields in structures.

CBOR Pro's:

  1. ~8 primitive types instead of 30+. We can subset ASN.1 to only use INTEGER, OCTET STRING, UTF8String, and SEQUENCE, but we can also subset CBOR to use integers, byte strings, text strings, arrays, and maps.
  2. Extensibility based on maps from strings to values is easier to understand and possibly to manage than ASN.1's extensibility based on appending to sequences. However, it's possible I've missed a better way to extend structures in ASN.1.
  3. It's easy to find the latest specification.
  4. Combining the length into the type saves a byte for smaller items.
  5. Serializing canonical CBOR, which is needed for signature checking, is significantly easier to implement.
  6. All DER integers and floats have to deal with the complexity of encoding bignums. For our purposes, where bignums aren't needed, CBOR's 8,16,32, and 64-bit integers are simpler. Even if bignums were needed, it's likely that a fixed length would be simpler to specify and implement than DER's variable-length encoding.

If we stick with CBOR, we'll mandate Canonical CBOR, which includes minimal integer encodings. That means offsets will need to measure a range that doesn't include the offset itself, but that's doable for both the section list and the resource index.

I'll update this list as more considerations come up.

lrosenthol commented 7 years ago

You also have the fact that DER has been around a lot longer and there are many implementations in many different languages. As noted, it's undergone a lot more scrutiny for security, etc.

On Thu, Apr 13, 2017 at 1:09 AM, Jeffrey Yasskin notifications@github.com wrote:

I received feedback from some of our security people that DER https://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf is easier to parse securely than CBOR https://tools.ietf.org/html/rfc7049. I'm not 100% sure that's the right choice yet, so here are some pros and cons.

DER Pro's:

  1. BoringSSL includes a library for memory-safe DER parsing. We'd have to write our own for CBOR.
  2. Prefixing with the length in bytes makes it easier to skip unneeded items. CBOR can work around this by embedding everything into bytestrings.
  3. Generic parsing has only 2 cases: Primitive, where the interpretation of the bytes depends on the tag, and Constructed, where the content bytes encode a sequence of items. CBOR has 8 cases: integer, string, array, map, tag, simple value, simple value in next byte, and float.

CBOR Pro's:

  1. ~8 primitive types instead of 30+. However, I think we can subset ASN.1 to only use BOOLEAN, INTEGER, OCTET STRING, UTF8String, and SEQUENCE.
  2. Extensibility based on maps from strings to values is easier to understand and possibly to manage than ASN.1's extensibility based on appending to sequences. However, it's possible I've missed a better way to extend structures in ASN.1.
  3. It's easy to find the latest specification.
  4. Combining the length into the type saves a byte for smaller items.

Even if we stick with CBOR, I'm inclined to mandate Canonical CBOR https://tools.ietf.org/html/rfc7049#section-3.9, which includes minimal integer encodings. That means offsets will need to measure a range that doesn't include the offset itself, but I think that's doable for both the section list and the resource index.

I'll update this list as more considerations come up.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/dimich-g/webpackage/issues/47, or mute the thread https://github.com/notifications/unsubscribe-auth/AE1vNayKVCul9V4IR7UQx3TT-M0eGBqRks5rva4jgaJpZM4M8Q7U .

jyasskin commented 7 years ago

@lrosenthol Incorporated.

annevk commented 6 years ago

@hsivonen mentioned that CBOR doesn't define error handling: https://tools.ietf.org/html/rfc7049#section-3.3. Does DER? I think we absolutely do not want to introduce a new format in browsers without well-defined error handling. (And forking CBOR to add error handling reduces the utility of reusing it.)

jyasskin commented 6 years ago

@annevk https://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf appears to be the spec for DER, and it only specifies how encoders must encode things, not how parsers decode or handle errors.

I think the right place to specify error handling is the same places I'm currently requiring parsers to fail when an item wasn't encoded canonically. I should also be explicit that they must fail and return no data for invalid items.

davidben commented 6 years ago

(DER parsers are supposed to reject non-canonical inputs, yes. That's the whole point of DER.)

jyasskin commented 6 years ago

I'm inclined to stick with CBOR. It seems noticeably simpler than DER for both the encoder and decoder, even if I try to subset DER, and I'm not confident I understand ASN.1 well enough to subset it.

nyaxt commented 6 years ago

As an implementer, I'd also like to vote on using CBOR. The CBOR format was so simple and obvious to implement. re: error handling, I found the simple "reject any non-canonical input" to be enough to clarify the implementation. In addition, Chromium already has a fuzz-tested CBOR encoder/decoder (for prior use-case in Web Authentication).

davidben commented 6 years ago

Chromium had a fuzz-tested DER encoder/decoder for much much longer by way of X.509. :-P In an ideal world, both Web Authentication and Web Packaging would use DER, as browsers already must parse that securely and pay for in binary size, but alas, since Web Authentication already messed up, we're stuck paying for both and either is probably fine.

(I haven't looked closely at CBOR. DER has a critical nice property that it can be parsed without any allocations with a purely StringPiece-like API. If CBOR does not have this property, do not use it.)

annevk commented 6 years ago

@nyaxt it's unclear to me whether that is enough to lead to fully interoperable implementations. It also means that you cannot use any generic CBOR parser if you want interoperability.

davidben commented 6 years ago

To be fair, a lot of existing DER parsers are buggy and accept all or bits of BER as well. The good ones only accept DER, but there are a lot of bad ones.

jyasskin commented 6 years ago

@annevk The IETF is working on a CBORbis at https://github.com/cbor-wg/CBORbis. I'll send them a patch to try to tighten up the error handling. It's still likely to have more than one option, like UTF-8 does.

annevk commented 6 years ago

@jyasskin UTF-8 on the web doesn't (and note that too has been very problematic, a costly endavor, and not without security consequences).