Cyphrme / Coze

Coze is a cryptographic JSON messaging specification.
https://cyphr.me/coze
BSD 3-Clause "New" or "Revised" License

Enforce Canonical Base 64 encoding. #18

Closed: zamicol closed this issue 1 year ago

zamicol commented 1 year ago

Playground demonstrating the issue:

There's an apparent problem with RFC 4648. There are three ways the base 64 representation of the same bytes may vary as a string:

  1. Padding
  2. Alphabet (URI unsafe or URI safe)
  3. Canonical encoding (several different strings can decode to the same byte string, but there is only one canonical encoding)

What is "canonical encoding"? Take the last three characters of the example tmb, "cLj8vs...XNuhOk": the strings hOk and hOl both decode to the same bytes (in hex, 84E9) even though they are different strings. (Example decoding hOk and hOl.) The canonical encoding is hOk, because its unused trailing bits are zero.
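A minimal Go sketch of the hOk/hOl example, using only the standard encoding/base64 package. It assumes b64ut values are unpadded base64url, which is what base64.RawURLEncoding handles; the helper names decodeLenient and decodeStrict are hypothetical. Go's default decoder ignores the unused trailing bits, while Strict() requires them to be zero, as described in RFC 4648 section 3.5.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeLenient uses Go's default decoder, which ignores nonzero
// trailing bits in the last character.
func decodeLenient(s string) ([]byte, error) {
	return base64.RawURLEncoding.DecodeString(s)
}

// decodeStrict enables strict mode, which requires the trailing bits
// to be zero and therefore accepts only the canonical encoding.
func decodeStrict(s string) ([]byte, error) {
	return base64.RawURLEncoding.Strict().DecodeString(s)
}

func main() {
	for _, s := range []string{"hOk", "hOl"} {
		lenient, err := decodeLenient(s)
		_, strictErr := decodeStrict(s)
		fmt.Printf("%s lenient=%X err=%v strictErr=%v\n", s, lenient, err, strictErr)
	}

	// Re-encoding the decoded bytes always yields the canonical form.
	b, _ := decodeLenient("hOl")
	fmt.Println(base64.RawURLEncoding.EncodeToString(b)) // hOk
}
```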

The RFC specifically addresses 1 and 2, but not really 3.

RFC 4648 advises rejecting non-alphabet characters, which may include padding. I agree with this advice:

Implementations MUST reject the encoded data if it contains characters outside the base alphabet when interpreting base-encoded data, unless the specification referring to this document explicitly states otherwise. [...] Furthermore, such specifications MAY ignore the pad character, "=", treating it as non-alphabet data[.]
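For comparison, a small Go sketch of how the standard unpadded base64url decoder treats non-alphabet input. The helper rejectsNonAlphabet is hypothetical. With base64.RawURLEncoding, "=" is not part of the alphabet, so padded input is rejected, as are characters from the other alphabet such as "+". One documented exception in Go is that "\r" and "\n" are silently skipped rather than rejected.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// rejectsNonAlphabet reports whether the unpadded base64url decoder
// returns an error for s.
func rejectsNonAlphabet(s string) bool {
	_, err := base64.RawURLEncoding.DecodeString(s)
	return err != nil
}

func main() {
	// "hOk" is valid; "=" (padding), "+" (URI-unsafe alphabet) and a
	// space are all outside the unpadded base64url alphabet.
	for _, s := range []string{"hOk", "hOk=", "hO+k", "hO k"} {
		fmt.Printf("%q rejected=%v\n", s, rejectsNonAlphabet(s))
	}
}
```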

I don't see the RFC really addressing the third concern.

Behavior

Obviously non-"strict"/non-canonical base 64 encoding is incorrect, and any encoder producing non-strict encoding should be fixed. However, the question is what Coze should specify regarding non-strict encoding and decoding. Both Go and JavaScript are permissive when decoding and do not throw errors.

Ultimately, the concern is that different base 64 encoders/decoders may behave differently, so ideally Coze should specify the appropriate behavior for Coze. Section 3.5 mentions non-canonical encoding in the context of unpadded data, but this issue is unrelated to padding: hOk= and hOl=, both padded, have the same issue as the unpadded strings.
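The same check with padded input, sketched in Go: base64.URLEncoding expects "=" padding, and its lenient and strict variants behave exactly as the unpadded ones do. The helper decodePadded is hypothetical.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodePadded decodes padded base64url, optionally in strict mode
// (trailing bits before the "=" must be zero).
func decodePadded(s string, strict bool) ([]byte, error) {
	enc := base64.URLEncoding
	if strict {
		enc = enc.Strict()
	}
	return enc.DecodeString(s)
}

func main() {
	for _, s := range []string{"hOk=", "hOl="} {
		lenient, err := decodePadded(s, false)
		_, strictErr := decodePadded(s, true)
		fmt.Printf("%s lenient=%X err=%v strictErr=%v\n", s, lenient, err, strictErr)
	}
}
```

Padding does not help here: both padded strings still decode to 84E9 unless strict mode is enabled.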

The concern is that if a Coze implementation used string comparison instead of byte comparison, implementations could disagree about which messages are valid. For example, given a non-canonically encoded tmb, an implementation that checks tmb before cryptographic verification may compare either the string values or the decoded byte values, and the two comparisons give different results.
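A Go sketch of that disagreement. The two comparison functions are hypothetical stand-ins for two implementations, and "hOk"/"hOl" stand in for a full tmb (a real tmb is longer). The same pair of values is a mismatch under string comparison and a match under byte comparison with a lenient decoder.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"fmt"
)

// tmbMatchesString compares the b64ut strings directly.
func tmbMatchesString(msgTmb, keyTmb string) bool {
	return msgTmb == keyTmb
}

// tmbMatchesBytes decodes both values with Go's lenient decoder and
// compares the resulting bytes.
func tmbMatchesBytes(msgTmb, keyTmb string) bool {
	a, errA := base64.RawURLEncoding.DecodeString(msgTmb)
	b, errB := base64.RawURLEncoding.DecodeString(keyTmb)
	return errA == nil && errB == nil && bytes.Equal(a, b)
}

func main() {
	keyTmb, msgTmb := "hOk", "hOl" // canonical vs. non-canonical
	fmt.Println("string comparison:", tmbMatchesString(msgTmb, keyTmb)) // false
	fmt.Println("byte comparison:", tmbMatchesBytes(msgTmb, keyTmb))    // true
}
```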

Another note on any Coze restriction on encoding: JSON is not base 64 aware, so any Coze-specified enforcement of base 64 encoding can only be applied to known Coze fields of type b64ut, and cannot be applied generally to arbitrary b64ut values.
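A Go sketch of why enforcement is limited to known fields. The strictB64 type, the pay struct, and the parse helper are hypothetical and are not Go Coze's actual API: a known tmb field can carry a type that rejects non-canonical input, but an unknown field ("ext" here) is just a JSON string, and nothing marks it as b64ut.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// strictB64 is a hypothetical b64ut type that rejects non-canonical input.
type strictB64 []byte

func (b *strictB64) UnmarshalJSON(data []byte) error {
	var s string
	if err := json.Unmarshal(data, &s); err != nil {
		return err
	}
	dec, err := base64.RawURLEncoding.Strict().DecodeString(s)
	if err != nil {
		return fmt.Errorf("invalid or non-canonical b64ut %q: %w", s, err)
	}
	*b = dec
	return nil
}

// pay knows only that tmb is b64ut; other fields stay generic JSON.
type pay struct {
	Tmb strictB64 `json:"tmb"`
}

func parse(msg string) (pay, map[string]any, error) {
	var p pay
	if err := json.Unmarshal([]byte(msg), &p); err != nil {
		return p, nil, err
	}
	var all map[string]any
	if err := json.Unmarshal([]byte(msg), &all); err != nil {
		return p, nil, err
	}
	return p, all, nil
}

func main() {
	// Known field: the non-canonical "hOl" is rejected.
	_, _, err := parse(`{"tmb":"hOl"}`)
	fmt.Println("known field error:", err)

	// Unknown field: "hOl" passes through as an ordinary string.
	_, all, err := parse(`{"tmb":"hOk","ext":"hOl"}`)
	fmt.Println("unknown field:", all["ext"], err)
}
```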

Solutions

There appear to be only two options to handle this:

  1. Be permissive on inbound encoding, force strict outbound encoding.
  2. Force strict encoding and decoding. (This can only be done when the type is known to be b64ut.)
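Option 1 can be sketched in Go as "decode leniently, always re-encode on output". The normalize helper is hypothetical; it relies on the fact that Go's encoder only ever emits the canonical form, so any accepted input leaves the system canonical.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// normalize accepts any decodable unpadded base64url string (lenient
// inbound) and returns its canonical encoding (strict outbound).
func normalize(in string) (string, error) {
	b, err := base64.RawURLEncoding.DecodeString(in) // ignores nonzero trailing bits
	if err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil // always canonical
}

func main() {
	out, err := normalize("hOl")
	fmt.Println(out, err) // hOk <nil>
}
```

Option 2 differs only in that the non-canonical input would be rejected instead of silently rewritten.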

Option 2 is more conservative, but may require unnecessary checks that don't really add value. Option 1 has the potential to be more compatible, assuming that other systems decode permissively (other programming languages' base 64 libraries do decode permissively), which may be a bad assumption.

Regardless, I believe that 2 is the correct behavior here. Even if languages/systems do not error on non-canonical encoding, an encoding error can be implemented by re-encoding the decoded data and comparing strings.

Security Considerations

This base 64 decoding bug doesn't appear to be a structural, architectural, or security concern, since Coze uses the UTF-8 encoding of the string for signing and verification. However, it is an interesting problem that should be known when working with RFC base 64. Concerning replay attacks specifically, signatures are still not malleable, as payloads are UTF-8 encoded and the signing operation is not base 64 aware.
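A Go sketch of why signing the UTF-8 string is not base 64 aware. The payloads are simplified and hypothetical, and SHA-256 stands in for whatever digest the key's algorithm uses. Because the digest covers the exact bytes of the JSON text, swapping hOk for hOl changes the digest, so a signature over one payload does not verify for the other, even though the two tmb values decode to the same bytes.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// payDigest hashes the exact UTF-8 bytes of the payload text; nothing
// here interprets the base 64 inside it.
func payDigest(pay string) [32]byte {
	return sha256.Sum256([]byte(pay))
}

func main() {
	canonical := `{"tmb":"hOk"}`
	variant := `{"tmb":"hOl"}` // same decoded tmb bytes, different string
	fmt.Println(payDigest(canonical) == payDigest(variant)) // false
}
```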

If Coze used the base 64 representation directly, this would be a security concern and could result in replay attacks.

Notes

It should be obvious, but this situation also applies to the URI-unsafe alphabet and to messages with base 64 padding, all of which are interpreted as the same bytes. (My conversion tool only has "base64" as an input, not the various permutations, since all variations can be known (or are irrelevant) and result in the same decoded binary payload.)
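A Go sketch of the alphabet and padding permutations. The bytes FB FF are chosen only because they exercise the two characters that differ between alphabets ("+" and "/" versus "-" and "_"); each of the four standard-library encodings decodes its own form to the same two bytes.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// variants lists the same two bytes (FB FF) in each alphabet/padding combination.
var variants = []struct {
	name string
	enc  *base64.Encoding
	s    string
}{
	{"standard, padded", base64.StdEncoding, "+/8="},
	{"standard, unpadded", base64.RawStdEncoding, "+/8"},
	{"URL-safe, padded", base64.URLEncoding, "-_8="},
	{"URL-safe, unpadded (b64ut)", base64.RawURLEncoding, "-_8"},
}

func main() {
	for _, v := range variants {
		b, err := v.enc.DecodeString(v.s)
		fmt.Printf("%-28s %-5s -> %X %v\n", v.name, v.s, b, err)
	}
}
```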

RFC 4648

I currently have an erratum open on one of the relevant sections.

I'm going to implement a non-canonical encoding check on Go and JS Coze.

See also the Go base64 package.

Go's base64 ignores carriage returns and newlines, so it is malleable, but JSON unmarshal does not accept them, making Go Coze non-malleable (playground: https://go.dev/play/p/X0J74F0zWVf). See also the newline test in base64_test.go.
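A Go sketch of both halves of that statement. The helper names are hypothetical. encoding/base64 skips the newline in "hO\nk" and returns 84E9 without error, while encoding/json refuses a raw, unescaped newline inside a JSON string. This sketch only tests a raw newline; JSON escape sequences are not covered here.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// decodeB64ut decodes unpadded base64url; Go ignores '\r' and '\n'.
func decodeB64ut(s string) ([]byte, error) {
	return base64.RawURLEncoding.DecodeString(s)
}

// jsonString unmarshals a JSON string literal.
func jsonString(raw string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(raw), &s)
	return s, err
}

func main() {
	b, err := decodeB64ut("hO\nk")
	fmt.Printf("base64: %X %v\n", b, err)

	// The Go literal below contains an actual newline byte inside the JSON string.
	_, err = jsonString("\"hO\nk\"")
	fmt.Println("json error:", err)
}
```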

zamicol commented 1 year ago

Go has an issue for this that is currently frozen (I might re-open it if this doesn't ping 'em):

https://github.com/golang/go/issues/15656

zamicol commented 1 year ago

Fixed by 98e7068.

Coze JS has an issue for this.

zamicol commented 1 year ago

Also just discovered a 2022 paper on the issue: "Base64 Malleability in Practice" https://dl.acm.org/doi/10.1145/3488932.3527284