cfrg / draft-irtf-cfrg-hpke

Hybrid Public Key Encryption

KEM failures and API considerations #221

Closed chris-wood closed 3 years ago

chris-wood commented 3 years ago

This issue tracks two related questions:

  1. How should HPKE deal with implicit KEM failures? Should it raise errors early on (during context creation) or only when trying to decrypt data?
  2. Not all KEMs have an immediate "is this key valid" check, such as those based on lattices. How should the spec accommodate those?

Currently, context creation is fallible, which would need to change if the answer to both (1) and (2) is "only fail during the AEAD operation" (I think).
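For concreteness, here is a rough Go sketch of the two API shapes the first question is asking about. All names and types are made up for illustration and the crypto is stubbed out; this is not any real implementation's interface.

```go
// Sketch of the two API shapes under discussion, with hypothetical names and
// stubbed-out crypto (not a real HPKE implementation): (1) eager failure at
// context creation vs. (2) failure deferred to the first Open call.
package main

import (
	"errors"
	"fmt"
)

var errInvalidEncap = errors.New("hpke: invalid encapsulated key")

// recipientContext stands in for the HPKE key schedule; keysMatch models
// whether the recipient derived the same AEAD key as the sender.
type recipientContext struct {
	keysMatch bool
}

// decap models a KEM Decap that can fail explicitly (wrong length,
// off-curve point, all-zero DH output, ...).
func decap(enc []byte) ([]byte, error) {
	if len(enc) != 32 {
		return nil, errInvalidEncap
	}
	return make([]byte, 32), nil // stand-in for the real shared secret
}

// Option (1): fallible setup; a bad enc is rejected before any ciphertext.
func setupBaseR(enc []byte) (*recipientContext, error) {
	if _, err := decap(enc); err != nil {
		return nil, err
	}
	return &recipientContext{keysMatch: true}, nil
}

// Option (2): infallible setup; Decap failure is implicit, so a bad enc just
// yields a key schedule that will never open anything.
func setupBaseRLazy(enc []byte) *recipientContext {
	_, err := decap(enc)
	return &recipientContext{keysMatch: err == nil}
}

// open stands in for the AEAD Open call on the recipient context.
func (ctx *recipientContext) open(ct []byte) ([]byte, error) {
	if !ctx.keysMatch {
		return nil, errors.New("hpke: AEAD authentication failed")
	}
	return ct, nil
}

func main() {
	badEnc := make([]byte, 31)

	if _, err := setupBaseR(badEnc); err != nil {
		fmt.Println("eager API fails at setup:", err)
	}

	ctx := setupBaseRLazy(badEnc)
	if _, err := ctx.open([]byte("ciphertext")); err != nil {
		fmt.Println("lazy API fails at open:", err)
	}
}
```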

cc @davidben, @martinthomson, @jedisct1, @bifurcation, @rozbb (please feel free to tag other implementers!)

beurdouche commented 3 years ago

As discussed privately, my intuition is that dealing with implicit KEM failures only at decryption time might be simpler for applications anyway, at the cost of the wasted decryption attempt. Also, it sounds like the current security proofs only handle perfectly correct KEMs, aside from the DHKEM variant, at the moment (@blipp can correct me if I am wrong).

rozbb commented 3 years ago

Are there any KEMs which have explicit Decap failures? We have the X25519 zero-check, but plenty of people think it's not necessary, so I could imagine removing it. I just glanced at Kyber, Frodo, SIKE, NTRU, McEliece, and SABER, and all of them return a non-null value on Decaps (notably, the last 3 do have failure conditions, but they output something pseudorandom when that condition is reached).

As for implicit failures, this is something that we have plenty of already. For example, X25519AuthDecap currently produces the wrong symmetric key if given the wrong sender pubkey, and this is only detectable at the AEAD stage. We could theoretically push the error to the Decap stage by including a test payload with every encapped key. This would perhaps let users catch Auth/PSK errors more easily, and also detect implicit failures due to (extremely low probability) correctness errors like in FrodoKEM. The downside is that this would increase the size of encapsulated keys by 112-150%.
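To make the test-payload idea concrete, here is one possible shape for it in Go, assuming an HMAC-SHA-256 confirmation tag sent alongside the encapped key. The construction and names are my own illustration, not anything from the draft, and a real design would need its own analysis.

```go
// Rough sketch of the "test payload" idea: the sender appends a short
// confirmation tag derived from the KEM shared secret, so the recipient can
// detect an implicit Decap failure before the AEAD stage. Hypothetical names
// and construction; not from the HPKE draft.
package hpkesketch

import (
	"crypto/hmac"
	"crypto/sha256"
	"errors"
)

// confirmationTag derives a tag over a fixed label from the shared secret.
func confirmationTag(sharedSecret []byte) []byte {
	mac := hmac.New(sha256.New, sharedSecret)
	mac.Write([]byte("hpke test payload"))
	return mac.Sum(nil)
}

// encapWithTag would run on the sender: the wire format becomes enc || tag,
// which is why the encapsulated key grows by a tag's worth of bytes.
func encapWithTag(enc, sharedSecret []byte) []byte {
	return append(append([]byte{}, enc...), confirmationTag(sharedSecret)...)
}

// decapCheck would run on the recipient after Decap: a mismatching tag
// surfaces the failure at the Decap stage instead of at the first Open.
func decapCheck(receivedTag, sharedSecret []byte) error {
	if !hmac.Equal(receivedTag, confirmationTag(sharedSecret)) {
		return errors.New("hpke: encapsulated key confirmation failed")
	}
	return nil
}
```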

davidben commented 3 years ago

As an implementor, I don't think moving the failure makes sense. I suppose an implementation could move it if it really wanted to, but it doesn't seem like a good idea. (While the spec does specify operations, how exactly that manifests in your programming language is necessarily an implementation question. There are many considerations around the type system, how you want to represent HPKE ciphers, naming, etc.)

First, I don't think the X25519 zero-check is that controversial. TLS mandates it, so I expect any TLS-adjacent implementation to use it. It should stay. HPKE also supports P-256, where both point-at-infinity and point-on-curve checks are quite well-established. (And necessary! Missing the point-on-curve check has the usual implications with uncompressed coordinates. Even with compressed coordinates, not every compressed input successfully finds a square root. Missing the point-at-infinity check means the output secret is not even defined.)
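For reference, the X25519 check in question is just a constant-time comparison of the raw DH output against all zeros. A minimal Go sketch, with a hypothetical function name, independent of any particular X25519 implementation:

```go
// Minimal sketch of the X25519 all-zero-output check referred to above.
// It takes the raw 32-byte Diffie-Hellman output from whatever X25519
// implementation is in use; the function name is hypothetical.
package hpkesketch

import (
	"crypto/subtle"
	"errors"
)

// checkX25519Output rejects the all-zero shared secret that results when the
// peer's public key is a small-order point.
func checkX25519Output(dhOutput []byte) error {
	if len(dhOutput) != 32 {
		return errors.New("x25519: shared secret must be 32 bytes")
	}
	zero := make([]byte, 32)
	if subtle.ConstantTimeCompare(dhOutput, zero) == 1 {
		return errors.New("x25519: all-zero shared secret (small-order peer key)")
	}
	return nil
}
```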

Second, Encap and Decap can fail for every KEM. Even ignoring failures inside the KEM, most of our programming languages do not have sufficiently strong type systems to reject keys and enc values of the wrong length at the type level. Even where they are strong enough, that's just shifting the check to the caller. At the end of the day, your protocol probably sends a variable-length string somewhere. Or the KEM may need to serialize something in a way that doesn't span the whole [N]byte space.
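A tiny Go sketch of the kind of parsing failure that exists regardless of the KEM's math; the constant and function name are hypothetical:

```go
// Even with "infallible" KEM math, the parsing step in front of Encap/Decap
// is fallible, because enc and public keys arrive as variable-length byte
// strings. Hypothetical names, for illustration only.
package hpkesketch

import "errors"

const nEnc = 32 // e.g. Nenc for DHKEM(X25519) in the draft

// parseEncappedKey is the kind of check most implementations cannot push into
// the type system and therefore must report as a runtime error.
func parseEncappedKey(enc []byte) ([nEnc]byte, error) {
	var out [nEnc]byte
	if len(enc) != nEnc {
		return out, errors.New("hpke: encapsulated key has wrong length")
	}
	copy(out[:], enc)
	return out, nil
}
```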

Finally, you'd need to shift the failure not just to decryption, but also to encryption (if the recipient public key is wrong) and to export. Those don't have error cases as natural as decryption's, so shifting the error there seems more error-prone.