Open tarcieri opened 3 weeks ago
As a data point for comparison, the libcrux-ml-kem API is feature-gating the unpacked API: https://github.com/cryspen/libcrux/pull/522
The LAMPS WG at IETF seems to be heading towards seeds-only. See https://mailarchive.ietf.org/arch/msg/spasm/OxnYtr1mIzB3GejYswduSfkEIA4/ and earlier emails in the thread (later emails go off topic into RSA-land).
BoringSSL has moved to using seeds and only seeds: https://boringssl-review.googlesource.com/c/boringssl/+/70407
Is anyone actively working on this at the moment (@bifurcation)? If not, I would be happy to do so.
I have not picked this up. @supinie if you want to take a stab at a PR, I would be happy to review.
One problem is I haven't managed to find any test vectors, though a few people have claimed they are interested in working on them soon.
I've also been curious if a KDF could be leveraged to provide shorter, more secure seeds: https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/1r6FnG0coiM/m/I9_Jn5lJDQAJ
Actually, the NIST key generation test vectors are already framed in terms of (d, z) seeds.
So this PR might be as simple as making DecapsulationKey::generate_deterministic public and not feature-conditioned. Or maybe having a public wrapper that has a 64-byte input and splits it into the two 32-byte values.
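A minimal sketch of the wrapper idea, assuming the 64-byte input is laid out as d followed by z (the ordering is an assumption here). `split_seed` and `B32` are hypothetical names used for illustration, not part of the crate; the two halves would then be handed to something like DecapsulationKey::generate_deterministic.

```rust
/// Hypothetical alias for one 32-byte seed component (d or z).
pub type B32 = [u8; 32];

/// Split a 64-byte seed into (d, z), assuming the layout d || z.
pub fn split_seed(seed: &[u8; 64]) -> (B32, B32) {
    let mut d = [0u8; 32];
    let mut z = [0u8; 32];
    // First half is taken as d, second half as z (assumed ordering).
    d.copy_from_slice(&seed[..32]);
    z.copy_from_slice(&seed[32..]);
    (d, z)
}
```

Taking a fixed-size `&[u8; 64]` rather than a slice means a wrong-length input is rejected at compile time, so the wrapper itself cannot fail.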
> Or maybe having a public wrapper that has a 64-byte input and splits it into the two 32-byte values.
This is what I had in mind. Have we decided whether this will replace the current API or be added alongside it? Alternatively, we could have a feature flag that toggles whether the public API accepts seeds or keys.
@supinie it should absolutely replace the existing API.
I guess the remaining question is the specific seed format, although multiple could be supported with the specific one inferred from length.
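Inferring the format from the input length could look roughly like the sketch below. The two formats are assumptions for illustration only: a 64-byte d || z seed, and a hypothetical 32-byte short seed that some KDF would expand into (d, z). `SeedInput` and `classify_seed` are made-up names, and no KDF is chosen or implemented here.

```rust
/// Hypothetical set of accepted seed encodings, distinguished by length.
#[derive(Debug, PartialEq, Eq)]
pub enum SeedInput {
    /// 64 bytes, interpreted as d || z (assumed layout).
    Full([u8; 64]),
    /// 32 bytes, to be expanded into (d, z) by a KDF (hypothetical format).
    Short([u8; 32]),
}

/// Pick the seed format based on the input length alone.
/// Returns `None` for any length that matches no known format.
pub fn classify_seed(bytes: &[u8]) -> Option<SeedInput> {
    match bytes.len() {
        64 => {
            let mut s = [0u8; 64];
            s.copy_from_slice(bytes);
            Some(SeedInput::Full(s))
        }
        32 => {
            let mut s = [0u8; 32];
            s.copy_from_slice(bytes);
            Some(SeedInput::Short(s))
        }
        _ => None,
    }
}
```

This only works as long as every supported format has a distinct length, which is worth keeping in mind before adding a third one.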
I was going to argue the opposite direction :) That since FIPS 203 defines both formats, we should support both.
As for the seed format, the (d, z) approach we have now actually seems right to me, (a) because it is in tune with what FIPS 203 says [1], and (b) because it seems like any format should be parseable to obtain those values.
[1] Page 17, "The seed $(d, z)$... can be stored for later expansion"
I think it would be OK to keep the existing API under a feature-gate (possibly hazmat) but it permits misuses which aren't possible with the seed-based API. See the BoringSSL example.
If we added the required validation checks and had from_bytes() return Result<Self>, that would be safe; doesn't seem like it would merit the hazmat negging. (Cf. this comment) But I admit that I'm more in the "APIs should offer complete capabilities" camp than the "APIs should be very opinionated" camp.
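A rough sketch of what a fallible from_bytes() might check, assuming the checks in question are the length check and the hash check that FIPS 203 describes for expanded decapsulation keys. Everything named here (`ExpandedDk`, `DkError`, the injected hash parameter `h`) is hypothetical; the hash is passed in only so the example stays free of dependencies, and a real implementation would use SHA3-256 for it. The constants assume ML-KEM-768 (k = 3).

```rust
/// Module rank k, assuming ML-KEM-768 for this example.
const K: usize = 3;
/// Expanded decapsulation key length: dk_pke || ek || H(ek) || z.
const DK_LEN: usize = 768 * K + 96;

/// Hypothetical error type for rejected keys.
#[derive(Debug, PartialEq, Eq)]
pub enum DkError {
    WrongLength,
    HashMismatch,
}

/// Hypothetical holder for a validated expanded decapsulation key.
pub struct ExpandedDk {
    bytes: Vec<u8>,
}

impl ExpandedDk {
    /// Validate and wrap an expanded key. `h` stands in for the hash H
    /// (SHA3-256 in the standard); it is a parameter only for this sketch.
    pub fn from_bytes(
        bytes: &[u8],
        h: impl Fn(&[u8]) -> [u8; 32],
    ) -> Result<Self, DkError> {
        // Length check: reject anything that is not exactly DK_LEN bytes.
        if bytes.len() != DK_LEN {
            return Err(DkError::WrongLength);
        }
        // Hash check: the stored H(ek) must match a fresh hash of the embedded ek.
        let ek = &bytes[384 * K..768 * K + 32];
        let stored = &bytes[768 * K + 32..768 * K + 64];
        if &h(ek)[..] != stored {
            return Err(DkError::HashMismatch);
        }
        Ok(Self { bytes: bytes.to_vec() })
    }

    /// Return the validated key bytes.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }
}
```

Note that these checks only establish that the key is well formed; unlike the seed path, they cannot show that the key was produced by an honest key generation.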
Maybe a compromise could be: Repurpose to_bytes() / from_bytes() to go to/from the seed, and add to_expanded_bytes() / from_expanded_bytes() for the full form. With the idea that the _bytes variants will be more obvious/attractive to developers.
As an aside: The BoringSSL example reminds me that if we're going to have from_seed(), we should probably also have to_seed(). Which means we'll need to carry around d in kem::DecapsulationKey.
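To make the shape of that proposal concrete, here is a minimal sketch of a key type that keeps d next to z so that to_seed() can round-trip. `SeedBackedKey` is a hypothetical stand-in, not kem::DecapsulationKey; the d || z layout is assumed, and the actual key expansion is left out entirely. The to_expanded_bytes() / from_expanded_bytes() pair is omitted because it would need the real expanded key material.

```rust
/// Hypothetical stand-in for a seed-carrying decapsulation key.
pub struct SeedBackedKey {
    d: [u8; 32],
    z: [u8; 32],
    // A real key would also hold the expanded material derived from (d, z).
}

impl SeedBackedKey {
    /// Build a key from a 64-byte seed, assuming the layout d || z.
    pub fn from_seed(seed: &[u8; 64]) -> Self {
        let mut d = [0u8; 32];
        let mut z = [0u8; 32];
        d.copy_from_slice(&seed[..32]);
        z.copy_from_slice(&seed[32..]);
        Self { d, z }
    }

    /// Recover the seed; only possible because d is stored alongside z.
    pub fn to_seed(&self) -> [u8; 64] {
        let mut seed = [0u8; 64];
        seed[..32].copy_from_slice(&self.d);
        seed[32..].copy_from_slice(&self.z);
        seed
    }

    /// Under the proposed naming, the plain `_bytes` pair maps to the seed.
    pub fn to_bytes(&self) -> [u8; 64] {
        self.to_seed()
    }

    /// Seed-based counterpart of `to_bytes`; infallible since any seed is valid.
    pub fn from_bytes(bytes: &[u8; 64]) -> Self {
        Self::from_seed(bytes)
    }
}
```

Because every 64-byte string is a valid seed, from_bytes() in this form needs no Result, which is the main ergonomic difference from the expanded form.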
Seeds provide a shorter secret which is always valid as opposed to having to be validated.
Some have suggested seeds should be the only API for instantiating an ML-KEM decapsulator: https://words.filippo.io/dispatches/ml-kem-seeds/