Key derivation API - Githubissues

LoupVaillant / Monocypher

An easy to use, easy to deploy crypto library

https://monocypher.org

Other

580 stars 80 forks source link

Key derivation API #255

Closed samuel-lucas6 closed 1 year ago

samuel-lucas6 commented 1 year ago

I would argue one of the weakest links in libsodium and Monocypher is the lack of a proper key derivation API for high-entropy keys and shared secrets.

The issue with HChaCha20 is that it leads to problems like #230, can only produce a fixed output, and can't be personalised without doing something akin to XCKDF.

libsodium's new API is limited to an 8 character context string, a uint64 counter for a salt, and can only produce an output up to 512 bits. The old API requires a 128-bit salt, a 128-bit context string, and can again only produce up to 512 bits.

BLAKE2X isn't compatible with BLAKE2b for small output sizes and is hardly used. The BLAKE3 crate has a derive_key mode that allows different context lengths but doesn't take a salt. There doesn't seem to be agreement on how to use HKDF properly. Then there are NIST KDFs that I presume few people use due to HKDF.

libsodium is now adding HKDF, although this has taken a while to come about. It was a feature request in 2016.

How would you feel about adding a BLAKE2b implementation of either HKDF or something similar to HKDF? I would argue for the following requirements:

A mandatory context string for domain separation.
Variable context size like BLAKE3. Allows "[application] [commit timestamp] [purpose]" and short strings without having to manually pad.
An optional, variable length salt like HKDF whilst avoiding this.
No key prehashing, unlike HMAC. Although I doubt this is a problem with HKDF because nobody would use such a large salt. The salt could be capped at 256 or 512 bits to match pre-shared key sizes/BLAKE2b.
Support for larger output lengths than 512 bits.
Extract-then-Expand like HKDF because of shared secrets. Maybe merged into one function to avoid user confusion.

Bulat-Ziganshin commented 1 year ago

Standard modern key-derivation functions are Argon2, isn't it?

samuel-lucas6 commented 1 year ago

Standard modern key-derivation functions are Argon2, isn't it?

For password-based key derivation, but typically not for high-entropy keys/shared secrets because no delay is required and you want domain separation via context information. HKDF is the norm but uses HMAC-SHA-2. BLAKE2X was the BLAKE2 attempt at HKDF, but it didn't go far.

LoupVaillant commented 1 year ago

It looks like Argon2 has the kind of API we need indeed. It would be tempted to remove all the memory hard stuff in the middle, and only keep the Blake2b part. We could even hijack the current Argon2 API for this, though it would be awkward to remove the very thing that makes Argon2 what it is.

To be honest though, I'm not sure this is even needed: Blake2b already does almost everything we need. It's keyed mode even neatly separates the key from the rest of the hash, providing a nice mitigation against some energy side channel attacks. Implementing what you want @samuel-lucas6 sounds easy enough:

#include <monocypher.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

static void store32_le(uint8_t out[4], uint32_t in)
{
    out[0] =  in        & 0xff;
    out[1] = (in >>  8) & 0xff;
    out[2] = (in >> 16) & 0xff;
    out[3] = (in >> 24) & 0xff;
}

void kdf(uint8_t       *hash, uint32_t hash_size,
         const uint8_t *key,  uint32_t key_size,
         const uint8_t *dom,  uint32_t dom_size,
         const uint8_t *salt, uint32_t salt_size)
{
    crypto_blake2b_config config = {
        .key       = key,
        .key_size  = key_size,
        .hash_size = MAX(hash_size, 64),
    };
    crypto_blake2b_ctx ctx;
    crypto_blake2b_init  (&ctx, config);
    uint8_t sizes[16];                 // Hashing all sizes may be overkill
    store32_le(sizes + 0, hash_size);  // Already hashed in Blake2b
    store32_le(sizes + 0, key_size);   // Key is already isolated
    store32_le(sizes + 0, dom_size);   // Needed to separate dom & salt
    store32_le(sizes + 0, salt_size);  // Last one, not really needed
    crypto_blake2b_update(&ctx, sizes, sizeof(sizes));
    crypto_blake2b_update(&ctx, dom  , dom_size);
    crypto_blake2b_update(&ctx, salt , salt_size);
    crypto_blake2b_final (&ctx, hash);

    // If hash is too big,
    // Use ChaCha20 to stretch the hash we already have
    // Input key is the 64 byte hash we already have.
    // Nonce is taken from the same hash, but it could be zeroes.
    // Counter starts at zero just because.
    // Note the NULL pointer for the input encrypted text,
    // It causes the API to interpret is as all zeroes.
    if (hash_size > 64) {
        crypto_chacha20_djb(hash, 0, hash_size, hash, hash + 32, 0);
    }
}

Okay, I have to admit it's not that easy. When I wrote this thing I noticed two subtle traps users may fall into:

Forgetting to properly separating the dom string and the salt, so there's a difference between ("dom123", "salt"), and ("dom", "123salt"). We need to hash the size of dom to do that. Here I went overboard with the sizes, but users might easily forget sizes altogether.
Expanding the final hash with ChaCha20 only works when your small hash is already a key. Works when you do proper KDF, but (i) doesn't quite work if you just wanted to hash stuff, and (ii) we now depend on two primitives instead of just one.

There may be a way to remove some of those obvious steps with two tweaks to Monocypher's API:

We could include lift the limit hash size in Blake2b(), and expand the hash like HKDF if the hash size exceed 64 bytes.
We could provide a special update routine that hash the size of the input alongside the input itself, to ensure proper input separation. Not sure how much we'd need that, since users can do the same already.

in any case, here's the result:

void crypto_blake2b_fenced_update(crypto_blake2b_ctx *ctx,
                                  const uint8_t *buf, uint32_t buf_size)
{
    uint8_t size[4];
    store32_le(size, buf_size);
    crypto_blake2b_update(ctx, size, sizeof(size));
    crypto_blake2b_update(ctx, buf, (size_t)buf_size);
    crypto_wipe(size, sizeof(size));
}

void kdf2(uint8_t       *hash, uint32_t hash_size,
          const uint8_t *key,  uint32_t key_size,
          const uint8_t *dom,  uint32_t dom_size,
          const uint8_t *salt, uint32_t salt_size)
{
    crypto_blake2b_config config = {
        .key       = key,
        .key_size  = key_size,
        .hash_size = hash_size, // .hash_size may exceed 64, yay!
    };
    crypto_blake2b_ctx ctx;
    crypto_blake2b_init         (&ctx, config);
    crypto_blake2b_fenced_update(&ctx, dom , dom_size);
    crypto_blake2b_fenced_update(&ctx, salt, salt_size);
    crypto_blake2b_final        (&ctx, hash);
}

There: much easier, much simpler, much more obvious, much less room for error.

Also, honest question: do we even need domain separation? I understand the need when all KDF inputs are long term secrets, but:

If you have a random 128-bit salt (or bigger), no collision is possible.
If the key material includes at least one ephemeral key, no collision is possible.

I daresay the above is the common case: forward secrecy mandates the use of ephemeral keys, and if we don't we need a random nonce/salt somewhere. We would no longer need the domain string, nor the associated fenced_update(). And even if your input key material is bigger than 64 bytes, we can always hash it first. So:

void kdf3(uint8_t       *hash, uint32_t hash_size,
          const uint8_t *ikm,  uint32_t ikm_size,
          const uint8_t *salt, uint32_t salt_size)
{
    uint8_t key[64];
    size_t  key_size = MIN(ikm_size, 64);
    if (key_size > 64) {
        crypto_blake2b(key, crypto_blake2b_defaults, ikm, ikm_size);
    } else {
        memcpy(key, ikm, ikm_size);
    }
    crypto_blake2b_config config = {
        .key       = key,
        .key_size  = key_size,
        .hash_size = hash_size, // .hash_size may still exceed 64
    };
    crypto_blake2b(hash, config, salt, salt_size);
}

Now this is starting to actually look like a straightforward application of Blake2b's keyed mode. And the complicated bit at the beginning where we reduce the input key material down to something small enough to be a key is only needed if the key is too large and users actually need the side channel protection (and really, most won't care about the energy side channel).

Yet another possibility would be this:

void kdf4(uint8_t       *hash, uint32_t hash_size,
          const uint8_t *ikm,  uint32_t ikm_size,
          const uint8_t *salt, uint32_t salt_size)
{
    crypto_blake2b_config config = {
        .key       = 0,
        .key_size  = 0,
        .hash_size = hash_size, // .hash_size may still exceed 64
    };
    crypto_blake2b_ctx ctx;
    crypto_blake2b_init  (&ctx, config);
    crypto_blake2b_update(&ctx, ikm  , ikm_size);
    crypto_blake2b_update(&ctx, salt , salt_size);
    crypto_blake2b_final (&ctx, hash);
}

Most of the time there will bee no need to separate the ikm input key material from the salt, because they will all have fixed sizes, and salt is random anyway so we're not worried that moving boundaries may induce collisions. So again, a straightforward application of Blake2b.

One thing I'm willing to concede here, because damn that's convenient, is allowing the hash size to grow beyond 64 bytes:

The SLOC cost should be minimal.
Removing the maximum size limit simplifies the API.
It would give extract-expand capabilities to Blake2b itself.

I'm not sure we need a dedicated KDF routine on top of this. Users needing a KDF will have relatively little room to shoot themselves in the foot, and Monocypher is a low-level cryptographic library to begin with. So are NaCl and Libsodium: the only way to remove all the footguns here (besides the C language itself), is to implement high-level protocols, such as authenticated key exchange, PAKE (symmetric and augmented), PURB, etc.

Right now this high-level stuff is out of scope, and belong to separate projects. I try to remove the footguns where appropriate, most notably high level primitives like EdDSA, or universally needed constructions like authenticated encryption, but for anything else I'd rather move slowly and avoid scope creep.

LoupVaillant commented 1 year ago

I just realised, reading Soatok' post, that I don't know whether Blake2b should qualify as a (strong) PRF or a KDF. Specifically, to think of keyed Blake2b as a proper random oracle:

Is it enough that it's key has enough entropy?
Or do we need that key to be uniformly random in the first place?

Internally Blake2b hashes the key in a separate block, so its stand to reason that this internal hash is already uniformly random if the key has enough entropy. Thus, the rest of Blake2b only needs to be a PRF for the keyed mode to satisfy the KDF requirements.

It seem even possible that's why the key is separated in the first place. If it was merely concatenated to the rest of the input that may make security analysis more difficult. Though I confess I have no idea why it's limited to 64 bytes instead of the size of a full Blake2b block (128 bytes). In any case, if we need to separate the key from the other inputs for Blake2b to count as a KDF, my kdf4() does not quite work.

Now if Blake2b in keyed mode does not count as a KDF, I don't know what to do to be honest. We could implement HMAC and HKDF on top of it, but… man, what a waste.

samuel-lucas6 commented 1 year ago

Forgetting to properly separating the dom string and the salt, so there's a difference between ("dom123", "salt"), and ("dom", "123salt"). We need to hash the size of dom to do that. Here I went overboard with the sizes, but users might easily forget sizes altogether.

Indeed, that's one primary motivation for this.

Expanding the final hash with ChaCha20 only works when your small hash is already a key. Works when you do proper KDF, but (i) doesn't quite work if you just wanted to hash stuff, and (ii) we now depend on two primitives instead of just one.

Yes, I've also seen Frank suggest that strategy for an XOF, but I think it's preferable just using BLAKE2b.

We could provide a special update routine that hash the size of the input alongside the input itself, to ensure proper input separation. Not sure how much we'd need that, since users can do the same already.

That's a very interesting idea.

Also, honest question: do we even need domain separation?

I think we do. The classic example is deriving two subkeys from a master key (e.g. for Encrypt-then-MAC). This came up on Cryptography Stack Exchange today. The context string may be the only thing that prevents the same output.

It's also obviously good practice. Frank argues for context separation in his API design for cryptography talk if you haven't seen it, and he implemented this heavily in LibHydrogen. It seems more common than using a salt with HKDF, which is I believe what inspired BLAKE3 to not have a salt. The new libsodium KDF also swapped the salt for a counter as if people weren't using salts.

Now this is starting to actually look like a straightforward application of Blake2b's keyed mode. And the complicated bit at the beginning where we reduce the input key material down to something small enough to be a key is only needed if the key is too large and users actually need the side channel protection (and really, most won't care about the energy side channel).

What's this energy side channel? Sounds intriguing.

I'm not sure we need a dedicated KDF routine on top of this... for anything else I'd rather move slowly and avoid scope creep.

I understand, but I think a KDF is within scope without creeping. Key derivation is something you're pretty much guaranteed to bump into.

I just realised, reading Soatok' post, that I don't know whether Blake2b should qualify as a (strong) PRF or a KDF.

Frank's API seems to suggest it's suitable as a KDF, but the website and specification don't mention KDF/key derivation, merely MAC/PRF. That could just be due to the key/digest length restrictions though.

However, BLAKE2X seems to imitate HKDF, except the B2 instances aren't keyed:

H0 = BLAKE2b(key: optional, message: ikm)
okm = B2(node offset: 0, 64, H0) || B2(1, 64, H0) || B2(2, 64, H0) || B2(length / 64, length mod 64, H0)

If we ignore the different parameter block, this is similar to what I naively envisioned. It's basically swapping HMAC with keyed BLAKE2b in HKDF:

prk = BLAKE2b(key: salt or hashLength zeros, message: ikm)
okm = BLAKE2b(key: prk, message: empty || info || 0x01) || BLAKE2b(key: prk, message: previousHash || info || 0x02)

Perhaps no large key prehashing would be required here because it could go in the message ikm rather than prk. The salt doesn't need to be more than 512 bits so no prehashing there, unlike HMAC. This doesn't address Soatok putting the salt in the info though.

But, as you say, I don't know if the extract step is needed for KDF security and whether BLAKE2b is as suitable of a computational randomness extractor as HMAC is. None of this probably matters, with KDF vs PRF likely being academic. If that's the case, kdf2 seems good.

LoupVaillant commented 1 year ago

OK, a quick web search suggests that Blake2b is a prober KDF. It may be prudent to use the provided key as originally designed, but the authors describe it as being a Random Oracle, so it's not clear we even need to. I'll need to ask around.

As for the hash expansion, I was thinking I'd do something like the following:

// Assuming size > 64
void expand_hash(uint8_t expanded, size_t size, uint8_t hash[64])
{
    // Prepare block:
    // [0 .. 7] -> counter
    // [8 ..15] -> total size
    // [16..79] -> hash to expand
    //
    // The block is 128 bytes exactly because that's easier to optimise.
    uint8_t block[128] = {0};
    memcpy(block + 16, hash, 64);
    store64_le(block + 8, (uint64_t)size);

    // Fill the final hash with hashed blocks (in counter mode).
    u64 ctr = 0;
    while (size > 0) {
        store64_le(block, ctr);
        crypto_blake2b_config config = {
            .key       = 0,
            .key_size  = 0,
            .hash_size = MAX(size, 64)
        };
        crypto_blake2b(hash, config, block, 128);
        hash += config.hash_size;
        size -= config.hash_size;
        ctr++;
    }
}

_(The actual patch instead expands the core interface, so we can call crypto_blake2b() with bigger hash sizes right away.)_

The main ideas are to minimise the number of hash calls and make sure the hash changes entirely when we change its length (shorter hashes aren't prefixes of longer hashes).

Note that there's a fundamental problem of domain separation: extended hashes can be computed in terms of short hashes using only the public API, so we can in a contrived scenario conceive a short hash that's a subset of a long one. ~~(The same is true about the key: we don't need a special API to implement keyed mode, we just need to add a 128 byte block at the beginning, filled with the key and completed with zeroes.)~~

But this is the price of compatibility. Unless there's real value (like with Elligator), I don't want to add special features users can't emulate with other libraries. This keeps things somewhat interoperable, and reduces vendor lock-in. (Or… Trussst meee, ussse my ssstufff!)

@fscoto do you have an opinion on this? Does this sound like the kind of thing we should add to Monocypher, or should we hold off?

LoupVaillant commented 1 year ago

We could provide a special update routine that hash the size of the input alongside the input itself, to ensure proper input separation. Not sure how much we'd need that, since users can do the same already.

That's a very interesting idea.

And one I'm now mostly rejecting, because users would have to think of using the fenced update.

Also, honest question: do we even need domain separation?

I think we do. The classic example is deriving two subkeys from a master key (e.g. for Encrypt-then-MAC).

Wait no, this particular use case calls for an expand step, not for domain separation. You can fake the expand step with using a KDF twice with a different context, but you don't need to if your KDF API already outputs as many bytes as you want.

It's also obviously good practice. Frank argues for context separation in his API design for cryptography talk if you haven't seen it, and he implemented this heavily in LibHydrogen. It seems more common than using a salt with HKDF, which is I believe what inspired BLAKE3 to not have a salt. The new libsodium KDF also swapped the salt for a counter as if people weren't using salts.

Didn't know about the talk, thanks. <watching it…>

Unfortunately I have to say much of it is incompatible with Monocypher's core design constraints. First, Monocypher will never provide nonces or automatic key pair generation. Pure C with zero dependency, that's flat out impossible. Users need to pull random numbers out of their… hem, OS.

A second thing that's hard to do is the high-level authenticated key exchange. I could in theory implement Noise, but then which pattern must I use? XX I guess, since it's the most useful, but KX has interesting security properties, and IK reduces the number of messages (at the expense of initiator security). And not implementing all interesting patterns, that's too big of a project, and users still need to chose one somehow. Which is why so far I have punted on the whole key exchange thing, including key derivation.

Now I realise there are two things I could do that make sense even if I reject the KDF itself:

Remove the limitation on the input key size. (extract phase)
Remove the limitation on the output hash size. (expand phase)

That's probably all most users will even need. Stuff the IKM in the key, the domain separation string in the message, output the hash, done. Now the only use case left for a separate KDF function is if we have at least three inputs: key and salt and domain separation. I'm not sure this will come up as often, if at all.

What's this energy side channel? Sounds intriguing.

@fscoto once showed me a paper about mitigating the energy side channel in signatures. The way they did it was at the hash step: they made sure the key material was in its own block, completely separated from attacker controlled input. (Hmm, maybe it was about fault injection? I don't remember). Anyway, when we get to the point where we're processing attacker controlled input, the key has already been mangled by the hash's internal compression function, making any analysis much harder.

Now that I think about it though, there may be other reasons. I'm asking on Reddit right now.

I'm not sure we need a dedicated KDF routine on top of this... for anything else I'd rather move slowly and avoid scope creep.

I understand, but I think a KDF is within scope without creeping. Key derivation is something you're pretty much guaranteed to bump into.

Sure, it's a necessary component of many protocols. But if we get to that point, users likely need a turnkey solution and those involve way too many decisions to be including in Monocypher itself. That's why I have separate projects (unfortunately in limbo right now). So I need to ask about the uses of KDF outside of authenticated key exchange.

Perhaps no large key prehashing would be required here because it could go in the message ikm rather than prk. The salt doesn't need to be more than 512 bits so no prehashing there, unlike HMAC. This doesn't address Soatok putting the salt in the info though.

I'm not sure, I suspect this one is specific to HKDF.

But, as you say, I don't know if the extract step is needed for KDF security and whether BLAKE2b is as suitable of a computational randomness extractor as HMAC is. None of this probably matters, with KDF vs PRF likely being academic. If that's the case, kdf2 seems good.

Well, as I said in my earlier comment, I believe Blake2b is a proper KDF, or at least is advertised as such. I mean if it's good enough to pass for a Random Oracle, it has to be good enough for a KDF.

The distinction between KDF and PRF however probably isn't as academic as one might think: there's a chance, though I'm not certain, that one key difference is that a KDF would work even if part of the input key material is under attacker control. All we need for it to work is enough entropy unknown to the attacker. If there was no difference, I would use HChacha20 as a compression function and implement a Noise-like key exchange with that. A key exchange that doesn't need a dedicated hash, how cool is that?

Unfortunately I cannot. Heck, even XCKDF is kind of sketchy, with the assumption that the HChaCha20 of an X25519 shared secret is random enough.

Going back to Libsodium's current API, it's actually quite clever: they're splitting up the expand phase into several function calls, so you can derive as many keys as you want without eating up all your memory. This sounds quite useful if you're in a situation if you actually need to derive a significant (or arbitrary) number of keys.

Most of the size limitations make sense too: if the key size wasn't limited you'd need to hash the input key material every time instead of just shoving it into the Blake2b key. The extract phase needs to be handled by users themselves, but that one isn't error prone at all: just hash all the things, that's your key.

The output limitation also makes sense: since your expand phase is already divided into several functions there's no need to allow for a bigger hash. Just call your function again, with a different number.

Limiting the domain separation string to merely 8 bytes however seems a bit short. I would try to limit that to 64 bytes instead, so users can hash longer domain separation strings without risking collisions. But I also see that Libsodium is using a fixed size domain separation string, so I guess it does make sense to keep it short, so users aren't forced to fill all 64 bytes every time.

Now I haven't looked at it, but I suspect it uses Blake2B's keyed mode under the hood. It's likely then that the cost they have to pay for their nice multi-call API is 2 compressions per 64-byte output, while my example needs only one… wait that's not quite true: I need one per output, plus two. I'm only better after 2 full outputs (in practice this means up to 4 sub-keys). Anyway, we could have a less costly multi-call API by explicitly separating the extract and expand phases with crypto_kdf_extract() and crypto_kdf_expand() functions. And while we're at it we could add a streaming interface for the extract phase (crypto_kdf_extract_init(), crypto_kdf_extract_update(), crypto_kdf_extract_final())…

You know what, I just cut down the EdDSA streaming interface, I'm not sure I want to go that overboard for key derivation, which needs streaming interfaces even less than signatures. Let's not get carried away.

fscoto commented 1 year ago

A lot of virtual ink has been spilled on this issue. I'll keep my input here brief.

Monocypher is predominantly used in embedded devices, which will need to interoperate with other systems. Standards tick checkboxes. I suspect that the amount of people who want to reimplement some technically-possible-but-not-common construction on their server with libsodium or something along those lines will tend towards zero. BLAKE2X (designed as a KDF among other things) has found approximately no adoption.

Therefore, +1 for boring ol' HKDF with HMAC from me, even if HMAC is silly and (mostly) meaningless with BLAKE2 in the first place.

Laczen commented 1 year ago

Agreed that standards should be used, looking at tls1.3 would it not be a good choice to add sha256, sha256_hmac and hkdf_sha256_extract/expand ?

LoupVaillant commented 1 year ago

No. Adding SHA-256 (or any other hash, even Blake2s) would be going too far.

Monocypher has one main hash, that's Blake2b. It has a keyed mode already so no need for HMAC. I'm still wondering whether we need KDF at all (right now I'm banging my head over the exact security properties of BLAKE2b).

Monocypher has one optional hash, strictly for compatibility with Ed25519 (which I needed for my tests), that's SHA-512. And since it does not have a keyed mode, I have added HMAC. It probably needs HKDF for completeness.

That's it. And honestly in my opinion, two hashes are already one too many. Let's not add even more.

samuel-lucas6 commented 1 year ago

Wait no, this particular use case calls for an expand step, not for domain separation. You can fake the expand step with using a KDF twice with a different context, but you don't need to if your KDF API already outputs as many bytes as you want.

Yeah, I've seen both ways. The former can be tidier code wise as you avoid all the array slicing.

That's probably all most users will even need. Stuff the IKM in the key, the domain separation string in the message, output the hash, done. Now the only use case left for a separate KDF function is if we have at least three inputs: key and salt and domain separation. I'm not sure this will come up as often, if at all.

Maybe. If we look at age, no salt is used for the header key but domain separation in info is. A nonce is used as a salt for the payload key and more info domain separation. Then the wrap key uses an ephemeral shared secret and recipient public key(?) as a salt and info domain separation.

Signal uses info more than the salt. TLS 1.3 uses the salt and info. Noise uses the salt but no info.

Kryptor uses the old libsodium API, with an application-specific string for personalisation, a random salt, and the ephemeral public key bytes/random data in the message. My use case for the salt/message was to authenticate parts of the file only used for specific purposes (e.g. the salt is only really needed for password-based encryption but private key based encryption is possible).

Anyway, when we get to the point where we're processing attacker controlled input, the key has already been mangled by the hash's internal compression function, making any analysis much harder.

That makes sense, thanks.

Now that I think about it though, there may be other reasons. I'm asking on Reddit right now.

That's interesting. I knew BLAKE2b was fine with prefix-MAC, but I never considered using it instead.

I mean if it's good enough to pass for a Random Oracle, it has to be good enough for a KDF.

I'm surprised KDF was never explicitly stated. The HKDF paper, which I've never read properly, suggests to me BLAKE2b is a good extractor if keyed.

Unfortunately I cannot. Heck, even XCKDF is kind of sketchy, with the assumption that the HChaCha20 of an X25519 shared secret is random enough.

To be fair, Bernstein did the same in NaCl for whatever reason.

Most of the size limitations make sense too: if the key size wasn't limited you'd need to hash the input key material every time instead of just shoving it into the Blake2b key. The extract phase needs to be handled by users themselves, but that one isn't error prone at all: just hash all the things, that's your key.

What annoyed me was implementing pre-shared key support. I couldn't put it in the key slot because it was already full due to two concatenated shared secrets. More and variable context and salt sizes would be ideal.

Now I haven't looked at it, but I suspect it uses Blake2B's keyed mode under the hood.

Yes, it does alongside the optional, efficient personalisation and salt parameters in BLAKE2b, which is why they're fixed in size.

Therefore, +1 for boring ol' HKDF with HMAC from me, even if HMAC is silly and (mostly) meaningless with BLAKE2 in the first place.

This is fair enough, and Noise uses HMAC with BLAKE2. The performance penalty is a shame though. HKDF with SHA-512 would be fine. I'm not sure why people use HKDF-SHA-256 instead of HKDF-SHA-512.

Laczen commented 1 year ago

To get more usage of monocypher on whatever platform it should be possible to follow existing standards. Therefore it should support sha512 for ed25519 (as it does), for tls1.3 it should also support sha256, for others it should be blake2b. The hash function used could default be blake2b, but adding others should be simple and encouraged.

LoupVaillant commented 1 year ago

To get more usage of monocypher on whatever platform it should be possible to follow existing standards.

Popularity is not a goal, I just want to make something useful. Following standards helps make sure my work is secure and useful, but only to a point. Staying small and focused also makes Monocypher easier to learn and integrate, and that's useful too.

The hash function used could default be blake2b, but adding others should be simple and encouraged.

I have very limited capacity. I'm not alone but I do shoulder over 80% of the work. Everything we add takes time to test and maintain, time I could use to do something else, like implementing a high-level protocols. (Hint: not TLS.)

LoupVaillant commented 1 year ago

I mean if it's good enough to pass for a Random Oracle, it has to be good enough for a KDF.

I'm surprised KDF was never explicitly stated. The HKDF paper, which I've never read properly, suggests to me BLAKE2b is a good extractor if keyed.

Okay, I just read that paper, I should have done that sooner. The information I got from it was golden:

HKDF uses HMAC both as an extractor and as a PRF. But the fact it can do both does not mean it can do them at the same time:
- As an extractor, we give HMAC some input key material of dubious randomness, and we extract one uniform pseudo-random key.
- As a PRF, we give HMAC a uniform pseudo-random key and a counter.
We do not give HMAC input key material of dubious randomness and a counter, then expect it to sprout as many uniformly random independent outputs. For this to work requires stronger security assumptions than the extract-expand scheme ever needs. Corollary: BLAKE2b should follow the same constraints, and the extract & expand phases systematically separated. There goes my clever "remove limitations" idea: removing limitations on the hash size would just encourage that kind of misuse.
There's a reason (p22) why the BLAKE2b key is kept in its own block: it's common practice so we get a random IV before we hash the actual input. My guess here is that it simplifies (or even enables) security reductions. So when using Blake2 in the expand phase, we have to hash two blocks for every 64 output bytes.

Fun fact I just realised: Libsodium's new KDF API actually describes a PRF. From the look of the source code they use the input key and a counter directly, which is exactly what we need for the expand phase, but the input key has to be fully random unless we want to make additional assumptions about BLAKE2b. I believe the manual doesn't disagree with me here, so I'm inclined to believe "KDF" was just small misnomer (we often say "KDF" when we just mean "key stretching").
I wondered why the key size was limited, but looking at the implementation it became obvious: the internal state is a 64-byte hash (which starts as the initial IV). So even though the key could in theory be bigger, we'd only retain 64 bytes of entropy from it, so there's no point making it any bigger.

Unfortunately I cannot. Heck, even XCKDF is kind of sketchy, with the assumption that the HChaCha20 of an X25519 shared secret is random enough.

To be fair, Bernstein did the same in NaCl for whatever reason.

I made the assumption only because Bernstein did the same with HSalsa20. As for his reasons, my guess is that it helps him implement crypto_box() with the least number of primitives: if there's no need for a proper hash the code is overall tighter.

What annoyed me was implementing pre-shared key support. I couldn't put it in the key slot because it was already full due to two concatenated shared secrets. More and variable context and salt sizes would be ideal.

Recall that in the case of HMAC, the key is used for the salt, not the input key material. The input key material is compressed in the message. So the proper place for your input key material is not the key but the message. This one has unlimited size, so you're good here.

I'm not sure why people use HKDF-SHA-256 instead of HKDF-SHA-512.

SHA-256 is almost certainly faster on short inputs (less than 64 bytes, where a single compression is enough).

Tentative conclusions

With what I've learned today, I would say: Do nothing: we already have KDF.

BLAKE2b is a suitable extractor, and a perfect 1:1 mapping with HKDF-extract: shove the input key material in the message argument, and if you need a random salt, put it in the optional key argument. And it has a streaming interface out of the box, so you can easily extract from as many inputs as you need. No need to do anything there.
BLAKE2b is a suitable PRF. It was designed to be. Just put the (possibly extracted) key in the key argument, then concatenate the counter an the info in the message parameter (recall that the info is supposed to take care of domain separation). If you need less than 513 keys you can have a one byte counter, making things real easy. It's not a perfect 1:1 mapping with HKDF-expand, but…
…Chacha20 is. And in many contexts you need to encrypt stuff, so you might as well use it to expand keys as well. It's fast, its API is simple, you can easily ad up to 24 bytes worth of info (for domain separation), and you even have the option to generate the output piecemeal (with the counter argument).
Raising the limit on the key size is a bad idea: it's not the place for the input key material, it gives the wrong idea about how much actual entropy we can shove in there (the real limit is 64 bytes), and it encourages extracting & expanding in one go (which requires stronger security assumptions from BLAKE2b). For similar reasons, raising the limit on the hash size is also a bad idea.

Sorry @fscoto, but I don't think we have a compelling use case for HMAC-BLAKE2b, nor HKDF-BLAKE2b. Too much code, too little actual gain, and I'm not sold on the standard ticking stuff either, even if we take Noise into account. The manual though definitely needs to show how to derive keys.

On the SHA512 side however things are different: HKDF is only a couple convenience functions away, and HKDF_SHA512 really is an established industry standard. So I'm thinking of adding crypto_sha512_hkdf(), crypto_sha512_hkdf_expand(), and maybe crypto_sha512_hkdf_expand_step(). Extraction is a single HMAC call, I don't think it deserves its own function.

Laczen commented 1 year ago

@LoupVaillant, I agree with your view on keeping your limited capacity to do the important things. Maybe it is possible to add key derivation method support in the following way: a. Define a standard key derivation method in monocypher as: crypto_hkdf_blake2_extract(prk, key, salt), crypto_hkdf_blake2_expand(out, prk, label, len) and a "convenience" function crypto_hkdf_blake2(out, label, salt, len). The prk could be a context. b. Accept other key derivation methods in the optional folder e.g. crypto_hkdf_sha512_extract(prk, key, salt), crypto_hkdf_sha512_expand(out, prk, label, len) and a "convenience" function .... Also a sha256 based method could be added. To avoid extra work from your side the optional methods should include their own test vectors and test methods so that their correct operation can be checked with one call to e.g. crypto_sha512_check().

samuel-lucas6 commented 1 year ago

We do not give HMAC input key material of dubious randomness and a counter, then expect it to sprout as many uniformly random independent outputs.

Yes, my reply was poorly phrased. What I meant was for the extract phase, BLAKE2b keyed with a uniformly random salt should be a good extractor like HMAC. What I wasn't as sure about is whether BLAKE2b without a salt as the key is considered sufficient as an extractor, but I'm guessing it is because hashing just the shared secret is prefix-MAC with a non-uniform key, so BLAKE2b is still keyed. I didn't read much of the paper; must do one day.

There's a reason (p22) why the BLAKE2b key is kept in its own block: it's common practice so we get a random IV before we hash the actual input.

Got it, thanks.

the input key has to be fully random unless we want to make additional assumptions about BLAKE2b.

The interesting thing is that Frank suggests doing BLAKE2b(message: shared secret || public key 1 || public key 2) to compute a shared key from X25519. I implemented this in my libsodium binding. This is also done in the newer key exchange API.

So, this is prefix-MAC and including the public keys apparently increases the key space.

I made the assumption only because Bernstein did the same with HSalsa20. As for his reasons, my guess is that it helps him implement crypto_box() with the least number of primitives: if there's no need for a proper hash the code is overall tighter.

Yeah sounds likely. Maybe also the efficiency.

Recall that in the case of HMAC, the key is used for the salt, not the input key material. The input key material is compressed in the message. So the proper place for your input key material is not the key but the message. This one has unlimited size, so you're good here.

The new API hides access to the message parameter.

SHA-256 is almost certainly faster on short inputs (less than 64 bytes, where a single compression is enough).

Right, I see. I've always heard SHA-512 is faster on 64-bit but never considered the input size, so I ought to look into this more.

Do nothing: we already have KDF.

Those points are fair. Are you going to recommend BLAKE2b(message: ikm) or BLAKE2b(key: HashLen zeros, message: ikm) like HKDF?

Extraction is a single HMAC call, I don't think it deserves its own function.

I would say add it to avoid confusion.

The prk could be a context. Also a sha256 based method could be added.

@Laczen I think Loup has made it pretty clear that SHA-256 will not be added. Not sure what you mean by prk could be a context. It should be the output of extract or a uniformly random key you already have.

I will try to reduce my replies now so this issue doesn't get too out of hand/sidetracked.

LoupVaillant commented 1 year ago

I'm feeling we're converging. Some disagreements will likely remain, but I'm the dictator here. :smiling_imp:

@Laczen I'll start working on (b) right away. At the very least, we need crypto_hkdf_sha512_expand() and crypto_hkdf_sha512(). The tests are no biggie, I already have a nice framework.

For (a), updating the manual with a nice example KDF will suffice. Of course some users won't read it, but I think heathens who don't read manuals have no business using crypto libraries. My responsibility stops at making a usable, safe, well documented API, and keeping it small helps immensely. If someones fails to read the manual and people die as a result, it's beyond my control.

Here's the example I mean to show:

void kdf(uint8_t *okm,  size_t okm_size,  // <= 256 * 64 bytes
         uint8_t *ikm,  size_t ikm_size,  // unlimited
         uint8_t *salt, size_t salt_size, // <= 64 bytes
         uint8_t *info, size_t info_size) // unlimited
{
    // Extract
    crypto_blake2b_config config = {
        .key       = salt,
        .key_size  = salt_size,
        .hash_size = 64,
    };
    uint8_t prk[64];
    crypto_blake2b(prk, config, ikm, ikm_size);

    // Expand
    uint8_t ctr = 0;
    while(okm_size > 0) {
        size_t size = MIN(okm_size, 64);

        // Expand step
        crypto_blake2b_config config = {
            .key       = prk,
            .key_size  = sizeof(prk),
            .hash_size = size,
        };
        crypto_blake2b_ctx ctx;
        crypto_blake2b_init  (&ctx, config);
        crypto_blake2b_update(&ctx, &ctr, 1);
        crypto_blake2b_update(&ctx, info, info_size);
        crypto_blake2b_final (&ctx, okm);

        okm      += size;
        okm_size -= size;
        ctr++;
    }
}

Simple, easy to copy & paste, and once I test it I'll vouch for it 100%.

In addition, I think I'll show another one with ChaCha20:

void kdf2(uint8_t *okm,  size_t okm_size,  // unlimited
          uint8_t *ikm,  size_t ikm_size,  // unlimited
          uint8_t *salt, size_t salt_size, // <= 64 bytes
          uint8_t info[8])
{
    // Extract
    crypto_blake2b_config config = {
        .key       = salt,
        .key_size  = salt_size,
        .hash_size = 32,
    };
    uint8_t prk[32];
    crypto_blake2b(prk, config, ikm, ikm_size);

    // Expand
    crypto_chacha20_djb(okm, NULL, okm_size, prk, info, 0);
}

One BLAKE2b call, one ChaCha20 call, I don't think we can get any simpler.

@samuel-lucas6

What I meant was for the extract phase, BLAKE2b keyed with a uniformly random salt should be a good extractor like HMAC.

It definitely is.

What I wasn't as sure about is whether BLAKE2b without a salt as the key is considered sufficient as an extractor,

It is. If HMAC works with a zero salt, BLAKE2b ought to work with a zero salt (source: my own leap of faith).

And if BLAKE2b works with a constant salt, it also works with no salt at all (that is, no key). Reason being, with a constant salt you effectively get a constant (and known) IV. With no salt, you also get a constant (and known) IV. The fact that only one of those IVs happens to be defined in the standard makes no difference.

The interesting thing is that Frank suggests doing BLAKE2b(message: shared secret || public key 1 || public key 2) to compute a shared key from X25519.

This is a valid use of BLAKE2b: put the input key material in the message, and having no salt is okay. I mean, using a salt is cute, it does increases theoretical security in some cases, but in practice it just complicates things.

So, this is prefix-MAC and including the public keys apparently increases the key space.

I think I disagree with Frank on this one. Public keys are typically known to the attacker, so they don't add any entropy. It doesn't matter how many public constants you hash, the only unknown here is the shared secret, for which there are no more than 2^252 possibilities.

Incidentally, saying that Curve25519 has less than 126 bits of security, while technically true, ignores the fact that performing a point addition is typically more costly than computing an AES block — both in theory and practice. And these are 126 bits of collision security, which out of the box are much better than 126 bits of preimage security.

I do agree with Frank that it does not matter.

Recall that in the case of HMAC, the key is used for the salt, not the input key material. The input key material is compressed in the message. So the proper place for your input key material is not the key but the message. This one has unlimited size, so you're good here.

The new API hides access to the message parameter.

Because as far as I know libsodium's crypto_kdf_derive_from_key() only implements KDF-expand (one step of it at least), and therefore can get by with a limited info parameter, which makes the implementation simpler in some ways, and a tiny bit faster as well.

Me, I was talking about KDF-compress, and that one is perfectly served by keyed (or raw) BLAKE2b. Libsodium doesn't need a new dedicated function for this.

Do nothing: we already have KDF.

Those points are fair. Are you going to recommend BLAKE2b(message: ikm) or BLAKE2b(key: HashLen zeros, message: ikm) like HKDF?

I'm recommending both of the following:

BLAKE2b(message: ikm)
BLAKE2b(message: ikm, key: random_salt) (where the salt is anywhere between 16 and 64 bytes).

With a preference for the first one because despite the theoretical advantages of the second one, the first is simpler and leaves less room for user error.

Extraction is a single HMAC call, I don't think it deserves its own function.

I would say add it to avoid confusion.

If it was only one alias I would seriously consider it. But it's actually four: one for the direct interface of HMAC, and 3 for the streaming interface. Also, aliases introduce their own kind of confusion, where users aren't quite sure whether the alias really behaves the same as the original function. One way or another we can't have it all.