Argyle-Software / kyber

A rust implementation of the Kyber post-quantum KEM
https://docs.rs/pqc_kyber/
Apache License 2.0
181 stars 41 forks source link

Don't store shared secret on stack, and allow it to be zeroized #59

Open faern opened 1 year ago

faern commented 1 year ago

The current signature of both encapsulate and decapsulate (fn encapsulate(...) -> Result<... [u8; 32] ...>, fn decapsulate(...) -> Result<[u8; 32], ...> stores the secret on the stack. This means it will potentially be copied around many times in memory, depending on how optimizations happen to generate the final code.

It is recommended to zero out cryptographic secrets when done with them/before other software has a chance to read that memory. This is currently not really possible with pqc_kyber and the current function signatures.

I recommend, and would be very interested in improving this. I implemented something very similar for classic-mceliece-rust recently. All secrets were bundled up in dedicated containers that allowed both borrowed and owned versions of the data, but that was never copied around in the memory, and that automatically cleared themselves on drop. See here for one example: https://github.com/Colfenor/classic-mceliece-rust/blob/bb57cd4f97ae2d092b8b1bfbc88c49507eca2892/src/lib.rs#L283-L333

The same goes for the SecretKey type really. It should implement Zeroize + ZeroizeOnDrop.

k3d3 commented 1 year ago

Does that work at all without alloc?

faern commented 1 year ago

Yes. The Classic McEliece library has an alloc feature that can be turned off, and zerioze still works. It's just that the caller must provide a mutable reference to the output buffer since the heap can't be used and the secret data must not be returned on the stack

mberry commented 1 year ago

Posted a draft implementation as a basic starting point, from what I can see it clears everything important on my x86 system but copy elision behaviour isn't guaranteed on various platforms and even different compilers. The draft mainly focuses on internal transient secrets, especially the secret key polynomial from which the secret key is easily derived.

It's not intended to be merged but just to put it out there for the internals.

Looking at the Classic McEliece code, keen for the same design of passing a buffer, but prefer to keep it on the stack as default behaviour. Happy to provide a separate boxed version too, there's likely a fair bit of overhead with heap allocation.

That McEliece code you linked doesn't use Pin? That can still end up copied in various ways. The boxed version looks fine though, from a very brief check they don't handle zeroisation of intermediate secrets from which a key can be derived it seems.

Apologies for the delay on this Linus, there's a long overdue API redesign tied into all these changes and would prefer to do that all at once to cause minimal disruption. For instance: Keypair shouldn't have any public fields, the public key field itself is redundant and can be derived at runtime with no cost. The Kex structs also have send fields which should be merged into an enum for space savings, etc. Alas, such stuff deserves it's own issue.

mberry commented 1 year ago

Zeroisation draft: https://github.com/Argyle-Software/kyber/pull/67

mberry commented 1 year ago

Have been prototyping a few different designs for this, realise the issue is about sharedsecret but am starting with Keypair and working down from that first. Will be essentially the same internals with a SharedSecret struct.

Tending towards something like this for now:

use std::pin::Pin;
use zeroize::Zeroize;

#[derive(Default)]
struct SecretBuffer([u8; SECRETKEYBYTES]);

struct Keypair<'s> {
  secret: Pin<&'s SecretBuffer>,
}

impl<'s> Keypair<'s> {
  fn new(secret_buf: &'s mut SecretBuffer, rng: _) -> Self {
    // Split buffer into public and secret keys
    let (mut secret, mut public) = secret_buf.0.split_at_mut(SECRETKEYBYTES);
    crypto_kem_keypair(public, secret, rng); 
    Keypair {
      secret: Pin::new(secret_buf),
    }
  }

  fn expose_secret(&self) -> Pin<&'s SecretBuffer> {
    self.secret
  }

  fn public_key(&self) -> [u8; PUBLICKEYBYTES] {
    let mut out = [0u8; PUBLICKEYBYTES];
    out.copy_from_slice(&self.secret.0[SECRETKEYBYTES..][..PUBLICKEYBYTES]);
    out
  }
}

impl Zeroize for SecretBuffer {
  fn zeroize(&mut self) {
    self.0.zeroize()
  }
}

impl Drop for SecretBuffer {
  fn drop(&mut self) {
    self.0.zeroize();
  }
}

So this code is all feature-gated with zeroize and someone using it will instantiate a SecretBuffer locally to use in Keypair generation rather than use the original function which only requires a RNG.

Pin ensures the data can't be copied anywhere. A heap allocated alternative function would look the same but with Box instead.

Edit: For posterity the full secretkey data structure goes: SECRETKEY | PUBLICKEY | H(PUBLICKEY) | Z

mberry commented 1 year ago

Note that this doesn't currently handle all avx2 codepaths. It doesn't particularly matter for secret key generation but may require clearing registers for the shared secret.

AVX2 registers can be cleared with _mm256_zeroall, best practice for clearing GPR's is to XOR them with themselves.

faern commented 1 year ago

Thanks for tending to this issue. Sorry for not replying in a few days. I'll try to catch up and help where I can!

from what I can see it clears everything important on my x86 system but copy elision behaviour isn't guaranteed on various platforms and even different compilers

I don't think it should be the responsibility of the Kyber project to verify these things. If we just use the zeroize crate as intended it's their responsibility to make sure it works.

faern commented 1 year ago

there's likely a fair bit of overhead with heap allocation.

I don't think it's healthy to look at heap allocations as inherently evil. Sure, they can have measurable performance impact in certain scenarios. Maybe it could have a tiny dent on performance for some server that servers millions of kyber KEM requests a second, but in all other scenarios it's not relevant at all IMO.

For example in our use case we use kyber once to derive a shared secret for a VPN tunnel and then use the same shared secret for the lifetime of that tunnel. If you put a single allocation next to the code that needs to run to establish a full tunnel the allocation is definitely dwarfed by many many many magnitudes.

Security on the other hand is of critical importance here. If I can pay a few allocations for the benefit of making it harder to steal my secrets, I'll gladly do it many times over.

EDIT: I of course think this crate should expose an API that is usable without heap allocation also. I'm just saying that I will personally use the securest and most ergonomic API and will not care about performance a lot (I care about performance, but this thing here makes no difference for me).

That McEliece code you linked doesn't use Pin? That can still end up copied in various ways

Do you have an example of how it could end up being copied? The only thing that is passed around are slices or array references, so pointers. Unless the compiler does some optimization that de-references the array even without the code doing it, I'm not sure how it could be copied around.

mberry commented 1 year ago

Don't see it as a bad thing and I doubt the overhead is much, but there's likely some overlap between those who can't use an allocator and also want to zeroise out secrets, so yeah having both Pin and Box versions seems to be the right move.

Pin is restrictive but for no_std it's the only real option, most should probably use a boxed version anyway, which is essentially what the Secrecy crate does.

Do you have an example

It's a tad contrived I guess, but here's an example that hashes the secret key and ends up with a copy in memory:

use rand::rngs::OsRng;
use std::hash::{Hash, Hasher};
use std::collections::hash_map::DefaultHasher;
use classic_mceliece_rust::*;

/// Take secret key and hash it
/// Return the pointer to foo
fn do_something(foo: [u8; 4]) -> *const u8 {
  let mut hasher = DefaultHasher::new();
  foo.hash(&mut hasher);
  println!("Hashing the secret key for fun: {:?}", hasher.finish());
  foo.as_ptr()
}

fn main() {
  let mut public_key_buf = [0u8; CRYPTO_PUBLICKEYBYTES];
  let mut secret_key_buf = [0u8; CRYPTO_SECRETKEYBYTES];

  let (_, sk) = keypair(&mut public_key_buf, &mut secret_key_buf, &mut OsRng);

  let sk_ptr = sk.as_array().as_ptr();
  println!("Secret Key: {:?}", unsafe { std::slice::from_raw_parts(sk_ptr as *const u8, 4) });

  // Do something with the secret key
  let sk_copy_ptr = do_something(sk.as_array()[..4].try_into().unwrap());
  println!("foo in do_something(): {:?}", unsafe { std::slice::from_raw_parts(sk_copy_ptr as *const u8, 4) });

  println!("Dropping Secret Key");
  drop(sk);

  println!("Secret Key: {:?}", unsafe { std::slice::from_raw_parts(sk_ptr as *const u8, 4) });
  println!("foo in do_something(): {:?}", unsafe { std::slice::from_raw_parts(sk_copy_ptr as *const u8, 4) });
}

Output:

Secret Key: [160, 148, 199, 130]
Hashing the secret key for fun: 18177111044196897214
foo in do_something(): [160, 148, 199, 130]
Dropping Secret Key
Secret Key: [0, 0, 0, 0]
foo in do_something(): [160, 148, 199, 130]

With a pinned buffer inside the secret key this wouldn't be possible. Alternately had foo been passed by reference it would be zeroed out.

Edit: To be fair though they seem to explicitly mention not moving the secret key in the readme section that I had passed over.

faern commented 1 year ago

Sorry, I don't think I understand how your example show there is a flaw with passing in mutable array references as secrets buffers.

You are passing foo as an array by value. So you explicitly copy the secret when you call do_something. That is regardless of hashing or whatever happens inside that function. When I talk about making it possible to use the library without spilling secrets in memory I'm referring to making it possible to avoid spilling the key, not making it impossible to copy the key to new memory locations.

In any KEM implementation the user must be able to get the key out. Because they need to use it for something, right? Having access to the shared secret and doing something with it is the goal after all. And if the user gets access to the data they can copy it to wherever they want, including broadcasting it over the internet. A library can't prevent that. But it should at least make it possible for a developer to use the library and have the secrets not spill to memory locations that are not zeroed. No matter how you hide the key behind pins, if the library allows the user to get out a &[u8; ..] or even just a *const u8 to the secret, the user can copy it, and that's correct.

faern commented 1 year ago
sk.as_array()[..4].try_into().unwrap()

This code copies the secret key to a new location in memory, but in a pretty obfuscated way. As pointed out above, as long as you give the user any access to the secret key material, they will always have the possibility to copy it. And I think it's a non-goal to try and prevent that. It's a goal to make the library not copy secrets around in memory against the users will.

mberry commented 1 year ago

as long as you give the user any access to the secret key material, they will always have the possibility to copy it

Certainly, but my issue is that it's not explicit what is happening behind the scenes and opens up the potential for misuse due to API design.

Personally not a fan of saying "you used it wrong" as a response to end-users discovering their keys written all over the place, it should be fail-safe by default.

not making it impossible to copy the key to new memory locations

Of course not, but it should be completely explicit when it happens and the copy itself should always be zeroized too.

faern commented 1 year ago

and the copy itself should always be zeroized too.

How can you enforce that? You will need to give library users &[u8; ...] access to the derived shared secret, because you don't know how they will use it, so they need access to the data. And then there is nothing stopping users from copying it into structs that don't zeroize.

Do you have a concrete suggestion how the API could look to achieve what you mean?

brxken128 commented 1 year ago

It's still partly up to the user to follow good cryptographic hygiene, it's impossible to expect a library to make sure none of their externally-handled secrets leak. The user of this crate should also be using zeroize, and possibly even newtypes that derive Zeroize and ZeroizeOnDrop (or even impl it manually). The Zeroizing<> wrapper is also great for applications where you need something zeroed once it goes out of scope, but don't need an entire type for it.

I wouldn't go use the chacha20poly1305 crate and complain when I allowed my primary encryption key (vec![u8; 32]1) to be copied to 12 different places in memory, or allow for it to be included in a backtrace as it implements Debug.

1: The vec![] was an extreme example of misuse, but my point is that the crate can only do so much to stop the user from shooting themselves in the foot.

bingmatv commented 8 months ago

the secret data must not be returned on the stack

Does zeroize work for stack?