Implement Proper Zeroization for Sensitive and Secret Data

Note: This Epic description is WIP. I may add more information later. But in general, I cannot write here, exhaustively, everything that should be said about zeroizing.

This library generates and uses critical, secret values. We need to carefully and thoughtfully implement zeroization for this data. There are several challenges to making this happen:

Which data is sensitive and should be zeroized?

This is not clearly labeled in the code. We do know of some structs that contain secret values, e.g. the Private structs (There are three of these, one for each round of the sub-protocol). Even then, not every field in this struct requires zeroization.

The round one struct currently looks like:

/// Private data used in round one of the presign protocol.
#[derive(ZeroizeOnDrop)]
pub(crate) struct Private {
    pub k: BigNumber,
    pub rho: Nonce,
    pub gamma: BigNumber,
    pub nu: Nonce,
    #[zeroize(skip)]
    pub G: Ciphertext, // Technically can be public but is only one per party
    #[zeroize(skip)]
    pub K: Ciphertext, // Technically can be public but is only one per party
}

Note: One way to identify these structs is by their ZeroizeOnDrop derive line. If something has ZeroizeOnDrop there is a good chance some of the data in there needs to be zeroized.

Our current approach is to replace the former struct with a new version something like:

pub(crate) struct Private {
    pub k: SecretBigNumber,
    pub rho: SecretNonce,
    pub gamma: BigNumber,
    pub nu: SecretNonce,
    pub G: Ciphertext, // Technically can be public but is only one per party
    pub K: Ciphertext, // Technically can be public but is only one per party
}

We can notice several changes here:

Use `Secret*` variants of the existing types we work with. These types have several advantages:

I believe the secret data should be tagged at the type level.
These secret variants allow us to centralize the implementation and have a single type representing the secret data.
These Secret* variants are wrapper types around the underlying type. For example BigNumber and SecretBigNumber. This allows us to lock down the API and ensure methods do not leak or copy the type, e.g. clone.
These wrappers ensure the type is heap-allocated. This helps prevent issues with stack copies being made during certain operations.

We remove the `ZeroizeOnDrop` derivation.

Having ZeroizeOnDrop in these "compound" structs makes it so that the individual fields cannot be "moved out" (in the Rust language sense). Types cannot have fields moved out if they implement a destructor.

This makes it hard to "reuse" or "move" this data as it flows through the code. You end up having to make copies. To avoid this, and leverage Rust's powerful move semantics, we do not implement ZeroizeOnDrop directly. Instead, the sub-types (the type of the fields that make up our compound types) should implement ZeroizeOnDrop. This will allow the fields to be moved into other structs or functions.

Example implementation:

Here is what SecretBigNumber would look like:

#[derive(Clone, ZeroizeOnDrop)]
pub(crate) struct SecretBigNumber(Box<BigNumber>);

This type, SecretBigNumber should encapsulate (via methods) all the functionality and operations necessary for it to accomplish it's work. This type should avoid exposing the underlying data to the user. It should also be careful to to accidentally make copies that are not then zeroized.

When it comes to zeroizing, our solutions is:

50% leveraging the programming language: Using the zeroize crate, encapsulating data via OO, and working with the Rust move and copy semantics, heap allocating, etc.
50% best practices for developers should follow: encapsulating data via the type's API and only exposing necessary methods.

On 100% Zeroizing

Even with this work, I do not think we can reach 100% zeroization. There are some places where our secret data may still find itself: registers, CPU-caches, etc. There is not a good way for us to handle this changes and not really necessary.

Data will also momentarily live on the stack until we get it in the heap. Consider this method which creates a new SecretBigNumber:

fn random_positive_bn<R: RngCore + CryptoRng>(
        rng: &mut R,
        n: &BigNumber,
    ) -> SecretBigNumber {
        // Note: We don't zeroize the stack allocated value here...
        let bn = BigNumber::from_rng(n, rng);
        SecretBigNumber(Box::new(bn))
    }

Before boxing (heap-allocating) this value, it momentarily lives on the stack. Once we have heap allocated it, it can not be zeroized since it has already been "moved" into the Box constructor. So this value remains on the stack, ideally, another function call would quickly clobber that stack memory, effectively deleting the value. This is the best we can do since Rust does not support allocating values directly in the heap. Check out the unstable box syntax for more information.

More Background

See tickets:

330
331
https://github.com/boltlabs-inc/key-mgmt/issues/512 For more information.

boltlabs-inc / tss-ecdsa