cfrg / draft-irtf-cfrg-vdaf

VDAF specification

Specify method for generating field elements (was "Proposal: use hash to field to generate field elements from a seed") #13

Closed armfazh closed 2 years ago

armfazh commented 2 years ago

TODO This functionality closely resembles what people usually think of as an extract-then-expand KDF, but differs somewhat in its syntax and, also, its required security properties. Can we get the same functionality from something that's more commonplace? HKDF doesn't fit the bill, unfortunately, because keys can only be expanded to a fairly short length. Our application requires a rather long key stream.

and

  • Field.rand_vec(len: Unsigned) -> output: Vec[Field] returns a vector of random field elements. The length of output MUST be len.

    NOTE In reality this would be achieved by generating a random key and expanding it into a sequence of field elements using a key derivation scheme. This should probably be made explicit.

In the hash-to-curve draft, there is a method for deriving an arbitrary number of field elements from a seed. It is called an Expander, and there are two types: one based on Merkle-Damgård hash functions and the other based on eXtendable Output Functions (XOFs). See Section 5.3: https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hash-to-curve#section-5.3

cjpatton commented 2 years ago

I definitely like the idea of reusing functionality from an existing draft.

cjpatton commented 2 years ago

Update based on discussion I had with @armfazh offline: We currently use rejection sampling for mapping bit strings to field elements. This has the advantage of inducing no bias, but it has the distinct disadvantage of making the runtime very slow for some fraction of the time. We hope that a constant-time algorithm will have better worst-case, or even average-case, performance. @armfazh is working on a performance evaluation.

hash-to-field is quite simple: just take a hash function H: \bits^* \to \bits^n, interpret the output as an integer in [0, 2^n), and reduce that integer mod p. The question is how much bias this modular reduction induces.

Or more precisely: what's the statistical distance between a random variable chosen uniformly from [0, p) and one obtained by sampling uniformly from [0, 2^n), for some 2^n > p, and reducing the sample mod p? Intuitively, this depends on how large b = 2^n % p is: the smaller b is, the fewer residues are over-represented by the modular reduction. In fact, it's not hard to show that the statistical distance can be bounded, fairly tightly, by O(2^(log2(p) - n)). This suggests that for n \approx 2*log2(p) the statistical distance is about O(1/2^log2(p)).
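
As a concrete illustration (a toy sketch, not part of the thread), the exact statistical distance can be computed directly in Python:

# Exact statistical distance between uniform over [0, p) and a uniform draw
# from [0, 2^n) reduced mod p. Toy parameters for illustration only.
def mod_reduction_bias(p: int, n: int) -> float:
    total = 1 << n
    q, b = divmod(total, p)  # residues 0..b-1 get q+1 preimages, the rest get q
    heavy = b * abs((q + 1) / total - 1 / p)
    light = (p - b) * abs(q / total - 1 / p)
    return (heavy + light) / 2

p = 127  # a 7-bit prime
print(mod_reduction_bias(p, 8))   # ~7.7e-3: noticeable bias
print(mod_reduction_bias(p, 16))  # ~5.9e-5: far smaller, per the bound above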

What does this mean for security? In our analysis of prio3 it'll be helpful to model the function that maps a seed to a sequence of field elements as a random oracle. Our hope would be that hash-to-field is indifferentiable from such an RO when the underlying hash function H is modeled as an RO. This ought to be the case for sufficiently large n.

We also observed that there are primes p for which there is an optimal n. For example, libprio uses p = 18446744069414584321 for its 64-bit prime. It turns out that 2^192 % p == 1, so picking n=192 would make the distributions very close indeed. Ideally we could find a 128-bit prime with this property, too.
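
This is easy to check directly (here in Python; note that p is 2^64 - 2^32 + 1):

p = 2**64 - 2**32 + 1  # libprio's 64-bit prime
assert p == 18446744069414584321
assert pow(2, 192, p) == 1  # b = 2^192 % p == 1, so the mod-p bias is tiny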

cjpatton commented 2 years ago

@schoppmp made an excellent point, which we ought to take into account: hash_to_field appears to be designed for entropy extraction. (@armfazh can you confirm?). However for our application we don't need an entropy extractor because we're starting with a (pseudo)random seed. Thus using SHA-2 or SHAKE for instantiating hash_to_field would be overkill: AES in CTR mode would suit our needs perfectly fine.

armfazh commented 2 years ago

Certainly, any SHA or SHAKE will be slower than AES (provided the machine has hardware support for it). hash_to_field just receives a seed as input; it is not used for key expansion.

cjpatton commented 2 years ago

Right, and given how much randomness we consume in our applications, I think it would be worthwhile to provide a seed-to-field-vector expansion function that is based purely on AES. I'll put together a PR with a proposal.

cjpatton commented 2 years ago

Ok, before sending a PR, I want to get y'all's take on the overall shape of the solution. Here's my proposal:

  1. Add a parameter EXPANDED_SIZE to Field that specifies the number of random bytes that are sampled per field element.
  2. Adopt a modified version of hash_to_field in which we replace expand_message with the key stream output by AES-CTR (or some other, compatible primitive).

This could look something like this (quoting from https://github.com/cjpatton/vdaf/pull/24):

# The base class for PRGs.
class Prg:
    # Size of the seed.
    SEED_SIZE: int

    # Derive a fresh seed from an existing `seed`. The `info` input is used for
    # domain separation.
    @classmethod
    def derive(cls, seed: bytes, info: bytes) -> bytes:
        raise NotImplementedError()

    # Expand the input `seed` into the number of bytes requested.
    @classmethod
    def expand(cls, seed: bytes, length: int) -> bytes:
        raise NotImplementedError()

    # Expand the input `seed` into a vector of `length` field elements. This
    # algorithm is based on "hash_to_field" in draft-irtf-cfrg-hash-to-curve-13.
    @classmethod
    def expand_into_vec(cls, Field, seed: bytes, length: int):
        L = Field.EXPANDED_SIZE
        len_in_bytes = length * L
        uniform_bytes = cls.expand(seed, len_in_bytes)

        vec = []
        for i in range(length):
            tv = uniform_bytes[L*i:L*(i+1)]
            u_i = OS2IP(tv)  # decode `tv` into an integer (see below)
            vec.append(Field(u_i))
        return vec
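
For reference, OS2IP is the octet-string-to-integer conversion from RFC 8017; a Python equivalent is a one-liner:

# OS2IP (RFC 8017): interpret a big-endian octet string as a nonnegative integer.
def OS2IP(octets: bytes) -> int:
    return int.from_bytes(octets, 'big')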

@schoppmp what do you think of this proposal? Of course, we wouldn't literally plop down this Python into the spec :) A concrete PRG would implement derive and expand. For example:

# A pseudorandom generator based on AES128. CMAC {{!RFC4493}} is used for seed
# derivation and CTR mode is used for seed expansion. (AES, CMAC, and Counter
# are from PyCryptodome.)
from Crypto.Cipher import AES
from Crypto.Hash import CMAC
from Crypto.Util import Counter

def zeros(length: int) -> bytes:
    return bytes(length)

class PrgAes128(Prg):
    # Associated parameters
    SEED_SIZE = 16

    @classmethod
    def derive(cls, seed, info) -> bytes:
        hasher = CMAC.new(seed, ciphermod=AES)
        return hasher.update(info).digest()

    @classmethod
    def expand(cls, seed, length):
        # CTR-mode encryption of the all-zero string of the specified length,
        # using the all-zero block as the IV.
        counter = Counter.new(128, initial_value=0)
        cipher = AES.new(seed, AES.MODE_CTR, counter=counter)
        return cipher.encrypt(zeros(length))

CMAC is a somewhat unconventional choice. A more conventional choice might be HMAC-SHA256; however, in that case we might want to increase SEED_SIZE to 32.
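
For comparison, a hypothetical sketch of that more conventional variant (the class name and the counter-mode expansion are illustrative, not from any PR); plain HKDF-Expand wouldn't do here because of the 255 * HashLen output cap noted at the top of this issue:

import hashlib
import hmac
import struct

# Hypothetical HMAC-SHA256 variant: HMAC for seed derivation, and HMAC over a
# running block counter for expansion (sidestepping HKDF-Expand's length cap).
class PrgHmacSha256(Prg):
    SEED_SIZE = 32

    @classmethod
    def derive(cls, seed, info) -> bytes:
        return hmac.new(seed, info, hashlib.sha256).digest()

    @classmethod
    def expand(cls, seed, length):
        out = b''
        for block in range((length + 31) // 32):
            out += hmac.new(seed, struct.pack('>Q', block), hashlib.sha256).digest()
        return out[:length]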

cjpatton commented 2 years ago

PR #25 addresses the issue raised here. I ended up going in a slightly different direction with the API:

class Prg:
    # Size of the seed.
    SEED_SIZE: int

    # Number of bytes sampled per pseudorandom field element.
    EXPANDED_SIZE: int

    # Expand the input `seed` into the number of bytes requested. The `info`
    # input is used for domain separation.
    @classmethod
    def expand(cls, seed: bytes, info: bytes, length: int) -> bytes:
        raise NotImplementedError()

    # Derive a fresh seed from an existing one.
    @classmethod
    def derive(cls, seed: bytes, info: bytes) -> bytes:
        return cls.expand(seed, info, cls.SEED_SIZE)

    # Expand the input `seed` into a vector of `length` field elements. This
    # algorithm is based on "hash_to_field" in draft-irtf-cfrg-hash-to-curve-13.
    @classmethod
    def expand_into_vec(cls, Field, seed: bytes, info: bytes, length: int):
        L = Field.EXPANDED_SIZE
        len_in_bytes = length * L
        uniform_bytes = cls.expand(seed, info, len_in_bytes)

        vec = []
        for i in range(0, len(uniform_bytes), L):
            tv = uniform_bytes[i:i+L]
            x = OS2IP(tv)
            vec.append(Field(x))
        return vec

chris-wood commented 2 years ago

@schoppmp made an excellent point, which we ought to take into account: hash_to_field appears to be designed for entropy extraction.

FWIW, I don't think this is correct.

schoppmp commented 2 years ago

@cjpatton I don't think this approach works directly as you describe it. The reason is that if the field modulus p is small-ish (e.g., 32 bits), then just taking 32 uniform bits and reducing them mod p will result in something far from uniform. This is what hash_to_field fixes by increasing L by the security parameter k.

My point was twofold:

  1. If we want to sample n prime field elements from a single seed, we don't need log(p) + k uniform bits for every element we sample, but something more on the order of log(p) * n + k + log(n) for all n samples (see the bit-count sketch after this list). (Edit: Fixed the formula)
  2. If our seed is uniform, then using AES (either in counter mode or as a hash function) might be cheaper than SHA(KE).
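
For a sense of scale, a quick back-of-the-envelope comparison (illustrative arithmetic, using log2, with n = 1000 elements of a 64-bit field and k = 64):

import math

n, logp, k = 1000, 64, 64
per_element = n * (logp + k)                      # log(p) + k bits per sample
batched = n * logp + k + math.ceil(math.log2(n))  # log(p)*n + k + log(n) total
print(per_element, batched)  # 128000 vs. 64074 bits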

@chris-wood Could you elaborate with which part you disagree?

chris-wood commented 2 years ago

@chris-wood Could you elaborate with which part you disagree?

With one exception, I don't disagree with anything you wrote in that comment. I was merely responding to the "entropy extraction" point, which is not an explicit goal of hash-to-field.

If our seed is uniform, then using AES (either in counter mode or as a hash function) might be cheaper than SHA(KE).

I may be misunderstanding you, but this seems like a non sequitur since expand_message will work for any input, regardless of its distribution. In any case, I think it's reasonable to require the seed to be uniformly distributed. And using a stream cipher instead of hash-based expand_message certainly seems like an improvement. (In fact, we should consider adding such variant to the hash-to-curve draft!)

cjpatton commented 2 years ago

@schoppmp 1. If we want to sample n prime field elements from a single seed, we don't need log(p) + k uniform bits for every element we sample, but something more on the order of log(p) + n * k + log(n) for all n samples.

Do you mean (log(p) + n) * k + log(n) or log(p) + (n * k) + log(n)?

schoppmp commented 2 years ago

I may be misunderstanding you, but this seems like a non sequitur since expand_message will work for any input, regardless of its distribution.

If we want pseudorandom outputs, then the seed needs to be pseudorandom as well, or, alternatively, have high entropy so that we can use a hash function (e.g. SHA) to reduce it to a uniform pseudorandom seed. I think the second part is what @cjpatton meant by entropy extraction.

Do you mean (log(p) + n) * k + log(n) or log(p) + (n * k) + log(n)?

I actually meant (log(p) * n) + k + log(n) :smile: . I'll fix the comment above.

chris-wood commented 2 years ago

@schoppmp right -- it doesn't extract entropy. Garbage in means garbage out. =) In any case, I think we're aligned now.

cjpatton commented 2 years ago

@schoppmp Very interesting. Would you mind writing down the algorithm for mapping (log(p) * n) + k + log(n) bits to n field elements? I'm curious how the extra k + log(n) bits are used. (You can also refer me to the IDPF code base.)

schoppmp commented 2 years ago

Sure, you can have a look here. Note that the current implementation only works on 128-bit blocks, which limits the maximum k supported.
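
In rough Python (a sketch of the idea in int_mod_n.h, with illustrative names, not a verbatim port): treat the whole random string as one big integer and peel off each element with a mod/div by p.

import os

# "Borrow-and-reduce": one uniform integer of about log2(p)*n + k + log2(n)
# bits yields n field elements via repeated mod/div by p.
def sample_batch(p: int, n: int, k: int = 64) -> list:
    nbits = p.bit_length() * n + k + n.bit_length()
    r = int.from_bytes(os.urandom((nbits + 7) // 8), 'big')
    out = []
    for _ in range(n):
        out.append(r % p)  # next field element
        r //= p            # keep the quotient for the remaining samples
    return out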

cjpatton commented 2 years ago

Alright, this is super neat. @armfazh and I are going to work on benchmarking (in libprio) rejection sampling vs. pad-then-reduce (a la hash_to_field) vs. @schoppmp's algorithm. We should have this done today.

In the meantime, @schoppmp: Would you mind working on an alternative PR to #25 that specifies your algorithm?

EDIT: By the way, I think it would make sense to fix the same "security parameter" (say, 64) for all fields, and possibly exclude fields that are too small for this sampling method to be statistically close to uniform (e.g., less than 64 bits). See also #22.

cjpatton commented 2 years ago

I've implemented both approaches in PRs (linked to this issue).

armfazh commented 2 years ago

A comparison of three methods can be found at: https://github.com/abetterinternet/libprio-rs/compare/main...armfazh:prg/bench?expand=1

Method              N=10   N=100   N=1000
Rejection Sampling   5.2     7.8     38.1
Pad-and-reduce       4.8    15.1    115.8
Borrow-and-reduce    4.5    12.7     83.7

* Timings are in µs, measured on a Core i7-8650U CPU @ 1.90GHz.

cjpatton commented 2 years ago

Thanks @armfazh. Based on this assessment, I think that we should specify rejection sampling. @schoppmp do you object?

schoppmp commented 2 years ago

An issue with rejection sampling is that it's hard to do with a constant-time implementation, which was also observed in the hash-to-curve draft. Is your implementation returning early after the required number of finite field elements have been sampled? Or is it always sampling enough times to ensure the probability of rejecting too many samples is small enough (wrt. the security parameter)?

cjpatton commented 2 years ago

libprio-rs currently generates samples until it reaches the number of field elements requested. This means the runtime is not bounded (in particular it's definitely not constant time!): https://github.com/abetterinternet/libprio-rs/blob/main/src/prng.rs#L56-L82
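
The loop looks roughly like this in Python (a sketch of the idea, not the libprio-rs code; `stream` stands for any file-like key-stream reader):

# Rejection sampling: read fixed-size chunks from the key stream and keep only
# values below p. Unbiased, but the number of iterations is not bounded.
def rejection_sample(p: int, n: int, stream) -> list:
    nbytes = (p.bit_length() + 7) // 8  # bytes per candidate
    out = []
    while len(out) < n:
        x = int.from_bytes(stream.read(nbytes), 'big')
        if x < p:
            out.append(x)
    return out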

Incidentally, @armfazh noted that the current IDPF implementation is also not constant time, due to the division by the modulus here: https://github.com/abetterinternet/libprio-rs/blob/main/src/prng.rs#L56-L82. Though this should be easy enough to fix.

Taking a step back, do we actually care if the algorithm that computes the mapping from pseudorandom byte strings to field elements is constant time? Take rejection sampling, for instance: assuming the underlying PRG is constant-time (i.e., AES-CTR is implemented in constant time), it seems to me that all a non-constant-time implementation can leak is the value of the bits that are rejected.

schoppmp commented 2 years ago

Incidentally, @armfazh noted that the current IDPF implementation is also not constant time, due to the division by the modulus here: https://github.com/abetterinternet/libprio-rs/blob/main/src/prng.rs#L56-L82. Though this should be easy enough to fix.

Not sure what you mean, the link goes to libprio-rs. AFAICT all branches in the DPF sampling function are based on loop variables or compile-time constants.

Taking a step back, do we actually care if the algorithm that computes the mapping from pseudorandom byte strings to field elements is constant time? Take rejection sampling, for instance: assuming the underlying PRG is constant-time (i.e., AES-CTR is implemented in constant time) It seems to me that all that can be leaked by a non-constant-time implementation is the value of the bits that are rejected.

Maybe you are right and that leakage is fine (in that it can be simulated in a proof). @chris-wood, do you have an insight as to why constant-time is required for hash-to-curve and thus rejection sampling is ruled out?

chris-wood commented 2 years ago

Maybe you are right and that leakage is fine (in that it can be simulated in a proof). @chris-wood, do you have an insight as to why constant-time is required for hash-to-curve and thus rejection sampling is ruled out?

Yeah -- some applications of hash-to-curve operate on secret input, and thus any side channels may leak the secret. Dragonblood is one famous example of this that motivated the hash-to-curve standard.

cjpatton commented 2 years ago

Ugh, sorry I copied the wrong link. I'm on a Mac and feel wayyy out of my depth :) Here's the link: https://github.com/google/distributed_point_functions/blob/master/dpf/int_mod_n.h#L170. I'm not sure this is guaranteed to be constant-time, i.e., the runtime of the division operation might depend on the value of r. I think this depends on the CPU arch.

schoppmp commented 2 years ago

@cjpatton Ah got it. Yeah, I was assuming arithmetic operations are constant-time, but they might not be. In that case the fix should still be easy as you pointed out.

@chris-wood So it seems that the number of iterations needed for many hashes of the same password + different auxiliary information (MAC, counter, ...) can be used to find the password. On the other hand, revealing even full blocks of an AES-CTR output shouldn't reveal anything about the key or any other blocks in the output. So is the issue in Dragonblood that the password is low-entropy?

cjpatton commented 2 years ago

Seems like we may have consensus for rejection sampling. Here's a PR: https://github.com/cjpatton/vdaf/pull/31. @schoppmp if you're happy with this outcome then please review at your convenience :)

schoppmp commented 2 years ago

SGTM, provided we emphasize that the seed MUST be high-entropy, i.e., not derived from a password using a KDF or similar. I'll take a look at #31.

cjpatton commented 2 years ago

Actually I think high entropy (or, to be more precise, high min-entropy) won't be sufficient for all PRG constructions. See comments on the PR.