I definitely like the idea of reusing functionality from an existing draft.
Update based on discussion I had with @armfazh offline: We currently use rejection sampling for mapping bit strings to field elements. This has the advantage of inducing no bias, but it has the distinct disadvantage of making the runtime very slow for some fraction of the time. We hope that a constant-time algorithm will have better worst-case, or even average-case, performance. @armfazh is working on a performance evaluation.
hash-to-field is quite simple: just take a hash function H: {0,1}^* -> {0,1}^n, interpret the n-bit output as an integer in [0, 2^n), and reduce that integer mod p. The question is how much bias this modular reduction induces.

Or more precisely: what's the statistical distance between a random variable chosen uniformly from [0, p) and a random variable chosen by sampling uniformly from [0, 2^n) for some 2^n > p and reducing the sample mod p? Intuitively, this depends on how large b = 2^n % p is: the smaller b is, the fewer residues pick up an extra preimage under the modular reduction, so the closer the output is to uniform. In fact, it's not hard to show that the statistical distance can be bounded, fairly tightly, by O(2^(log2(p) - n)). This suggests that for n ≈ 2*log2(p) the statistical distance is about O(1/2^log2(p)).
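To get a concrete feel for this bound, here is a small, self-contained check (my own sketch with toy parameters, not the primes we actually use) that computes the exact statistical distance and compares it to 2^(log2(p) - n):

```python
# Exact statistical distance between Uniform([0, p)) and (Uniform([0, 2^n)) mod p).
# The b = 2^n % p smallest residues each receive one extra preimage under the reduction.
def stat_distance(p, n):
    q, b = divmod(2**n, p)
    return (b * abs((q + 1) / 2**n - 1 / p)
            + (p - b) * abs(q / 2**n - 1 / p)) / 2

p = 257  # toy prime, chosen only for illustration
for n in (9, 16, 24, 32):
    print(n, stat_distance(p, n), 2**(p.bit_length() - n))
```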
What does this mean for security? In our analysis of prio3 it'll be helpful to model the function that maps a seed to a sequence of field elements as a random oracle. Our hope would be that hash-to-field is indifferentiable from such an RO when the underlying hash function H is modeled as an RO. This ought to be the case for sufficiently large n.
We also observed that there are primes p for which there is an optimal n. For example, libprio uses p = 18446744069414584321 for its 64-bit prime. It turns out that 2^192 % p == 1, so picking n = 192 would make the distributions very close indeed. Ideally we could find a 128-bit prime with this property, too.
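This property is easy to verify directly:

```python
# libprio's 64-bit prime, 2^64 - 2^32 + 1; 2^192 is congruent to 1 mod p.
p = 18446744069414584321
assert pow(2, 192, p) == 1
```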
@schoppmp made an excellent point, which we ought to take into account: hash_to_field appears to be designed for entropy extraction. (@armfazh can you confirm?) However, for our application we don't need an entropy extractor because we're starting with a (pseudo)random seed. Thus using SHA-2 or SHAKE for instantiating hash_to_field would be overkill: AES in CTR mode would suit our needs perfectly well.
Certainly, any SHA or SHAKE will be slower than AES (provided the machine has hardware support for it). Hash-to-field just receives seeds as input, and is not used for key expansion.
Right, and given how much randomness we consume in our applications, I think it would be worthwhile to provide a seed-to-field-vector expansion function that is based purely on AES. I'll put together a PR with a proposal.
Ok, before sending a PR, I want to get y'all's take on the overall shape of the solution. Here's my proposal:
1. A new parameter `EXPANDED_SIZE` added to `Field` that specifies the number of random bytes that are sampled per field element.
2. A variant of `hash_to_field` in which we replace `expand_message` with the key stream output by AES-CTR (or some other, compatible primitive).

This could look something like this (quoting from https://github.com/cjpatton/vdaf/pull/24):
```python
# The base class for PRGs.
class Prg:
    # Size of the seed.
    SEED_SIZE: int

    # Derive a fresh seed from an existing `seed`. The `info` input is used
    # for domain separation.
    @classmethod
    def derive(cls, seed: bytes, info: bytes) -> bytes:
        raise Error("not implemented")

    # Expand the input `seed` into the number of bytes requested.
    @classmethod
    def expand(cls, seed: bytes, length: int) -> bytes:
        raise Error("not implemented")

    # Expand the input `seed` into a vector of `length` field elements. This
    # algorithm is based on "hash_to_field" in draft-irtf-cfrg-hash-to-curve-13.
    @classmethod
    def expand_into_vec(cls, Field, seed: bytes, length: int):
        L = Field.EXPANDED_SIZE
        len_in_bytes = length * L
        uniform_bytes = cls.expand(seed, len_in_bytes)
        vec = []
        for i in range(length):
            tv = uniform_bytes[L*i:L*(i+1)]
            u_i = OS2IP(tv)  # Decode `tv` into an integer.
            vec.append(Field(u_i))
        return vec
```
@schoppmp what do you think of this proposal? Of course, we wouldn't literally plop down this Python into the spec :) A concrete PRG would implement `derive` and `expand`. For example:
```python
# Assumed imports from PyCryptodome; `zeros(n)` denotes the all-zero byte
# string of length n.
from Crypto.Cipher import AES
from Crypto.Hash import CMAC
from Crypto.Util import Counter
from Crypto.Util.number import bytes_to_long

# A pseudorandom generator based on AES128. CMAC {{!RFC4493}} is used for seed
# derivation and CTR mode is used for seed expansion.
class PrgAes128(Prg):
    # Associated parameters
    SEED_SIZE = 16

    @classmethod
    def derive(cls, seed, info) -> bytes:
        hasher = CMAC.new(seed, ciphermod=AES)
        return hasher.update(info).digest()

    @classmethod
    def expand(cls, seed, length):
        counter = Counter.new(128, initial_value=bytes_to_long(zeros(16)))
        cipher = AES.new(seed, AES.MODE_CTR, counter=counter)
        # CTR-mode encryption of the all-zero string of the specified length,
        # using the all-zero block as the IV.
        cipher_stream = cipher.encrypt(zeros(length))
        return cipher_stream
```
CMAC is a somewhat unconventional choice. A more conventional choice might be HMAC-SHA256; however, in that case we might want to increase `SEED_SIZE` to 32.
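For comparison, a hypothetical HMAC-SHA256-based variant might look like the sketch below (my own illustration of the alternative mentioned above, using PyCryptodome; the counter-based expansion is an assumption, not something specified in this thread):

```python
from Crypto.Hash import HMAC, SHA256

class PrgHmacSha256(Prg):
    SEED_SIZE = 32

    @classmethod
    def derive(cls, seed, info) -> bytes:
        return HMAC.new(seed, info, digestmod=SHA256).digest()

    @classmethod
    def expand(cls, seed, length):
        # Counter-based expansion: concatenate HMAC(seed, counter) blocks and
        # truncate the result to `length` bytes.
        out = b""
        counter = 0
        while len(out) < length:
            out += HMAC.new(seed, counter.to_bytes(4, "big"), digestmod=SHA256).digest()
            counter += 1
        return out[:length]
```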
PR #25 addresses the issue raised here. I ended up going in a slightly different direction with the API:
```python
class Prg:
    # Size of the seed.
    SEED_SIZE: int

    # Number of bytes sampled per pseudorandom field element.
    EXPANDED_SIZE: int

    # Expand the input `seed` into the number of bytes requested.
    @classmethod
    def expand(cls, seed: bytes, info: bytes, length: int) -> bytes:
        raise Error("not implemented")

    # Derive a fresh seed from an existing one.
    @classmethod
    def derive(cls, seed: bytes, info: bytes) -> bytes:
        return cls.expand(seed, info, cls.SEED_SIZE)

    # Expand the input `seed` into a vector of `length` field elements. This
    # algorithm is based on "hash_to_field" in draft-irtf-cfrg-hash-to-curve-13.
    @classmethod
    def expand_into_vec(cls, Field, seed: bytes, info: bytes, length: int):
        L = Field.EXPANDED_SIZE
        len_in_bytes = length * L
        uniform_bytes = cls.expand(seed, info, len_in_bytes)
        vec = []
        for i in range(0, len(uniform_bytes), L):
            tv = uniform_bytes[i:i+L]
            x = OS2IP(tv)  # Decode `tv` into an integer.
            vec.append(Field(x))
        return vec
```
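A hypothetical usage sketch of the revised API (`Field64` is a stand-in name for a concrete field class and `SomePrg` for a concrete PRG implementing this interface; neither name is defined in this thread):

```python
# Expand a fixed all-zero seed into 10 pseudorandom field elements,
# using the `info` string for domain separation.
seed = b"\x00" * SomePrg.SEED_SIZE
vec = SomePrg.expand_into_vec(Field64, seed, b"domain separation tag", 10)
assert len(vec) == 10
```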
> @schoppmp made an excellent point, which we ought to take into account: hash_to_field appears to be designed for entropy extraction.

FWIW, I don't think this is correct.
@cjpatton I don't think this approach works directly as you describe it. The reason is that if the field modulus p is small-ish (e.g., 32 bits), then just taking 32 uniform bits and reducing them mod p will result in something far from uniform. This is what hash_to_curve fixes by increasing L by the security parameter k.
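For concreteness, the mitigation described above amounts to oversampling by the security parameter before reducing. A minimal sketch of that idea (my own illustration; k is the security parameter, and os.urandom stands in for the PRG's output stream):

```python
import os

def expanded_size(p: int, k: int = 64) -> int:
    # Sample ceil((log2(p) + k) / 8) bytes per element so that the mod-p
    # reduction introduces statistical distance on the order of 2^-k.
    return (p.bit_length() + k + 7) // 8

def sample_one(p: int, k: int = 64) -> int:
    # Pad-then-reduce: draw roughly log2(p) + k uniform bits, then reduce mod p.
    tv = os.urandom(expanded_size(p, k))
    return int.from_bytes(tv, "big") % p
```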
My point was twofold:

1. If we want to sample n prime field elements from a single seed, we don't need log(p) + k uniform bits for every element we sample, but something more on the order of log(p) + n * k + log(n) for all n samples.
2. If our seed is uniform, then using AES (either in counter mode or as a hash function) might be cheaper than SHA(KE).

@chris-wood Could you elaborate with which part you disagree?

> @chris-wood Could you elaborate with which part you disagree?
With one exception, I don't disagree with anything you wrote in that comment. I was merely responding to the "entropy extraction" point, which is not an explicit goal of hash-to-field.
> If our seed is uniform, then using AES (either in counter mode or as a hash function) might be cheaper than SHA(KE).

I may be misunderstanding you, but this seems like a non sequitur, since `expand_message` will work for any input, regardless of its distribution. In any case, I think it's reasonable to require the seed to be uniformly distributed. And using a stream cipher instead of the hash-based `expand_message` certainly seems like an improvement. (In fact, we should consider adding such a variant to the hash-to-curve draft!)
@schoppmp

> 1. If we want to sample n prime field elements from a single seed, we don't need log(p) + k uniform bits for every element we sample, but something more on the order of log(p) + n * k + log(n) for all n samples.

Do you mean (log(p) + n) * k + log(n) or log(p) + (n * k) + log(n)?
> I may be misunderstanding you, but this seems like a non sequitur since expand_message will work for any input, regardless of its distribution.

If we want pseudorandom outputs, then the seed needs to be pseudorandom as well, or, alternatively, have high entropy so we can use a hash function (e.g., SHA) to reduce it to a uniform pseudorandom seed. I think the second part is what @cjpatton meant by entropy extraction.
> Do you mean (log(p) + n) * k + log(n) or log(p) + (n * k) + log(n)?

I actually meant (log(p) * n) + k + log(n) :smile: . I'll fix the comment above.
@schoppmp right -- it doesn't extract entropy. Garbage in means garbage out. =) In any case, I think we're aligned now.
@schoppmp Very interesting. Would you mind writing down the algorithm for mapping (log(p) * n) + k + log(n) bits to n field elements? I'm curious how the extra k + log(n) bits are used. (You can also refer me to the IDPF code base.)
Sure, you can have a look here. Note that the current implementation only works on 128-bit blocks, which limits the maximum k supported.
Alright, this is super neat. @armfazh and I are going to work on benchmarking (in libprio) rejection sampling vs. pad-then-reduce (a la hash_to_field) vs. @schoppmp's algorithm. We should have this done today.
In the meantime, @schoppmp: Would you mind working on an alternative PR to #25 that specifies your algorithm?
EDIT: By the way, I think it would make sense to fix the same "security parameter" (say, 64) for all fields, and possibly exclude fields that are too small for this sampling method to be statistically close to uniform (e.g., less than 64 bits). See also #22.
I've implemented both approaches in PRs (linked to this issue).
A comparison of three methods can be found at: https://github.com/abetterinternet/libprio-rs/compare/main...armfazh:prg/bench?expand=1
| Method | N=10 | N=100 | N=1000 |
|---|---|---|---|
| Rejection sampling | 5.2 | 7.8 | 38.1 |
| Pad-and-reduce | 4.8 | 15.1 | 115.8 |
| Borrow-and-reduce | 4.5 | 12.7 | 83.7 |

* Timings are in µs, measured on a Core i7-8650U CPU @ 1.90GHz.
Thanks @armfazh. Based on this assessment, I think that we should specify rejection sampling. @schoppmp do you object?
An issue with rejection sampling is that it's hard to implement in constant time, as was also observed in the hash-to-curve draft. Does your implementation return early after the required number of finite field elements has been sampled? Or does it always sample enough times to ensure that the probability of rejecting too many samples is small enough (w.r.t. the security parameter)?
libprio-rs currently generates samples until it reaches the number of field elements requested. This means the runtime is not bounded (in particular it's definitely not constant time!): https://github.com/abetterinternet/libprio-rs/blob/main/src/prng.rs#L56-L82
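To make the shape of that loop concrete, here is a rough Python sketch of rejection sampling from a PRG's output stream (my own illustration, not the libprio-rs code; `Field.MODULUS` is an assumed attribute name):

```python
def rejection_sample_vec(Field, prg_bytes: bytes, length: int, elem_size: int):
    # Walk the pseudorandom byte stream in fixed-size chunks, keeping each
    # decoded integer only if it falls below the field modulus.
    vec = []
    offset = 0
    while len(vec) < length:
        if offset + elem_size > len(prg_bytes):
            raise ValueError("ran out of PRG output; caller must supply more")
        x = int.from_bytes(prg_bytes[offset:offset + elem_size], "big")
        offset += elem_size
        if x < Field.MODULUS:  # accept; otherwise reject and try the next chunk
            vec.append(Field(x))
    return vec
```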
Incidentally, @armfazh noted that the current IDPF implementation is also not constant time, due to the division by the modulus here: https://github.com/abetterinternet/libprio-rs/blob/main/src/prng.rs#L56-L82. Though this should be easy enough to fix.
Taking a step back, do we actually care whether the algorithm that computes the mapping from pseudorandom byte strings to field elements is constant time? Take rejection sampling, for instance: assuming the underlying PRG is constant-time (i.e., AES-CTR is implemented in constant time), it seems to me that all a non-constant-time implementation can leak is the value of the bits that are rejected.
> Incidentally, @armfazh noted that the current IDPF implementation is also not constant time, due to the division by the modulus here: https://github.com/abetterinternet/libprio-rs/blob/main/src/prng.rs#L56-L82. Though this should be easy enough to fix.

Not sure what you mean, the link goes to libprio-rs. AFAICT all branches in the DPF sampling function are based on loop variables or compile-time constants.
> Taking a step back, do we actually care whether the algorithm that computes the mapping from pseudorandom byte strings to field elements is constant time? Take rejection sampling, for instance: assuming the underlying PRG is constant-time (i.e., AES-CTR is implemented in constant time), it seems to me that all a non-constant-time implementation can leak is the value of the bits that are rejected.

Maybe you are right and that leakage is fine (in that it can be simulated in a proof). @chris-wood, do you have an insight as to why constant-time is required for hash-to-curve and thus rejection sampling is ruled out?
> Maybe you are right and that leakage is fine (in that it can be simulated in a proof). @chris-wood, do you have an insight as to why constant-time is required for hash-to-curve and thus rejection sampling is ruled out?

Yeah -- some applications of hash-to-curve operate on secret input, and thus any side channels may leak the secret. Dragonblood is one famous example of this that motivated the hash-to-curve standard.
Ugh, sorry I copied the wrong link. I'm on a Mac and feel wayyy out of my depth :) Here's the link: https://github.com/google/distributed_point_functions/blob/master/dpf/int_mod_n.h#L170. I'm not sure this is guaranteed to be constant-time, i.e., the runtime of the division operation might depend on the value of r. I think this depends on the CPU arch.
@cjpatton Ah got it. Yeah, I was assuming arithmetic operations are constant-time, but they might not be. In that case the fix should still be easy as you pointed out.
@chris-wood So it seems that the number of iterations needed for many hashes of the same password + different auxiliary information (MAC, counter, ...) can be used to find the password. On the other hand, revealing even full blocks of an AES-CTR output shouldn't reveal anything about the key or any other blocks in the output. So is the issue in Dragonblood that the password is low-entropy?
Seems like we may have consensus for rejection sampling. Here's a PR: https://github.com/cjpatton/vdaf/pull/31. @schoppmp if you're happy with this outcome then please review at your convenience :)
SGTM, provided we emphasize that the seed MUST be high-entropy, i.e., not derived from a password using a KDF or similar. I'll take a look at #31.
Actually I think high entropy (or, to be more precise, high min-entropy) won't be sufficient for all PRG constructions. See comments on the PR.
In the hash-to-curve draft, there is a method to derive an arbitrary number of field elements from a seed. This is called an Expander, and there are two types: one based on Merkle-Damgård functions and the other based on eXtendable Output Functions (XOFs). See Section 5.3: https://datatracker.ietf.org/doc/html/draft-irtf-cfrg-hash-to-curve#section-5.3