Prio3 VDAF for vectors with a bounded number of 1s

divergentdave commented 1 year ago

Section 4.5 of Samplable Anonymous Aggregation for Private Federated Data Analysis notes that it's possible to efficiently validate that a vector consists of zeros and ones, with at most some constant number of ones. This could be done by generalizing Prio3Histogram. (Open question: does this belong in this document or a separate one?) Here's how I'd implement this:

The parameters would be the length, $n$, the chunk length, and the maximum number of ones, $m$.
Input length would be $n+\left\lfloor{log_2(m)}\right\rfloor+1$. Let $l_m=\left\lfloor{log_2(m)}\right\rfloor+1$, the number of bits needed to represent $m$.
Encoding: The first $n$ field elements are just projected measurement values. Let $c=2^{l_m}-1-m$, a constant offset to let us achieve the sum limit we want. Then, for input values $x_i$, compute $c+\sum_i{x_i}$, encode it as a bit vector, and append it to the encoded measurement.
Truncation: Take the first $n$ elements.
Decoding: $id$
Validity circuit: Evaluate the range check polynomial $x(x-1)$ on all INPUT_LEN elements, and combine the results in a random linear combination (using the parallel sum optimization). Separately, evaluate the following, where $x_i$ are the measurement values from the encoded measurement, and $s_j$ are the bits of the $c+\sum_i{xi}$ from the encoded measurement. $$c + \sum{i=0}^{n-1} {xi} - \sum{j=0}^{l_m-1} {2^j s_j}$$ Add the range check and sum check intermediate values together in another random linear combination, and return the result.

cjpatton commented 1 year ago

If the generalization is straight-forward, then I think we can put it here.

Can you explain why this range check is sufficient? (Suppose $m$ is not a power of 2.)

Can we make the $m=1$ case wire-compatible with the current version of the circuit?

divergentdave commented 1 year ago

During encoding, we require the client to commit to the sum of its measurement and the constant c in some auxiliary bits. Because we apply range checks to the auxiliary bits, then $c+\sum_i{x_i}$ must be in the range $[0, 2^{l_m})$. The sum check confirms that the auxiliary bits are consistent with the number of ones in the measurement. Then, $\sum_i{x_i}$ is in the range $[0-c, 2^{l_m}-c)=[-c,2^{l_m}-2^{l_m}+1+m)=[-c,m+1)=[-c,m]$. Since $x_i$ is always non-negative, we can clip this and say that $\sum_i{x_i} \leq m$.

Note that c is equal to zero when m is one less than a power of two. As a concrete example with an "interesting" value of c, let's set m to 5. Five fits in three bits, so $l_m=3$, and c is 2. If a client shards a measurement that's all zeros, they would append <0, 1, 0> when encoding, because $c+\sum_i{x_i}=2$. If a measurement included the maximum of five ones, then $c+\sum_i{x_i}=2+5=7$, and they would append <1, 1, 1>. If a malicious client wanted to include six ones in their measurement, then $c+\sum_i{xi}=8$, but there's no way for $\sum{j=0}^{l_m} {2^j s_j}$ to equal eight when the range checks are satisfied, so they won't be able to zero out the sum check.

The VDAF described here, with m set to 1, is not semantically equivalent to Prio3Histogram, since it would allow measurements with either zero or one ones, while Prio3Histogram only allows exactly one one. (This additional flexibility is reflected in encoded measurements being one element longer, for $s_0$) We could make Prio3Histogram wire-compatible with a different VDAF, or a mode of this VDAF, that does not add any extra elements during encoding, and only allows vectors with exactly m ones.

cjpatton commented 1 year ago

Cool, this sounds like a new Prio3 variant. I don't know how many variants we want in the core document -- I suppose we should discuss this. But if we have concrete use cases in mind for this, I don't see why we couldn't add it here.

One other way I can imagine using this is to for aggregating a Bloom filter: Each measurement would be a contribution to this filter, where the contribution is a bitvector with between $1$ and $k$ bits set, where $k$ is a parameter of the Bloom filter. (Bits are set by hashing a string and deriving indices into the Bloom filter from the hash value.) We could probably get away with a range check of $[0, k]$, but $[1, k]$ would be more correct. (If the Client is behaving correctly, then at least one bit will be set.) Hence I think generalizing range check to include a non-$0$ lower bound would be useful.

What would you call this thing?

Another observation: There is an implicit generalization of the Sum circuit here that I think would be useful for Prio3Sum, and perhaps even Prio3SumVec. Though for the latter we would want to make sure we don't blow up the communication overhead for the common 0/1-vector use case.

cjpatton commented 1 year ago

What would you call this thing?

One suggestion, from discussions about Client-side DP mechanisms: NoisyHistogram?

divergentdave commented 1 year ago

We could consider renaming Prio3Histogram to Prio3OneHotVector and making this new idea Prio3ManyHotVector

junyechen1996 commented 1 year ago

Thanks for putting this together @divergentdave ! I think this new Prio3 variant is useful for checking bounded number of 1s after the one-hot vector goes through randomized response Client-DP.

One clarification is: this assumes $$c + \sum_{i=0}^{n-1} x_i$$ (roughly at most $m + n$) won't overflow field size. It won't be a problem here, because we have already done bit check on $x_i$ and we are using some reasonably large field size which is much larger than $m + n$.

wangshan commented 1 year ago

We could consider renaming Prio3Histogram to Prio3OneHotVector and making this new idea Prio3ManyHotVector

+1, I would suggest Prio3OneHotVec and Prio3MultiHotVec, or if we want to keep Prio3Histogram, can we call the new one Prio3MultiHotHistogram.

Alternatively we can keep Prio3Histogram an alias to Prio3OneHotVec

cjpatton commented 1 year ago

@junyechen1996 it looks like we're going to want this circuit for the DP draft for PPM, correct? If so, what do you think about at least adding it to the reference code here? If useful enough we could also consider putting it in the spec.

Also, I just wanted to suggest a potential alternative circuit inspired by a discussion with @pag-crypto: If the range is relatively small, then instead of bit-encoding the sum and doing an equality check, an alternative would be to do a kind of look-up table thing. (This isn't exactly Paul's idea, so don't blame him if it turns out to be broken.)

Let $s$ denote the sum of the inputs; we want to check that $s \in [1..m]$. It suffices to check that $(s - 1) \cdots (s - m) = 0$.

This means more multiplications, but the upside is we don't break wire compatibility. Plus, we might get away with not increasing the number of ParallelSum-calls. What do you think @divergentdave?

divergentdave commented 1 year ago

Yeah, that makes sense, I think that's just the "range check polynomial" applied in a new place. One nit: none of the alternatives would be wire compatible, since the proofs wouldn't verify naturally, but this would let us keep the same AFE encoding/decoding, and only change the validity circuit.

There's going to be a complicated performance and proof size tradeoff between the two, since the range check polynomial could have a high degree with certain parameter choices. Both approaches would work better with small ranges of valid Hamming weights, but I'm sure they scale differently as the range of Hamming weights broadens.

cjpatton commented 1 year ago

Oh hmm, I wouldn't want to implement this if we had to use a high-degree gadget like PolyEval. I was thinking we could just Mul for this, which we already have?

Yeah, that makes sense, I think that's just the "range check polynomial" applied in a new place. One nit: none of the alternatives would be wire compatible, since the proofs wouldn't verify naturally, but this would let us keep the same AFE encoding/decoding, and only change the validity circuit.

You're right, the proof would change.

There's going to be a complicated performance and proof size tradeoff between the two, since the range check polynomial could have a high degree with certain parameter choices. Both approaches would work better with small ranges of valid Hamming weights, but I'm sure they scale differently as the range of Hamming weights broadens.

Oh hmmm, I wouldn't want to use a high-degree gadget for the range check. Couldn't we use Mul, via ParallelSum?

divergentdave commented 1 year ago

The two straightforward ways to do this would be to put the entire polynomial in one PolyEval gadget (one gadget call, arity 1, degree equal to the number of allowable s values) or to build it out of affine gates and many many Mul gadgets (gadget call count equal to one less than the number of allowable s values, arity 2, degree 2).

ParallelSum wouldn't be easy to apply, because at minimum we'd need some separate multiplication gadget calls upstream of our ParallelSum gadget. Since subcircuit outputs get added up, that means we have to start by expanding the polynomial. We could end up burning gadget arity on constant inputs in order to meet the "identical subcircuit" restriction as well.

The goal of ParallelSum is to balance out the number of gadget calls and the gadget's arity, so we could instead take inspiration from that, define one PolyEval gadget that uses a smaller degree range check polynomial, feed in 's' minus various constants into that gadget, and then finish up the larger range check polynomial by feeding the output of the small polynomials through Mul.

Before we get too far in the weeds specifying something that complicated, do we have a good idea for how wide the range of allowable Hamming weights would be randomized response applications with typical DP parameters?

wangshan commented 1 year ago

Before we get too far in the weeds specifying something that complicated, do we have a good idea for how wide the range of allowable Hamming weights would be randomized response applications with typical DP parameters?

I think that depends a lot on the dimension, for e.g. with 100,000 dimension, I think the range is between dozens to hundreds.

cjpatton commented 1 year ago

Bit-checks it is then :)

cjpatton commented 1 year ago

I'd say this is ready for work. Who'd like to take it?

junyechen1996 commented 1 year ago

I can take this. I will start with a Python implementation in poc folder. Does that sound good @cjpatton @divergentdave ?

cjpatton commented 1 year ago

SGTM.

cjpatton commented 1 year ago

Next step is to decide whether to assign a codepoint for Prio3MultiHotHistogram and, potentially, add it to the draft. It may end up being a dependency for https://github.com/wangshan/draft-wang-ppm-differential-privacy/, in which case we'll at least need to assign the codepoint

Now might be a good time to figure how how this is supposed to work. I'll try to get some feedback at CFRG on how to manage codepoints for VDAFs (cc/ @bifurcation in case you already have a vision for this.)

@divergentdave do you have thoughts about if/how to incorporate this new type into the draft?

cjpatton commented 1 year ago

My latest thought here is that we should just replace Prio3Histogram with Prio3MultiHotHistogram, since the latter is just a generalization.

divergentdave commented 1 year ago

Replacing it seems reasonable. For Prio3Histogram use cases, the encoded measurement will get one field element longer, which is negligible. It won't be possible to enforce "exactly one hot" anymore, but only "zero or one hot". I think that should be fine, by and large. Choosing the "wrong" bucket affects the aggregate twice as much as choosing no buckets when you "should" choose one, so this isn't giving malicious clients an advantage. If choosing zero buckets doesn't make sense semantically, like in a multiple-choice survey, I think results will still be meaningful even if some zero bucket measurements get included.

How about Prio3MultiHotVec for the name? The path dependence of the "histogram" name is starting to wear on me.

cjpatton commented 1 year ago

Works for me. As for renaming: so far we have tried to align the name of the circuit with the data type of the aggregate result rather than than the type of the measurement. I'm OK breaking this convention if it's annoying :)

cjpatton commented 5 months ago

Erring on the side of not breaking compatibility, I'm going to add this as a new Prio3 variant and assign it a codepoint. I'll take @divergentdave's suggested name of Prio3MultiHotVec.

cfrg / draft-irtf-cfrg-vdaf

Prio3 VDAF for vectors with a bounded number of 1s #287