Summarizing some offline discussions: Running verification multiple times to improve soundness should work straightforwardly if there is no joint randomness. Once the report is fixed, the only coin that the soundness probabilities are evaluated over is the query randomness, and checking the polynomials at multiple points gives us independent events, so the protocol's overall soundness error would look something like the product of the soundness errors from the individual query runs.
However, once we re-introduce joint randomness, the joint randomness values would be constant across each proof verification re-run, and thus the probability of repeated FLP verification detecting a dishonest submission is correlated. There's now another coin involved, and repeatedly running proof verification doesn't re-flip it. Thus, the soundness error you get from running proof verification repeatedly would be dominated by the terms that come from the joint randomness, and the overall soundness would not get exponentially better with more verifications. The query randomness values are used in polynomial identity tests that show that gadget input and output values are consistent, while the joint randomness values are used by validity circuits to compress intermediate values from multiple gadgets into a single field element (typically by computing a random linear combination of polynomials that are each zero for valid inputs). Concretely, a dishonest client could choose an invalid measurement, honestly evaluate all the gadgets in the validity circuit, and adversarially choose joint randomness blinds such that the circuit output is zero through cancellation. If they wanted to attack Prio3Sum, they could submit a measurement with, say, 2 in every "bit", and search until they find a joint randomness value that satisfies $x + x^2 + x^3 + ... + x^n = 0$. This would take an average of $\approx(2^{32}/n)/2$ tries with Field32, which is feasible.
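To illustrate, here is a toy sketch of that grinding attack over a small prime field so it runs instantly; the modulus, hash, and circuit shape below are stand-ins of my own, not Prio3Sum itself.

import hashlib
import os

p = 65537   # toy stand-in for the small field modulus (the discussion above uses Field32)
n = 32      # number of "bits" in the bogus measurement, each set to 2

def joint_rand(blind: bytes) -> int:
    # Fiat-Shamir stand-in: the client derives the joint randomness from a blind it picks.
    return int.from_bytes(hashlib.sha256(blind).digest(), "big") % p

tries = 0
while True:
    tries += 1
    r = joint_rand(os.urandom(16))
    # With every "bit" equal to 2, each range check evaluates to 2, so the compressed
    # circuit output is 2 * (r + r^2 + ... + r^n); it vanishes exactly when that sum does.
    if sum(pow(r, i, p) for i in range(1, n + 1)) % p == 0:
        break
print(f"found a passing blind after {tries} tries (success chance is roughly n/p per try)")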
We could achieve reasonable soundness with smaller fields if we re-ran the validity circuit (and thus proof generation) multiple times with different joint randomness values, but this would cancel out the client-to-aggregator communication benefits of using a smaller field.
For each circuit that currently uses joint randomness, we could imagine dropping the compression step, and having validity circuits output a vector of field elements. This would increase both client-to-aggregator and aggregator-to-aggregator communication, but the tradeoff for using smaller fields might be worth it with moderately-sized circuits (i.e., only a few gadget evaluations).
Those are the only tricks I can think of at the moment to get around this. Of course, they'd need further analysis.
@divergentdave Can we construct a new joint randomness for each verification using the iteration of the check as input? (Or just iteratively rehashing the joint randomness.)
IIUC that should make no difference to an honest client while for a cheating client the $\approx 2^{16}$ tries would multiply.
Yeah, deriving additional joint randomness values would not be an issue. The issue is what you do with the resulting values, because there's a soundness versus report size tradeoff. If you re-run proof generation multiple times and send a new set of proof shares for each joint randomness, then the shorter field elements and repeated proof shares cancel out, so that the report size is about the same as when you started. If you somehow sum up proof shares from each run to compress them, i.e. by checking a circuit that evaluates a sub-circuit at multiple points, then that leaves a lone 1/|F| term in the soundness error.
@divergentdave we notice that, for some use cases, e.g. SumVec, the communication cost between clients/aggregators is dominated by the measurement shares, not the proof shares: in such cases the size of the measurement shares is $O(n)$, while the size of the proof share is roughly $O(\sqrt{n})$, where $n$ is the measurement length. Therefore, for such use cases, multiple proofs in a smaller field can still lead to significantly smaller input shares (Prio3InputShare).
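To make that concrete, here is a rough, hypothetical size comparison; the proof length below is a crude O(sqrt(len)) placeholder of my own, not the draft's actual PROOF_LEN formula.

import math

def approx_share_elems(meas_len: int, num_proofs: int) -> int:
    # Measurement share: ~meas_len field elements; each proof share: ~O(sqrt(meas_len)).
    proof_len = 3 * math.isqrt(meas_len)   # placeholder for PROOF_LEN
    return meas_len + num_proofs * proof_len

n = 100_000
print("Field128, 1 proof :", 16 * approx_share_elems(n, 1), "bytes")  # 16-byte elements
print("Field64,  2 proofs:", 8 * approx_share_elems(n, 2), "bytes")   # 8-byte elements

Since the measurement share dominates, halving the element size roughly halves the share, even with an extra proof.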
Here is our proposal.
The basic idea is similar to what you said above, i.e., to construct multiple independent proofs, using distinct sets of prover randomness and joint randomness, and accept the inputs only after all proofs are validated. Here we explain this idea by describing the changes from Section 7.2 in draft-irtf-cfrg-vdaf-06.
1. Add a new parameter PROOFS, which defaults to one if not set.
2. In measurement_to_input_shares, clients produce PROOFS proofs, each with a distinct set of prover randomness and joint randomness.
i. Generate PROOFS sets of prover randomness by deriving PROVE_RAND_LEN * PROOFS field elements from the same seed:
prove_rands = Prio3.Prg.expand_into_vec(
Prio3.Flp.Field,
k_prove,
Prio3.domain_separation_tag(USAGE_PROVE_RANDOMNESS),
b'',
Prio3.Flp.PROVE_RAND_LEN * PROOFS,
)
ii. Generate PROOFS sets of joint randomness by deriving JOINT_RAND_LEN * PROOFS field elements from the same seed, if use_joint_rand is set:
joint_rands = Prio3.Prg.expand_into_vec(
Prio3.Flp.Field,
Prio3.joint_rand(k_joint_rand_parts),
Prio3.domain_separation_tag(USAGE_JOINT_RANDOMNESS),
b'',
Prio3.Flp.JOINT_RAND_LEN * PROOFS,
)
iii. Invoke prove PROOFS times, using a distinct set of prover randomness and joint randomness each time, and concatenate all proofs into one message:
proofs = []
for i in range(PROOFS):
    if use_joint_rand:
        joint_rand, joint_rands = front(Prio3.Flp.JOINT_RAND_LEN, joint_rands)
    else:
        joint_rand = []
    prove_rand, prove_rands = front(Prio3.Flp.PROVE_RAND_LEN, prove_rands)
    proofs.append(Prio3.Flp.prove(inp, prove_rand, joint_rand))
leader_proof_share = concat(proofs)
iv. Shard the concatenated proof into SHARES shares, as before. Each new share has PROOF_LEN * PROOFS field elements:
helper_proof_share = Prio3.Prg.expand_into_vec(
Prio3.Flp.Field,
k_helper_proof_shares[j],
Prio3.domain_separation_tag(USAGE_PROOF_SHARE),
byte(j+1),
Prio3.Flp.PROOF_LEN * PROOFS,
)
3. In decode_leader_share, the leader decodes the proof share with the correct size:
l = Prio3.Flp.Field.ENCODED_SIZE * Prio3.Flp.PROOF_LEN * PROOFS
4. In decode_helper_share, each helper recreates its proof share, similar to step 2.iv:
proof_shares = Prio3.Prg.expand_into_vec(
Prio3.Flp.Field,
k_proof_share,
c_proof_share,
byte(agg_id),
Prio3.Flp.PROOF_LEN * PROOFS)
5. In prep_init, aggregators compute PROOFS verifier messages, one for each proof.
i. Derive PROOFS sets of joint randomness, as in step 2.ii.
ii. Derive PROOFS sets of query randomness:
query_rands = Prio3.Prg.expand_into_vec(
Prio3.Flp.Field,
verify_key,
Prio3.domain_separation_tag(USAGE_QUERY_RANDOMNESS),
nonce,
Prio3.Flp.QUERY_RAND_LEN * PROOFS,
)
iii. Split the proof share evenly into PROOFS parts, each with PROOF_LEN field elements. Then the aggregator invokes query PROOFS times, each time using the corresponding joint randomness and one distinct query randomness, and concatenates all results into one verifier share:
verifier_shares = []
for i in range(PROOFS):
    proof_share, proof_shares = front(Prio3.Flp.PROOF_LEN, proof_shares)
    if use_joint_rand:
        joint_rand, joint_rands = front(Prio3.Flp.JOINT_RAND_LEN, joint_rands)
    else:
        joint_rand = []
    query_rand, query_rands = front(Prio3.Flp.QUERY_RAND_LEN, query_rands)
    verifier_shares.append(Prio3.Flp.query(meas_share,
                                           proof_share,
                                           query_rand,
                                           joint_rand,
                                           Prio3.SHARES))
verifier_share = concat(verifier_shares)
6. In decode_prep_share, decode the verifier share with the correct size:
l = Prio3.Flp.Field.ENCODED_SIZE * Prio3.Flp.VERIFIER_LEN * PROOFS
7. In prep_shares_to_prep, the leader sums up all verifier shares, as before, and splits the result evenly into PROOFS parts, with VERIFIER_LEN field elements per part. Then it invokes decide on each part. If any part is invalid, the leader should bail out:
verifiers = Prio3.Flp.Field.zeros(Prio3.Flp.VERIFIER_LEN * PROOFS)
...
for i in range(PROOFS):
    verifier, verifiers = front(Prio3.Flp.VERIFIER_LEN, verifiers)
    if not Prio3.Flp.decide(verifier):
        raise ERR_VERIFY # proof verifier check failed
...
Adversaries have to find PROOFS valid proofs, using distinct joint randomness, before the aggregators will accept the input. Therefore, if we model each PRG as a random oracle, the probability of violating soundness drops from the single-proof soundness error $\varepsilon$ to $\varepsilon^t$, where the underlying FLP is $\varepsilon$-sound and $t$ = PROOFS.
Here are some preliminary results comparing client/aggregator communication costs: SumVec with num_aggregators=2, bits=1, and len shown as the “Measurement dimension” (x-axis) in the diagrams.
We benchmark the Rust implementation with the following setup: SumVec with num_aggregators=2, bits=1, len=100_000, measuring Prio3.shard, Prio3.prepare_init, Prio3.prepare_preprocess, and Prio3.prepare_step.
@cjpatton @divergentdave what do you think? Specifically, does it make sense to generate the prover/joint/query randomness this way? Any better way to plug this idea into the current protocol?
Thanks, I think this is a promising idea! I hadn't considered the effect on measurement size previously, and as you point out, it's the more important component here. It looks like we can get pretty significant report size wins with this, so I think it would be worth the extra complexity and bookkeeping. Concatenating and splitting proof and verifier messages as you show makes sense to me.
Currently we specify which field each Prio3 instance uses. We would need to provide more guidance on soundness error, and either give users some choices for combinations of field size and number of proofs, or give formulas and thresholds for choosing such combinations.
When considering lowering the field size, it's worth noting we have a lower bound constraint from the FFT interpolation we do. With Field32, we use a subgroup of order $2^{20}$, so we can handle calling a gadget about a million times. This should be fine for our purposes, especially since the ParallelSum gadget lets us scale gadget calls sublinearly with input sizes.
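A quick sanity check of that constraint; the modulus below is the 32-bit field I believe is meant by Field32 (the Prio2 field), and the FFT-size estimate is only approximate.

# Two-adicity of the 32-bit field, and whether a given number of gadget calls fits
# under the 2^20 FFT subgroup mentioned above.
p32 = 4293918721                                    # 2^32 - 2^20 + 1; p - 1 = 2^20 * 4095
two_adicity = ((p32 - 1) & -(p32 - 1)).bit_length() - 1
print("two-adicity:", two_adicity)                  # 20, i.e. a subgroup of order 2^20

def fits(num_gadget_calls: int) -> bool:
    # Roughly: the FFT size is the next power of two above the number of gadget calls.
    fft_size = 1 << (num_gadget_calls + 1).bit_length()
    return fft_size <= (1 << two_adicity)

print(fits(315), fits(1_000_000), fits(2_000_000))  # True True False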
We'll also need to satisfy ourselves that the security analysis hangs together, and that the repeated proofs don't introduce soundness or zero-knowledge issues. I'll defer to Chris on what's needed there.
I also support this idea. It seems useful to provide users a trade-off between communication cost and CPU time. As far as the security implications of this change:
Privacy: I don't think privacy is impacted. The only thing to keep in mind is that using a smaller field increases the chance that proof generation fails: https://github.com/cfrg/draft-irtf-cfrg-vdaf/blob/main/poc/flp_generic.py#L242. But I don't think this is a big deal.
Robustness: As you point out, we can achieve the same soundness error for the FLP with a smaller field by repeating the proof enough times. This is only true if the joint randomness is derived correctly, i.e., the changes to Fiat-Shamir you propose matter for security. I agree that the change seems intuitively safe, but we'd need to think through the security reduction to confirm this intuition.
What Fiat-Shamir does is take an interactive protocol and make it non-interactive. Generally we think of this protocol as being sequential: the verifier sends a challenge, the prover responds; the verifier sends the next challenge, the prover sends the next response, and so on. In the interactive version of "multi-proof Prio3" (i.e., your proposal), it seems like all of these challenge-response steps are done concurrently. I'm not sure if/how this feature would impact the usual security reduction. It may be that we need to derive each joint randomness such that it depends on the previously generated proof.
Generally speaking, messing up Fiat-Shamir is a well-known source of bugs in practical ZKP systems. We need to be very careful about how we apply it.
All of that said, I would support extending Prio3 as you suggest in order to unblock anyone who may want to implement/experiment with it. I'd like to suggest the following requirements for the change:
Do you think using Field64, say with PROOFS==2, would be good enough? Of course, a private codepoint could be used for, say, Prio3SumVec with a 32-bit field and 4 proofs.
Thank you very much for your inputs!
Do you think using Field64, say with PROOFS==2, would be good enough?
This is a little bit tricky. Theorem 4.3 in BBCGGI19 states that the soundness error is $\varepsilon = \frac{M \cdot \text{degG}}{|F| - M}$.
If we follow Theorem 1 in DPRS23 and simply replace $\varepsilon$ with $\varepsilon^{\text{PROOFS}}$, then, assuming $q_{\text{RO}} \gg q_{\text{prep}}$, the soundness error ratio of two Field64 proofs over one Field128 proof is roughly $M \cdot \text{degG}$, i.e. "number of gadget calls" times "degree of the gadget".
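Spelling out the arithmetic behind that ratio (my own sketch, using the rough approximation $\varepsilon_b \approx M \cdot \text{degG} / 2^b$ for a $b$-bit field and ignoring the $-M$ in the denominator):

$$\frac{\varepsilon_{64}^{2}}{\varepsilon_{128}} \approx \frac{(M \cdot \text{degG})^{2} / 2^{128}}{(M \cdot \text{degG}) / 2^{128}} = M \cdot \text{degG}$$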
For SumVector
with len=100_000
, this is about 630.
I am wondering if it makes sense to make PROOFS
a configurable parameter.
One more clarification. Shall I continue with PR for this draft or PR for libprio-rs?
This is a little bit tricky. Theorem 4.3 in BBCGGI19 states that the soundness error is $\varepsilon = \frac{M \cdot \text{degG}}{|F| - M}$. If we follow Theorem 1 in DPRS23 and simply replace $\varepsilon$ with $\varepsilon^{\text{PROOFS}}$, then, assuming $q_{\text{RO}} \gg q_{\text{prep}}$, the soundness error ratio of two Field64 proofs over one Field128 proof is roughly $M \cdot \text{degG}$, i.e. "number of gadget calls" times "degree of the gadget".
I'm a bit lost here. Can you clarify how you expect the bound to change? Let's let $\epsilon_b$ be the soundness error for the FLP with field size $b$ (in bits, so $\epsilon_{128}$ is for Field128), $q_{\text{RO}}$ the number of random oracle queries, and $q_{\text{Prep}}$ the number of tries the attacker gets to break robustness. (Assuming $q_{\text{RO}} \gg q_{\text{Prep}}$ is fine.)
Note that we can't apply [DPRS23, Theorem 1] directly here because we're tweaking the construction.
For SumVector with len=100_000, this is about 630. I am wondering if it makes sense to make PROOFS a configurable parameter.
Do you mean that PROOFS == 630? Does PROOFS depend on the number of gadget calls? If so, why?
I think what we'd hope is that we can get roughly the same level of robustness at the cost of $O(b_1 / b_0)$ additional proofs, where $b_1$ is the size of the large field (Field128) and $b_0$ is the size of the small field.
For what it's worth, I don't think 630 proofs is ever going to be feasible, especially since prove/query time for SumVec with dimension 100,000 is already on the order of milliseconds.
One more clarification. Shall I continue with PR for this draft or PR for libprio-rs?
Let's start by extending the reference implementation: https://github.com/cfrg/draft-irtf-cfrg-vdaf/blob/main/poc/vdaf_prio3.py
Once that's done we can move on to the draft itself. As far as libprio-rs goes, you'll have to ask the maintainers if they're willing to implement this change. (@divergentdave is one of them.) I'd hold off on this until we've settled more of the open questions.
What Fiat-Shamir does is take an interactive protocol and make it non-interactive. Generally we think of this protocol as being sequential: the verifier sends a challenge, the prover responds; the verifier sends the next challenge, the prover sends the next response, and so on. In the interactive version of "multi-proof Prio3" (i.e., your proposal), it seems like all of these challenge-response steps are done concurrently. I'm not sure if/how this feature would impact the usual security reduction. It may be that we need to derive each joint randomness such that it depends on the previously generated proof.
Generally speaking, messing up Fiat-Shamir is a well-known source of bugs in practical ZKP systems. We need to be very careful about how we apply it.
I don't think this will be an issue, because we can still write the FLIOP as having three half-rounds of communication. First, the prover sends all measurement shares to the verifier; second, the verifier sends PROOFS many sets of public-coin field elements to the prover; and third, the prover sends all proof shares to the verifier. Neither the computation nor the verification of any proof depends on the other proofs, so it's okay to treat them all as happening in the same FLIOP round.
On a related note, we will need to assume that the algorithm identifier commits to the choice of field and the choice of PROOFS, in addition to the Prio3 circuit, in order to have those parameters covered by the transcript.
For SumVector with len=100_000, this is about 630. I am wondering if it makes sense to make PROOFS a configurable parameter.
Do you mean that PROOFS == 630? Does PROOFS depend on the number of gadget calls? If so, why?
The 630 number was for $M * \textrm{deg G}$. The degree of the gadget (Mul) is two, and using the square root heuristic on a measurement length of 100,000 gets you chunk lengths and gadget calls both approximately equal to 315.
I think the upshot of this is that we can't gloss over the impact of gadget calls (and gadget degree) when we start stacking up many proofs over smaller fields, especially when measurements are large. For this choice of parameters, and with the existing single proof over Field128, we get a soundness error of roughly $2^{-128 + 9.3} = 2^{-118.7}$. If we cut the field size by a factor of about four, and use PROOFS=4, then the soundness error for one proof would be around $2^{-32 + 9.3} = 2^{-22.7}$. Four such proofs, each independent, gets us only up to $(2^{-22.7})^4 = 2^{-90.8}$. That ratio of soundness errors of "two Field64 proofs over one Field128 proof" stacks up with each proof we add.
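Here is a quick script reproducing those numbers; it's a back-of-the-envelope check using $\varepsilon \approx M \cdot \text{degG} / |F|$ and the square-root heuristic, so the exact chunk length may differ slightly from the draft's.

import math

length = 100_000
deg_g = 2                               # degree of the Mul gadget
gadget_calls = math.isqrt(length)       # ~316 under the square-root heuristic
m_deg = gadget_calls * deg_g            # ~632, i.e. about 2^9.3

for field_bits, proofs in [(128, 1), (64, 2), (32, 4)]:
    eps_single = m_deg / 2**field_bits  # approximate per-proof soundness error
    eps_total = eps_single**proofs      # a cheater must beat every proof
    print(f"{field_bits}-bit field, {proofs} proof(s): 2^{math.log2(eps_total):.1f}")

This prints roughly 2^-118.7, 2^-109.4, and 2^-90.8, showing the M * degG factor stacking up with each additional proof, as noted above.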
@cjpatton @divergentdave what are acceptable soundness errors in practice? If I understand correctly, the soundness error needs to be small enough that a brute-force attack taking 1/soundness time is infeasible. If so, then I think anything < 2^-64 should be sufficient?
Thanks @divergentdave for clearing that up.
@cjpatton, sorry for the confusion.
Do you mean that PROOFS == 630? Does PROOFS depend on the number of gadget calls? If so, why?
I mean 630 is roughly $\varepsilon_{64}^2 / \varepsilon_{128}$ for SumVector with len=100_000 (using the method I described above, which may not be appropriate).
I am not sure we can assume that $O(b_1/b_0)$ proofs will give us robustness parity without further clarification of the soundness error for the multiproof construction.
Let's start by extending the reference implementation: https://github.com/cfrg/draft-irtf-cfrg-vdaf/blob/main/poc/vdaf_prio3.py
Sounds good.
(Sorry for the late response - I'm at a conference this week.)
@divergentdave I don't think this will be an issue, because we can still write the FLIOP as having three half-rounds of communication. First, the prover sends all measurement shares to the verifier; second, the verifier sends PROOFS many sets of public-coin field elements to the prover; and third, the prover sends all proof shares to the verifier. Neither the computation nor the verification of any proof depends on the other proofs, so it's okay to treat them all as happening in the same FLIOP round.
The 630 number was for $M \cdot \textrm{deg G}$. The degree of the gadget (Mul) is two, and using the square root heuristic on a measurement length of 100,000 gets you chunk lengths and gadget calls both approximately equal to 315.
I think the upshot of this is that we can't gloss over the impact of gadget calls (and gadget degree) when we start stacking up many proofs over smaller fields, especially when measurements are large. For this choice of parameters, and with the existing single proof over Field128, we get a soundness error of roughly $2^{-128 + 9.3} = 2^{-118.7}$. If we cut the field size by a factor of about four, and use PROOFS=4, then the soundness error for one proof would be around $2^{-32 + 9.3} = 2^{-22.7}$. Four such proofs, each independent, gets us only up to $(2^{-22.7})^4 = 2^{-90.8}$. That ratio of soundness errors of "two Field64 proofs over one Field128 proof" stacks up with each proof we add.
Yeah, this sounds about right. (And good catch on including PROOFS in the transcript.) If we view Prio3 as a kind of generic compiler from an FLP, and we view this thing as a kind of "multi-proof FLP", then [DPRS23, Theorem 1] probably covers this as well, obtaining the bound that @albertpl expects. Though I wonder if the bound we would get is looser than necessary: can anyone think of a matching attack? My intuition is that we should be able to prove a tighter robustness bound, but I could be wrong.
@wangshan what are acceptable soundness errors in practice? If I understand correctly, the soundness error needs to be small enough that a brute-force attack taking 1/soundness time is infeasible. If so, then I think anything < 2^-64 should be sufficient?
There is another thing to consider, which is robustness. The soundness of the base FLP is part of this, but if our circuit needs joint randomness, then random oracle (RO) queries can weaken robustness well below the base soundness bound.
I have been thinking in terms of allowing the attacker up to $2^{64}$ RO queries. So if you want a robustness bound of about $2^{-64}$, you would need a base soundness of about $2^{-128}$ (according to [DPRS23, Theorem 1]).
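The shape of that calculation, written out (my loose reading of how the terms combine, not the exact statement of [DPRS23, Theorem 1]):

$$\text{robustness advantage} \;\lesssim\; q_{\text{RO}} \cdot \varepsilon, \qquad q_{\text{RO}} = 2^{64},\ \varepsilon \approx 2^{-128} \;\Longrightarrow\; \text{advantage} \lesssim 2^{-64}$$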
This is why we picked a 128-bit field for Prio3 variants that use joint randomness. However, we actually get weaker robustness as the input size increases. If we want robustness of at most $2^{-64}$, then we'd need a slightly bigger field, depending on the time. For SumVec, say, if the input length is 100,000, then we would need a field that has something like $\log_2(100,000)$ more bits.
In my opinion, it's OK for robustness to be weaker than $2^{-64}$ (assuming $2^{64}$ RO queries). That is, it's up to the application to decide whether it can tolerate a weaker bound as a result of doing multi-proof with a smaller field.
However, I don't think we should weaken the bounds we currently get, i.e., we should only change the existing Prio3 variants if we find that we can preserve the same level of security without adding too many extra proofs.
I put up this PR and would appreciate your review. Specifically, I am not sure how we should allocate algorithm IDs and specify parameter constraints for these variants.
Thanks @albertpl for the PR, I'm reviewing now. But before we merge this, I would like us to have a better idea of what value of PROOFS we expect to see in practice so that we have a better sense of the communication/CPU trade-off. Can you say a bit more about your use case?
You mentioned Prio3SumVec. What parameters are you using? You mentioned input length of about 100k; is that indeed the target we should be evaluating? Do you want a value for bits other than 1?
What field do you want to use? Field64, or do you intend to use a smaller one? (I would not recommend this.)
I have one comment about the soundness error of SumVec, or any circuit that uses a random linear combination: there should be an additive factor of (len * bits) / |F| because of the random linear combination of the 0/1 bit checks, per Remark 4.8 of BBCGGI19.
Still using len=100,000, bits=1 as an example, the soundness error should be $\frac{2\lceil \sqrt{len} \rceil}{|F| - \lceil \sqrt{len} \rceil} + \frac{len \cdot bits}{|F|}$. For Field128, using $|F| = 2^{128}$, this gives $2^{-118} + 2^{-111}$. The latter term seems to be dominant, so we should take it into account.
This is why we picked a 128-bit field for Prio3 variants that use joint randomness. However, we actually get weaker robustness as the input size increases. If we want robustness of at most $2^{-64}$, then we'd need a slightly bigger field, depending on the time. For SumVec, say, if the input length is 100,000, then we would need a field that has something like $\log_2(100,000)$ more bits.
@cjpatton can you elaborate on this part? I don’t quite understand it…
@junyechen1996: Basically, for a fixed robustness bound, as your circuit gets more complex (i.e., more gadget calls), you need to compensate by increasing the size of the field (or the number of proofs).
You mentioned Prio3SumVec. What parameters are you using? You mentioned input length of about 100k; is that indeed the target we should be evaluating? Do you want a value for bits other than 1?
Yeah. len=100_000, bits=1 is our current target. I'd appreciate it if we could evaluate with this set of parameters first.
What field do you want to use? Field64, or do you intend to use a smaller one? (I would not recommend this.)
Field64 is our first choice too.
I can help you assess this if you'd like.
Yeah. That would be great!
What robustness bound are you hoping for? Note that, if a Client succeeds in breaking robustness, then it can ruin an aggregate result. Perhaps you're OK with this happening "once in a while", but rarely?
I suppose some reasonable tradeoff between communication/computation cost and robustness is acceptable. Some guidance on robustness would be very helpful.
I think of robustness as follows. If your robustness bound is $2^{-32}$, then after consuming $2^{32}$ reports it's highly likely that one of those reports is invalid. How "bad" this robustness failure is depends on how you use Prio3: assuming you're using DAP (https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/), I think your tolerance for robustness failure depends on the expected batch size, since this is the blast radius of a single invalid report. If your batch size is $1000$, then you should expect at most one in every $2^{32} / 1000$ batches to get ruined.
EDIT: Take this analysis with a grain of salt, as the robustness depends somewhat on how well-resourced the adversary is. First, random oracle queries amount to doing pre-computation with the underlying XOF. Second, it depends partially on how many of the reports the attacker controls. In practice we'd observe a high rejection rate before we get to the point of a batch getting ruined.
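A minimal version of that back-of-the-envelope estimate, using the hypothetical numbers from the example above:

robustness = 2.0**-32        # assumed robustness bound
batch_size = 1_000           # assumed reports per batch
reports_per_failure = 1 / robustness               # ~2^32 reports before a likely bad one
batches_per_failure = reports_per_failure / batch_size
print(f"roughly one ruined batch per {batches_per_failure:,.0f} batches")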
Great first PR, @albertpl! I think we're ready to start propagating the changes into the actual draft. While at it, please add yourself to the list of contributors in the acknowledgement section :)
Great. Do you think we should wait for the robustness analysis, i.e., shall we quantify the robustness of the multiproof construction?
The scheme is flexible enough to "bump" robustness as needed (by adding more proofs or using a larger field), so I don't think we need to block on that. On the other hand, we don't want there to be a difference between the reference code and the draft.
I don't think we need to block aligning the draft. (On the other hand, we don't want there to be a difference between the reference implementation and the draft for very long.) I think we need to come up with recommendations for choosing parameters, but this can wait for a later PR. For now, let's focus on nuts and bolts.
Any progress on updating the text, @albertpl? Note that we are blocking the next draft on this. If you can't get to it within the next couple of weeks, let us know.
@cjpatton sorry, let me work on it this week.
We've implemented the core feature; all that's left here is to provide guidance in the draft for choosing the parameters. I've opened a new issue to focus on this remaining work and am closing this issue.
This would trade communication cost for CPU time.