Closed cjpatton closed 1 year ago
TODO:
I put together a couple prototypes of PRG alternatives in libprio: https://github.com/divviup/libprio-rs/compare/cjpatton/prg-prototypes
One is based on KangarooTwelve (a CFRG spec) and the other is based on SHA-3 (SHAKE128). Initial benchmarks are encouraging. The following table shows Prio3CountVec shard time on my laptop for various input lengths with no multithreading:
PRG | 100 | 1000 | 10,000 |
---|---|---|---|
PrgAes128 (status quo) | 149.72 us | 3.13 ms | 22.6 ms |
PrgK12 (KangarooTwelve) | 156.69 us | 3.15 ms | 22.7 ms |
PrgSha3 (SHA-3) | 171.14 us | 3.21 ms | 23.6 ms |
We still need to benchmark Poplar1. The implementation is WIP: https://github.com/divviup/libprio-rs/pull/381
Per @simon-friedberger's request, I've updated the benchmark branch of libprio with a PRG based on HKDF-SHA256. I played around with a few variants, all are roughly as performant:
PRG | 100 | 1000 | 10,000 |
---|---|---|---|
PrgHkdfSha256 | 465.88 us | n/a | n/a |
(Note that input sizes of 1000 and 10,000 exceed the the maximum output length for HKDF-SHA256.)
The performance is quite bad compared to the other algorithms. I'm not sure if we're using a software implementation of SHA-256, but if we are, these numbers aren't too surprising given how much hashing is involved in HKDF.
@divergentdave and I have been working on Poplar1 in libprio: https://github.com/divviup/libprio-rs/pull/434
We're still working on benchmarks, but so far it looks like SHA-3 is up to 40% more expensive than PrgAes128
. This confirms @schoppmp's hypothesis that Prio is arithmetic-bound and Poplar is PRG-bound.
We have merged the basic functionality for Poplar1 into libprio (🎉). I have also updated my PRG prototypes branch (https://github.com/divviup/libprio-rs/compare/cjpatton/prg-prototypes) with benchmarks with Poplar1 using the current PrgAes128
and the alternative PrgSha3
. The table below summarizes shard and preparation timings for both variants and for various bit lengths.
PRG | 16 bits | 64 bits | 128 bits | 256 |
---|---|---|---|---|
PrgAes128 shard time | 49.4 us | 143.8 us | 270.6 us | 519.5 us |
PrgSha3 shard time | 49.2 us | 143.8 us | 268.9 us | 518.2 us |
PrgAes128 prep time | 438.2 us | 1.4 ms | 2.7 ms | 5.3 ms |
PrgSha3 prep time | 667.0 us | 2.4 ms | 4.5 ms | 8.9 ms |
(*) For prep time, we're measuring the time it takes for an Aggregator to evaluate an IDPF share and compute its sketch share. We are evaluating at the last level the three, which exhibits worst case performance. Prep time is sensitive not only to the the number of candidate prefixes, but the distribution. This is because the IDPF evaluation caches intermediate results. To compute the candidate prefixes, we sample 1000 measurements from a Zipf distribution (as suggested by the original Poplar paper) and compute the prefix tree for threshold = 10.
Observations:
Caveats:
Conclusion: SHA-3 performs well enough in all situations except IDPF evaluation. There the performance hit is significant enough to warrant investigating an alternative that has hardware support.I propose focusing #32 on this question. Further, there is no need to design to the Prg
API; something designed specifically for IDPF would make sense.
In the meantime, I propose we go ahead with replacing PrgAes128
with something based on SHA-3. Here is my proposal: https://github.com/cfrg/draft-irtf-cfrg-vdaf/pull/136
In Prio3,
PrgAes128.derive_seed()
is used with a fixed seed for the Fiat-Shamir heuristic. We need to decide if this is safe. It would be sufficient to prove, say, that this function is indifferentiable from a random oracle when modeling AES as an ideal cipher.Note that we might end up picking a PRG in #32 that is already safe a safe choice here, in which case we should just replace
PrgAes128
.