Closed csharrison closed 1 year ago
Interesting idea! This does indeed sound like a practical way to get good privacy. Even without DP, you get unlinkability of measurements to the client that sent them. This is basically what a mixnet does.
O(n)
space to compute the aggregate result, where n
is the number of measurements. On the other hand, evaluating the prio3 or hits VDAF requires just O(1)
space.My own conclusion is that this protocol shape isn't appropriate for the VDAF document, however this may well be in-scope for what the PPM working group does.
What do you think @bifurcation?
By the way, I'm reading https://eprint.iacr.org/2021/1490.pdf right now. This protocol seems to have some shuffling, but also seems to aim for a security property stronger than unlinkability. I think if we can achieve the same security for this protocol as for prio3/hits, I think it's worth considering including it here.
Thanks @cjpatton , this makes sense to me. A few quick responses below:
From a security perspective, unlinkability of measurements to senders is useful, but is weaker than what we hope to achieve for, say, prio3 or hits, which is that they learn only the aggregate result. Perhaps there's a way to patch the scheme to prevent leaking the set of measurements, as you suggest. But suppose we manage to ensure that all the aggregators learn is the aggregate result: then what does shuffling buy us?
One thing shuffling buys you is flexibility. Let's say all my users input data in [0, 10]. You could imagine a couple of different ways of aggregating this data, e.g. reporting the sum, median, percentiles, etc. With a shuffled version of the data, you can compute all of these on the collector side without embedding the computations in the aggregators.
There is also flexibility in terms of data used. In principle (as long as you trust the clients), the aggregators can just be shuffling opaque bytes. Clients could update to submit different kinds of data without any behavior update needed on the aggregator side. For instance, rather than shuffling integers they could shuffle sketches or some other data structure.
By the way, I'm reading https://eprint.iacr.org/2021/1490.pdf right now. This protocol seems to have some shuffling, but also seems to aim for a security property stronger than unlinkability. I think if we can achieve the same security for this protocol as for prio3/hits, I think it's worth considering including it here.
This proposal is interesting but it isn't the same kind of general purpose shuffling I'm discussing here. That paper only discusses computing histograms.
I agree the flexibility is attractive. My point about shuffling alone doesn't prevent aggregators from learning the set of measurements. Indeed, this is what makes it so flexible!
I agree the flexibility is attractive. My point about shuffling alone doesn't prevent aggregators from learning the set of measurements. Indeed, this is what makes it so flexible!
Oh sorry I missed this point. It should be easy to ensure that the shuffled data is not visible to the aggregators if they don't collude with the collector - just ensure all the data is also encrypted to the collector's key. The requirement is just that the MPC shuffles opaque blobs.
Right, but the collector would see the set of measurements in the clear, right? All the aggregators have done is permuted them.
The collector sees the set of measurements "in the clear", but the technique is designed to be combined with local noise so it isn't really accurate to say that the final release contains the all of the original measurements.
Yes, true enough!
We are beginning to see the emergence of MPC techniques that involve some sort of in-MPC "shuffling" or "ordering" of the reports being aggregated. Example:
I think where we're landing is that this architecture is sufficiently different from VDAFs that we should not try to shoe-horn them into this particular document.
Shuffle differential privacy (https://arxiv.org/pdf/1811.12469.pdf and many other papers) is a type of differential privacy where the privacy mechanism can be thought of as a composition of two pieces:
The power of the shuffle DP is that the server-side mechanism is simple and general purpose. The exact same shuffler can be used for different applications measuring different kinds of things. Privacy guarantees are based on two things:
At an intuitive level, the more reports you have, and the more noise is injected into the reports, the more a given client's input is "hidden in the crowd".
Is this a good fit for VDAFs? Shuffling is conceptually a very simple task to ask a 2-party MPC protocol to do. I could imagine slotting this into the VDAF structure like:
Possible concerns:
Anyways, thoughts appreciated! Shuffling can be a simple and effective way to achieve good privacy with very simple mechanisms, but it doesn't fit so well into VDAFs.
cc @schoppmp