DP-3T / documents

Decentralized Privacy-Preserving Proximity Tracing -- Documents
2.24k stars 180 forks source link

Proposal for a very low communication overhead and improved privacy: PSI protocol #236

Open gianlucag opened 4 years ago

gianlucag commented 4 years ago

It just occurred to me (like probably to many others) that the problem of privately determining if at least one of the EphIDs held by a user device is present on another list of similar EphIDs (held by the server) is a very well known problem in computer science literature and it goes under the name of PSI (Private Set Intersection) problem. You can search it online, there is vast literature for it.

What's Private Set Intersection (PSI)? Private Set Intersection (PSI) is a cryptographic technique that allows two parties to compute the intersection of their sets without revealing anything except the intersection. The "revealed" intersection can be limited to a single party, not both (in our case, the user device).

There are many protocols out there, a few use Bloom and Cuckoo filters like the recent proposal (design 2). Unfortunately all of them suffer from a very high communication complexity, which scales linearly with the size of the larger set. To the best of my knowledge, the proposed design 2 which uses Cuckoo filter requires over 140MB of daily data to be downloaded

Current state of the art I just want to point out that there is a paper (late 2017) describing an interesting technique which make use of fully homomorphic encryption to construct a fast PSI protocol with very small communication overhead

This protocol allows two parties to exchange cryptographic data in such a way that the intersection of their sets is computed. Nothing is revealed except for the intersection itself. The knowledge of the intersection can be confined to just one of the two parties (e.g. the client device).

As a concrete example, the method achieves a remarkable result of about 12.5MB of data transfer to intersect a 5000 items (on our domain it could be the user device) with a set of 16 million items (on our domain it could be the server). The actual numbers for different scenarios can be found on the paper itself.

s-chtl commented 4 years ago

Hi @gianlucag, thanks for your input. Indeed, PSI could be a solution but as explained in the FAQhttps://github.com/DP-3T/documents/blob/master/FAQ.md in topic P3, such solutions might not scale well in the time frame we have. I'll flag the thread as a suggested extension that could be great to have in the future if that is ok with you.

gianlucag commented 4 years ago

@s-chtl No prob, let me know. It's a completely different approach that might be unsuitable at this stage: there is indeed a significant investment of time and engineering effort on there.