DP-3T / documents

Decentralized Privacy-Preserving Proximity Tracing -- Documents
2.24k stars 180 forks source link

New design proposal - SharedEphID Design #112

Open jasisz opened 4 years ago

jasisz commented 4 years ago

I've came up with a different design, which (in my opinion) solves at least some of the privacy issues with both existing designs. It is somehow similar to the already proposed key exchanging or handshake approaches.

The main difference is in the way devices present and discover themselves. Current design relies heavily on the fact that each person has their own unique EphID in a given epoch. What if instead there was also a SharedEphID, which is shared between all people nearby, and on top of that - epoch and location bound in a non-reversible way?

This can be achieved in quite simple way: all parts of the encounter (possibly more than 2!) needs to agree on one SharedEphID. Possible to do in a couple of ways, but one of them (and simple to explain) would be - everyone still broadcast their own EphID, but picks the one which has the lowest hash if calculated from formula SHA-256(EphID + epoch time + location), where EphID is iterated over all EphID one sees, epoch time is some uniform representation of current epoch, and location is some form of simplified location data (e.g. geohash large enough). It saves the SharedEphID locally.

The very important thing is that if I see only one EphID being broadcasted in a given epoch (I mean my own) I do not calculate and store SharedEphID at all.

In case of infection what is reported and distributed is the list of SharedEphIDs saved by the infected person.

Cons:

Pros:

Any comments are welcome.

jasisz commented 4 years ago

Couple of Q&A from the other discussions I had outside of this ticket:

What is that for? Is something wrong with regular EphIDs? I believe using plain EphIDs is what makes number of attacks possible (reply attacks, spoofing, sniffing for EphIDs in the wild, infected user tracking). Let me ask other question - why individual EphIDs number should be ever used to mean an encounter? I am not saying this is the ideal solution in the form I've presented, but I try to explore some other possibilites. Isn't there also something wrong about treating an infected person in a less privacy-friendly way? Sharing their SKt or list of EphIDs (even in form of a cuckoo filter) derived from it seems to be exactly that.

Bluetooth is not reliable, you cannot just assume everyone sees the same keys I cannot assume everyone involved in an encounter has seen the EphID of infected person in the original solution either. Even if some key was not seen it does not yet mean that this was the one EphID used to calculate chosen SharedEphID. If EphIDs are changing frequently I might be able to calculate the correct SharedEphID in the next epoch.

Why this strange combination of EphID + location + epoch? This is to ensure that SharedEphID is something that could be only calculated if you have actually been there at that time. There should be no way to guess SharedEphID for someone not involved. Still - I am not saying location and epoch are exactly things that should be used! This is just a proposition in which I wanted to highlight that SharedEphID should consist of something known globally (epoch and location) + something out of thin air (sic!) known only to interested parts, EphID in my proposal.

Why this is not just location + epoch then? It wouldn't require Bluetooth and all the work Attacker would be able to calculate all the hashes one is interested in, if they wouldn't include an EphID part. It would allow to scan locations and track infected people.

Isn't a single person able to ruin it for all people involved? Only if that person is able to always advertise with EphID which would be picked as a part of SharedEphID. It does not seem to be doable on a scale, it is similar amount of work to looking for collisions. It can be also mitigated by e.g. calculating SharedEphID as hash(EphID1, EphID2, epoch, location) instead (for all possible EphID1 and EphID2 combinations) and also looking for lowest hash.

Why one SharedEphID and not multiple ones for each seen EphID? Why one and not more of them? This is also possible and would improve reliability, but we probably do care about the amount of data distributed.