DP-3T / documents

Decentralized Privacy-Preserving Proximity Tracing -- Documents
2.25k stars 180 forks source link

Impractical suggestion for mitigating the main attacks #87

Open colmmacc opened 4 years ago

colmmacc commented 4 years ago

In the 3-pager there are two attacks outlined:

  • A tech-savvy adversary could reidentify identifiers of infected people ​that they have been physically close to in the past by i) actively modifying the app to record more specific identifier data ​and ii) collecting extra information about identities through additional means, such as a surveillance camera to record and identify the individuals. This would generally be illegal, would be spatially limited, and high effort.
  • A ​tech-savvy adversary deploying an antenna to eavesdrop on Bluetooth connections can learn which connections correspond to infected people, and then can estimate the percentage of infected people in a small radius of 50m.

There's also another attack: any user can replay someone else's ephID. Like if I'm evil I might replay my enemies' ephID in a hospital, around people I know to be (or soon will be) infected. Now they'll falsely get quarantined.

I have a suggestion, which is definitely impractical, for mitigating these attacks. Like I said, this suggestion is impractical, but I add it in case it may not be so dumb or in case someone else can see a way to improve it.

In the protocol outlined, the clients derive a series of ephIDs that they rotate through. What if instead, the client deterministically derives a series of Diffie-Helman parameters, and rotates through those. I've seen FAQ question 4, but I don't see why N^2 exchanges would be needed. I think it can be done in 4N. But that part's at the end.

How DH would fit in ...

The public components of each DH param could be advertised via Bluetooth. The app would then perform a DH agreement between its own advertised DH param and the DH params it observes via Bluetooth. The app would then record its set of shared DH values (what we normally call a shared secret); these values are each effectively a secret "pairing ID", known (at that time) only to the apps on the device of each user in the pair.

Nothing is authenticated here; so the DHs can all be replayed; but it won't work because the at least one "real" side of the DH won't try to compute the shared value.

Now, when an infected user is detected it is these pairing-ids that are released and searched for. This has a few benefits; 1. Replaying DHs is harmless, unlike ephIDs 1. No passive observer should be able to infer these pairing-ids. 2. Because the IDs are per-pairing, it is no longer possible for two or more people to determine that they were potentially infected by the same (other) person.

A seed could not be extracted and distributed in the usual way; though the app could use a seed to regenerate all of the DH params and perform trial key exchanges with its own DH params from the same time periods and to see if any resulting pairing-ids match, this would just re-open both of the original passive attacks. The attacker would be able to regenerate the messages that appeared on the wire ("in the air?").

To get the attack resistance, we need to deal with the raw pairing-ids for comparison. To avoid disclosing how many people a person has been in contact with, each person's contacts could be padded (by the app exporting them) with some dummy random IDs. FAQ Q1 mentions that there size limitations and reasons for using the seed, but are these really material? Let's say we pad to 5,000 contacts, that's 150k of data with 256-bit pairing IDs. That seems ok for extraction (though not distribution).

... and that's where things get impractical. Distributing 5,000 pairing-ids per infected person would be huge, and like the FAQ points out, a bloom filter or cuckoo filter isn't going to help much.

Also active attacks remain: the attacker could plant a device at a place of interest, and form their own DH pairings and record those, but the costs are higher. It does open up a security measure the app could do: it could let the user know how many other people it registers in the vicinity. If one is registered where no-one is present, that would be a suspicious indicator of an active attack. But then the nefarious authorities could just plant their device on real people too.

Of course all of this rests on how practical it may be to advertise a DH share over Bluetooth IDs. It seems (from the FAQ) like bluetooth IDs give us 11 bytes to work with, which isn't very big. We'd need to encode an 256-bit ECDH compressed public key as a series of at least 4 or so IDs (allowing some bits for the key, a small epoch identifier for rotation, ordering indicators, and/or possibly a trivial checksum). FEC would add more messages, but may be useful in a lossy environment, hard to tell without experimentation in a noisy environment.

I guess the central point here is that with a DH protocol; it should not be necessary to exchange O(N^2) messages, a full-mesh of pairings can be established in O(N) because everyone sees each other's messages ... and that can be used to break some of the attacks. Actually exchanging information between peers would take N^2 messages, though I note it would only be (4N + N^2) because 11 bytes would enough to send a symmetrically encrypted (and authenticated) identifier to each recipient; we can still do full-mesh key agreement pairings in just 4N.

Anyway, I submit this here just in case someone sees a way to use this or improve on it to more practically mitigate the attacks as they stand. I loved reading the papers and everything about the project!

colmmacc commented 4 years ago

Two comments of my own:

  1. Now that I read "design 2" in https://github.com/DP-3T/documents/blob/master/DP3T%20White%20Paper.pdf , the distribution may not be impractical. Maybe the practicality only rests on whether 4 messages is considerably more lossy than 1 and if the benefits are worth it.

  2. I'm wrong about that hospital attack; it needs to be active - a MITM would need to relay ephIDs in both directions. Still; that might be worth noting as an attack. Two evil adversaries could collude to make it seem like you were near an infected person. One in the hospital, collecting ephIDs from likely-to-be-infected persons. One near you. Relaying your ephID to the hospital, and theirs back to you. My DH outline doesn't help with this attack either.

a8x9 commented 4 years ago

I've proposed something very similar in #66.

I don't know where the 11 bytes from the FAQ come from, as far as I know iBeacon contains a 128 bits UUID. The 4 bytes of the minor and major fields could maybe also be used, but in my proposal I've decided to not touch them so they could be used to differentiate this app beacons from all other unrelated beacons you might encounter.

In my proposal, I've limited the key size to 128 bits instead of splitting a 256 bits key in multiple beacons. The reason is that on iOS you cannot access a received beacon MAC address, therefore if you are in contact with 2 persons and receive 4 beacons, you cannot determine which ones belong together.

The 256 bits could maybe be split in 3 beacons (11 bytes of key per beacon) and use 5 bytes in each beacon to tie them together and also give info about their order, e.g., select random 5 bytes R and include R, R+1 and R+2 as the first 5 bytes of the 3 beacons UUID. This would however add some complexity to the design.

jorants commented 4 years ago

See also https://github.com/DP-3T/documents/issues/74 , you could set up a small network of people that can effectively DOS the system

pdehaye commented 4 years ago

Attack 2 above is very much worth nothing in the context of geopolitical disinformation (see #45 )

kennethchiu commented 4 years ago

Would it be possible to also mitigate by verifying, in some way, that there was bi-directional proximity? A determined attacker could still do a bi-directional relay, but it may be easier to mitigate. For example, setup receivers outside of testing centers to monitor for relayed broadcasts that are out-of-line with expectations.