TracingWithPrivacy / paper

A privacy preserving contact tracing design to battle infectious diseases
MIT License
28 stars 2 forks source link

Broadcasting SURBs #10

Open burdges opened 4 years ago

burdges commented 4 years ago

I'll send a pull request with this, but here goes a SURB based approach. It dramatically reduces infrastructure costs and avoids ever handling location information.

As I understand from @JonathanLogan we've a 16-26 byte payload from the bluetooth greeting messages. We shall encode a roughly 100 byte message with these small messages using erasure coding, perhaps a rateless one.

We must change this 16-26 byte payload fairly rapidly, but that's fine. There is a risk devices might change their MAC address whenever we change payload, which imposes more overhead upon our erasure code.

We encode a 114 byte single-use reply block (SURB) that consists of

We select the next hop and delay deterministicly using the key exchange, so they require zero overhead in the SURB. Almost all mixnet implementation make the mistake of not doing this, and some even make bigger mistakes like using protobuf, but any implementation can easily be adapted to work this way.

Actual mixnet packet headers would be routing information larger than this 114 byte SURB by at least a factor of two, but maybe much more if you want the network to serve other goals.

All clients have mail boxes that collect message for which they merely create and broadcast SURBs addressed to their own mailbox.

We worry however that advertisors or griefers could collect these SURBs to send SPAM, fake notifications, etc. We could add authentication inside the message, but then griefers still burn the SURB without benefiting themselves.

Instead, we'll make the mix packets use a three way key exchange between an authority, the mixes, and the receiver who makes the SURB.

An authority publishes a fresh public key A = a G for each epoch e. Any mix with public key M = m G publishes M’ = m X in the consensus.

In fact, they would publish (M,M’,pi) where pi = DLEQ(M/G = M’/X) is computed and verified using https://github.com/w3f/schnorrkel/blob/master/src/vrf.rs If we’ve thousands of mixes then clients could download only the M's since M’ is 32 bytes and pi is 96 bytes making (M,M’,pi) take 160 bytes. We do not need this but it’s cheap and easy and eliminates cruft.

At this point, clients build packets using the M’s, but the mix nodes only know M. If you get infected, then the authority validates your SURBs: You give the authority the elliptic curve point E for the SURBs you possess, along with the epoch when you received it, which the authority returns transformed into E’ = x E and gives you DLEQ(E’/X = E/G) batched over the full set of E. Again we can skip the DLEQ proof but it’s cheap and eliminates cruft.

Afaik, there are no risks in users just giving their SRUBs to the authority, but this gives users control over the messages. If needed, we can complicate the protocol so that users and the authority must agree upon the message, almost surely without increasing the SURB size.

We've one last problem of anonymous message retrieval. We could maybe avoid any complexity under this specific setting, but we could easily use the two layer SURB approach detailed in https://github.com/burdges/lake/blob/master/Xolotl/papers/Xolotl-3-mailboxes.tex

ghost commented 4 years ago

Am I assuming correctly you meant that for each epoch e the authority publishes x G = X?

It's not in my wheelhouse, so I would enjoy more details. The SURB with Ristretto point E is broadcasted and can be used by anybody, so the authority transforms it into a new one, how does this help?

burdges commented 4 years ago

Yes, the authority published a public key for each epoch of the mix net consensus.

We broadcast SURBs for anybody to use over bluetooth, but the authority must transform the public key in the SURB before it becomes usable. We could skip this step, but then anybody can use up the SURB, which can only be used once.

If the mix nodes recognize multiple authorities X, Y, Z then recipients can choose the authority they like, but you still only have quite a short authority list.

We'd prefer more flexibility in the signature scheme, like some a certificate chain, but this prevents us form checking the signature at each hop, which the scheme above provides.

We could check another signature at the last hop and/or user's device, which prevents the message from ever resulting in grief, but uses up the SURB.

We could make the first mix node layer demand a signature by some authority. If we do this, then only a malicious first hop mix can grief people.

I suspect either of these last two ideas suffice in practice, and the more flexible authority makes them worth doing, but I wanted to show that verifying an authority at every hop works.

ghost commented 4 years ago

For the moment I'm stuck on as to why these SURBs are different from each other, given x, E and E' seem to be the same for everybody who requests the validation.

burdges commented 4 years ago

Yes. If Alice, Bob, and Carol all spend long enough together then they all get exactly two SURBs, one from each of the other two people. If Alice uses Bob's SURB via the authority, then Carol cannot use Bob's SURB anymore, which is probably a good thing. Why should Bob be notified twice when he only need to get tested once?

It's possibile the authority would recognize Carol asking for the same SURB that Alice did. If you wish to hide that, then SURB validation could be blinded.

ghost commented 4 years ago

I see, I missed your edit that introduced the three-way key exchange. My confusion has decreased.

david415 commented 4 years ago

Your scheme doesn't actually require the use of SURBs. It's clear after reading your description that you could easily have used a temporary remote mailbox identity instead. However that having been said it's true that SURBs have an interesting one-time-use property.

Unfortunately your design essentially makes the mixnet more vulnerable to compulsion attacks. The SURB lifetime is controlled by the epoch duration and thus increasing this actually increases the window of time that an attacker can compel mix operators to decrypt any Sphinx packet they are interested in tracing.

burdges commented 4 years ago

SURBs acts like single-use mailboxes that avoid semi-centralized storage costs.

Actual mailboxes act as unique identifiers, so they must be changed frequently. If we change mailboxes every minute then you've 20k mailboxes over a two week period. We cannot check all those mailboxes daily because the mixnet lacks the throughput, so users must be contacted by their mailboxes, meaning SURBs.

At this point, your mailbox costs several megabytes of semi-centralized storage per person, so several petabytes per billion users. We can afford that and we can reduce it by using mailboxes more wisely, rotating less, etc., but SURB only solutions costs way less.

We've discussed your concerns about compulsion before: You cannot rotate fast enough to prevent compulsion attacks because power surveillance adversaries already hack nodes automatically and can streamline their compulsion machinery with agreements and treaties. Instead, you should complicate compulsion by holding the key remotely, perhaps using key splitting, add possible publicity costs to Intel with SGX, and add hardware compromise costs with custom HSMs. In this case, it's obvious rotating every 10 mins or hour will expose far more personal data than rotating every minute, so naively mailboxes cost privacy by increasing rotation time.

In that vein, @JonathanLogan explained that bluetooth messages cannot realistically rotate faster than 1 per minute, so the minutes required to send the SURB becomes problematic. I've another proposal that avoids both the long storage and transmission times.

david415 commented 4 years ago

I prefer to keep mixnet designs quite a bit more simple than you do. This comes from experience implementing them which is a great deal of work and I think if we're both being honest it's something you seem to have underestimated in the past when you tried your hand at implementation some years ago.

Certainly it is impossible to guarantee that we protect against the compulsion attack by shortening the exposure window... and certainly I agree with you that usage of HSMs are good practice when possible. However these are two orthogonal concerns. Also... I disagree that we shouldn't try to shorten the compulsion window just because we can never guarantee we prevent compulsion attacks.

All that having been said, I think you've identified the performance/scaling tradeoffs between our two designs. In particular, your observation that mailboxes could decrease privacy by increasing rotation time is a problem I don't have a solution for.

Thank you for explaining your thoughts.

burdges commented 4 years ago

I'll close this in favor of #11 since @JonathanLogan says you cannot broadcast enough data quickly enough this way.

burdges commented 4 years ago

I'll reopen this since we could reduce my 114 byte SURB estimate above. We simply drop all the MACs so the SURB becomes an elliptic curve point plus some mailbox identifier, including delivery node, plus the erasure coding overhead. There is no harm in mailbox collisions, but we should encrypt this mailbox identifier using a small block cipher, not our usual stream cipher, so that corrupted messages cannot so easily be used to reveal the mailbox identifier.

We reduce the elliptic curve point from 32 bytes to 28 bytes by using M-221 or NIST P-224, although maybe some smaller works too. We locate mailboxes with a 32 bit identifier encrypted with AEZ-tiny. We shard mailboxes storage with at least the 2+ week key rotation period notice time, so that SURBs get created knowing the shard node's key.

All told, our SURBs now require 32 bytes, so with erasure coding they should fit into a couple Bluetooth LE broadcasts. We could reduce this further with bespoke elliptic curves and more mailbox collisions.

david415 commented 4 years ago

how is this design advantageous over our current mixnet design with the nymserver and message server multi-homed on two different mixnets with differing epoch period lengths?