TCNCoalition / TCN

Specification and reference implementation of the TCN Protocol for decentralized, privacy-preserving contact tracing.
MIT License
263 stars 33 forks source link

Tree-like hash ratchet #40

Open burdges opened 4 years ago

burdges commented 4 years ago

In the current design, if users can create a time barrier that permits them to disclose CENs before a specified point in time, but only if they start a new report authorization key. There are two problems with this approach: Users suck at thinking about privacy in advance. Users might start too many report authorization keys.

Instead, you could assume some fixed rotation schedule for report authorization keys consisting of 2^d broadcasts for some fixed d, but produce CENs using a tree-like hashing structure

    cek_0 = rak
    (cek_{2i}, cek_{2i+1} ← H_cek(cek_i).
    cen_i = cek_{2^d+i}

At any time, the application only requires O(d) storage, which appears negligible. Now users reveal whatever cek_j they like along with the depth log_2 j, but not j itself. Reveals need not cover the cen_i.

Initial versions could simply use this hashing strategy, while leaving the reveal control interface to future work. In particular charging logs or GPS logs could exclude time spent at home automatically. There is a small privacy risk in revealing the depth log_2 j, but the system could randomly do slightly deeper reveals as cover.

hdevalence commented 4 years ago

Hey Jeff,

Thanks for the suggestion. Kenny Paterson mentioned something similar in the context of DP-3T in a call yesterday -- do you know whether there any detailed writeups floating around?

Tree hashing sounds like a nice idea, but I worry that once all the details are worked out, it will pose significant additional complexity that doesn't outweigh the benefits.

First, I think in your description, iterating through the CENs in index order iterates through the tree in depth order, so it doesn't actually let you select time intervals by revealing a root. I think to do that, you would need to generate CENs from the leaves of the tree. (Maybe I am misreading your notation, so in what follows I'm going to assume that CENs are generated from the leaves).

I don't think it's sufficient to reveal just a tree root (or rather, that having that capability alone is not sufficient to justify the change), because this only allows you to select a single 2^k-sized, 2^k-aligned subtree. But this probably doesn't actually align with the time interval that a user actually wants to disclose. This can be patched up by specifying multiple tree roots, so that a user can specify a particular, custom range (here I'm using H to denote hidden, R to denote revealed, r to denote revealed by a higher-level root):

         H               (root CEK)
        / \       
       /   \      
      /     \     
     /       \    
     H       H       
    / \     / \   
   /   \   /   \  
   H   R   H   H   
  / \ / \ / \ / \ 
  H R r r R H H H (leaf CEKs)
  | | | | | | | |
  v v v v v v v v
  0 1 2 3 4 5 6 7  (CEN indexes)

This means that reporting clients now have to do relatively more complex tree traversal logic to select the reporting interval, and the report format has to include multiple tree roots, and presumably some kind of limit on how many roots can be included. However, it does mean that reporters can select an arbitrary subset of the CENs they report, which is nice.

In contrast, in the current protocol, I'm not sure that the problems that you mentioned will actually be significant in practice. I don't think users are required to think about privacy in advance. Their user-agent can automatically start a new report at some regular interval (say 1-4 per day), then submit multiple reports when they wish to disclose. I think that the common case would then be the one where j_1 = 0, in which case having more fine-grained reporting tools doesn't help much.

burdges commented 4 years ago

Yes, I wrote it recursively using "heap addressing" so 2^d+i with i>0 runs over the leaves. You'd implement the iteration over CENs with a right-leaning copath Vec<Option<[u8; 16]>> from the root to your current leaf, not heap addressing.

You'd regenerate any desired roots when revealing because you only hash twice as many times as currently, and bluetooth prevents consuming CENs faster than one per minute anyways.

Yes, anytime you reveal a 2^k size subtree then you'd reveal subtrees of many smaller sizes too, exactly like your picture.

I agree people cannot really remember exactly when they visited some embarrassing place while coughing up their lungs. ;) I'd mostly envision the resolution being useful for automated tools that separate or omit time spent in different places. I've no idea if anyone would add should tools but this opens the door.

hdevalence commented 4 years ago

Thanks for the clarification!

I guess the big-picture question is whether a tree-like hash ratchet is useful at all, if the lifetime of each report is dialed down to try to prevent tracking of reporters. If each report only covers, say, 4 hours, is it useful to be able to exclude 10-minute intervals?

burdges commented 4 years ago

In practice, I suspect one could report every 1 minute period separately because any contact tracing fails anyways beyond relative few people. shrug