ROBERT-proximity-tracing / documents

Protocol specification, white paper, high level documents, etc.

"authority... will need to deploy sniffing devices" #6

Open pdehaye opened 4 years ago

pdehaye commented 4 years ago

In https://github.com/ROBERT-proximity-tracing/documents/blob/master/ROBERT-summary-EN.pdf it says:

"If the authority wants to do physical tracking, it will need to deploy sniffing devices"

This is false. Indeed, many such sniffing devices are already deployed (for adtech among other things), can be repurposed, or even remotely reprogrammed.

To be fair to ROBERT, this is not a fault of the protocol itself, and the same flaw exists in DP-3T. (I point out in more detail in their issue tracker what kinds of adversaries exist, which could turn out to be useful allies to snooping authorities.)

Instead I would say both ROBERT and DP-3T operate in a tough landscape of surveillance, that has been allowed to persist due primarily to the centralization of data protection enforcement into the hands of a few incompetent or unwilling actors over several decades.

bortzmeyer commented 4 years ago

But is there even a need to "deploy sniffing devices"? The authority server will have at least the IP address of the user's device and maybe more information from metadata. No sniffing device is necessary when you can simply read the log file of the server. (The specification apparently does not say which protocol is used for the communication, only that it uses TLS. If it is HTTP, there is a lot of metadata.)

ReichertL commented 4 years ago

What if they use Tor for all the communication? Then the IPs are not meaningful. However, this would put a lot of stress on the Tor network.

bortzmeyer commented 4 years ago

What if they use Tor for all the communication?

There is a reference to a mixnet in the paper (apparently only to upload the proximity list?) but it is immediately swept away by an erroneous reference to NAT.

dg1sek commented 4 years ago

If they used Tor, then I think it would probably make sense to have a dedicated Tor network only for this application. Otherwise you risk huge traffic overload on the mobile devices, which would drain batteries and make the whole app unacceptable. I don't think Tor is the best approach here. I feel that it would make more sense to decide to have a certain level of trust in your mobile operator (you do require that trust anyhow... the mobile operator knows your location regardless of what you do, via cell ID and triangulation).

LilithWittmann commented 4 years ago

I feel that it would make more sense to decide to have a certain level of trust in your mobile operator (you do require that trust anyhow... the mobile operator knows your location regardless of what you do, via cell ID and triangulation).

But how should the network providers protect us from exposing IP addresses to authorities? Should they set up a Tor-style network? Wouldn't that mean that this network would be owned by 3 operators? As far as I understand the onion protocol, if a particular institution controls a certain percentage of nodes, it is able to deanonymize the traffic?

pdehaye commented 4 years ago

I agree with all here that IP addresses are also a (mitigable) source of surveillance, but want to hone in on the abilities and relevance of commercial third parties.

The reason is that the risk is much more diffuse, it makes it a system with many more (repurposable) attack surfaces, and thereby changes the legal assessment obligations, with much stronger democratic oversight.

superboum commented 4 years ago

I am working on anonymity systems for my PhD thesis and I would like to emphasize that this issue is very critical considering the context.

How easily can my identity be inferred if I contact a centralized server?

In France, Internet Service Providers and Service Providers are required by law to collect "connection data" ("données de connexion") (some context in French). Some examples of data that can be collected by both parties:

Now, if we consider that the government, for its needs, has privileged access to ISP logs AND, at the same time, hosts the application, then it can simply cross-reference the data and de-anonymize who accessed its service.

What can be inferred from the fact that my phone contacted the centralized server?

Our service supports two requests:

From the first request, we can learn the infected person's identity, let's say Alice, and the identifiers of the persons that met Alice. Now, when Alice's relatives query the service with their identifiers, the service will be able to learn the identities of Alice's relatives. To put it in a nutshell, we infer Alice's social graph, including people's identities.

Anonymity is hard, some other considerations

Even if we don't cross the data, many other attacks are still possible:

Be precise on your attack model

If we trust ISPs, big companies and government, the proposal may be sensible. Otherwise, I think the protocol should be deeply reviewed. Finally, absolute care must be used when communicating, and the term anonymity must be avoided when so many actors are able to learn such critical data.

Some ideas to build a better protocol

The protocol must enforce that the application's behaviour in terms of network traffic and workload is indistinguishable between user states (sick, in_contact, healthy) to limit consequences. For the networking part, the literature on mixnets and onion routing should be considered.
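One part of that indistinguishability requirement, request size, can be sketched as follows. This is only a minimal illustration under assumptions that are not in the ROBERT documents: the fixed size `REQUEST_SIZE`, the `op` field and the function names are hypothetical, and a real design would also have to equalize timing and send everything over an encrypted channel.

```python
import json

REQUEST_SIZE = 512  # hypothetical fixed request size in bytes


def build_request(state, ids=()):
    """Build a request whose length is the same for every user state."""
    if state == "sick":
        body = {"op": "inform", "ids": list(ids)}
    else:
        # healthy and in_contact users send a status query of the same size
        body = {"op": "status", "ids": []}
    raw = json.dumps(body).encode()
    if len(raw) > REQUEST_SIZE - 2:
        raise ValueError("payload too large for the fixed request size")
    # 2-byte length prefix, then the payload, then zero padding
    padding = b"\x00" * (REQUEST_SIZE - 2 - len(raw))
    return len(raw).to_bytes(2, "big") + raw + padding


def parse_request(blob):
    """Strip the padding and decode the payload on the server side."""
    size = int.from_bytes(blob[:2], "big")
    return json.loads(blob[2:2 + size].decode())
```

With this padding, an observer who only sees encrypted packet lengths cannot tell an "inform" request from a "status" request by size alone. Timing and request frequency would still need to be made uniform.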

A less appealing alternative

Intel SGX and other enclave technologies are trending in the research world. Remote attestation and cover traffic could prevent state and ISP monitoring with few modifications to the current protocol (but would require putting some trust in Intel, which I prefer in the end).

ftaiani commented 4 years ago

Regarding the learning of a user's social graph, and the exposure of a patient's sick status:

Access to and use of the above data is however normally heavily regulated, and in some cases the data is stored and managed in independent and disjoint information systems. This fragmentation and these regulatory safeguards mitigate (but do not eliminate) privacy risks. My overall point is that a robust solution to detecting potential contamination cases using cell phones might need more than just technical mechanisms, and needs to encompass legal and organizational aspects.

ramsestom commented 4 years ago

Now, if we consider that the government, for its needs has a privileged access to ISPs logs AND, at the same time, host the application, then it can simply cross the data and de-anonymize who accessed its service.

Correct me if I am wrong, but in most European countries, like France, the government does not have direct access to the IP->identity mapping owned by the ISPs. Access to this kind of information requires the consent of a judge, and it is an authorization which is generally given only for one or a few IP addresses at a time. So no, the government won't be able to easily cross-reference the data (and in European democratic countries, I have enough trust in governments not to break the GDPR rules and use the IP address or metadata from the logs if they say they won't anyway).

Our service supports two requests:

  • inform_infected(id_seen: array[id])
  • infected?(my_id: id)

From the first request, we can learn the infected person's identity, let's say Alice, and the identifiers of persons that met Alice.

No, the server doesn't know the infected person's identity (Alice) at all. Only the contacts are sent to the server, and in a mixed way (contacts from infected user 1 would be mixed with contacts from infected user 2, with contacts from infected user 3, and so on, before being sent to the server).

Now, when Alice's relatives' will query the service with their identifier, the service will be able to learn Alice's relatives identity.

No, it can't.

superboum commented 4 years ago

some error here the customer_id is known by the Service, not the ISP

No, I mean the internal customer_id used by the ISP, if you prefer, I can expand the tuple like that:

Access to this kind of information requires the consent of a judge and it is an authorization which is generally given only for one or a few IP addresses at a time.

Not necessarily, according to DÉCRET n°2014-1576 du 24 décembre 2014. Or just think of Hadopi, which does not require any judge to send physical mail to people simply based on an IP address list sent by a private company.

No, the server doesn't know the infected person's identity (Alice) at all.

As long as the server learns Alice's IP address + access time AND the server knows that the request was made to inform of an infection AND the server owner has a way to get information about an (ip_address, time) tuple (like administrative requisitions), it is enough to deanonymize a user. We don't care about the content of the message, as long as we know that it is an "inform" message.

And, to be fair, we should agree on this point, indeed you mention this case in your long document but totally discard it from the summary :

The (HELLO, Time) pairs of the LocalProximityList are sent to the server one by one using a Mixnet

In a footnote, you qualify the need for a mixnet by saying that NAT is enough, probably thinking of carrier-grade NAT on mobile phone networks. First, phones do not necessarily access the Internet via mobile networks but also via Wi-Fi, where an IPv4 address is very identifying (not to mention IPv6 addresses!). Second, even with a global carrier-grade NAT, the "connection data" tuple simply becomes (customer_ip, customer_src_port, time) and identification is again... perfect!
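To make the cross-referencing concrete, here is a minimal sketch of how a service log could be joined with ISP connection data on the (customer_ip, customer_src_port, time) tuple. All log entries, field names and the `deanonymize` function are hypothetical and only illustrate the argument; they do not describe any real log format.

```python
from datetime import datetime

# Hypothetical service-side log: two users behind the same carrier-grade NAT address
service_log = [
    {"ip": "203.0.113.7", "src_port": 40112,
     "time": datetime(2020, 4, 20, 9, 15, 2), "request": "inform"},
    {"ip": "203.0.113.7", "src_port": 51877,
     "time": datetime(2020, 4, 20, 9, 15, 3), "request": "status"},
]

# Hypothetical ISP-side connection data: which customer held which NAT port, and when
isp_log = [
    {"customer": "Alice", "ip": "203.0.113.7", "src_port": 40112,
     "start": datetime(2020, 4, 20, 9, 15, 0), "end": datetime(2020, 4, 20, 9, 16, 0)},
    {"customer": "Bob", "ip": "203.0.113.7", "src_port": 51877,
     "start": datetime(2020, 4, 20, 9, 14, 0), "end": datetime(2020, 4, 20, 9, 17, 0)},
]


def deanonymize(service_entries, isp_entries):
    """Join both logs on (ip, src_port) with the request time inside the ISP session."""
    matches = []
    for req in service_entries:
        for sess in isp_entries:
            same_flow = (req["ip"], req["src_port"]) == (sess["ip"], sess["src_port"])
            if same_flow and sess["start"] <= req["time"] <= sess["end"]:
                matches.append((sess["customer"], req["request"]))
    return matches


identified = deanonymize(service_log, isp_log)
```

Even though both users share one public IP address, the source port and timestamp are enough to tell them apart once the two logs are combined.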

Now, let's consider that a Mixnet is used. If it is done right, it could work and protect anonymity. But we need to describe how it will work, analyze how many parties we need to trust, and so on! The Mixnet is mentioned so briefly in the whitepaper, and totally discarded from the summary, that it leads me to believe that in the end it is not seriously considered. So, for now I consider that the solution will not integrate any Mixnet and that it is still possible for ISPs and the Government to de-anonymize users by simply collecting their IP address/traffic.

To conclude

I have enough trust in governments not to break the GDPR rules and use the IP address or metadata from the logs if they say they won't anyway

It should be clearly advertised that, before using such an application, a blind trust in the government is required, which is not my case. 1 2.

Before being accused of ideology, I would say that proposing a centralized solution first could also be a bias. Let's take Tor as an example. While on battery, Tor would not run, to save battery life. Phones would only periodically generate identifiers that are public/private key pairs and exchange the public keys. Note that registering an Onion Service (previously named Hidden Service) over Tor is similar to registering a public key in a DHT. Once charging, a user's phone would try to access the Onion Services linked to the public keys it has gathered during the day. If I am sick, I simply ask my application to register an Onion Service for every key in the considered time interval....

ramsestom commented 4 years ago

some error here the customer_id is known by the Service, not the ISP

No, I mean the internal customer_id used by the ISP, if you prefer, I can expand the tuple like that:

  • ISP: (firstname,lastname,birthdate,customer_ip,timestamp)

Yes, I understood it a few minutes after posting my comment and removed this point, but it looks like you were already writing your answer by that time ;)

Access to this kind of information requires the consent of a judge and it is an authorization which is generally given only for one or a few IP addresses at a time.

Not necessarily, according to DÉCRET n°2014-1576 du 24 décembre 2014. Or just think of Hadopi, which does not require any judge to send physical mail to people simply based on an IP address list sent by a private company.

OK. I just took a quick look at this décret and indeed a judge is not needed anymore. Any request for information related to an IP address needs to be justified, though, and is archived, so if the French government were to make such a request for the IP addresses logged on the StopCovid server when it has publicly committed not to, it could be sued. And we are talking about millions of IP addresses, so the ISPs would probably alert public opinion if they were to receive such a request (because DÉCRET n°2014-1576 still sets some rules: you can't ask for information on just any IP address; the IP address still has to be suspected of illegal activity to be covered by a request).

No, the server doesn't know the infected person's identity (Alice) at all.

As long as the server learns Alice's IP address + access time AND the server knows that the request was made to inform of an infection AND the server owner has a way to get information about an (ip_address, time) tuple (like administrative requisitions), it is enough to deanonymize a user. We don't care about the content of the message, as long as we know that it is an "inform" message.

As I understand the ROBERT protocol, it isn't Alice who directly informs of an infection. It is the so-called "trusted server" (the one of the hospital or the medical office where Alice tested positive) that will do it and send her (mixed) list of contacts to the general back-end server. So the server can't link Alice's IP with a request to inform of an infection (it would only have the IP of the trusted server).

And, to be fair, we should agree on this point, indeed you mention this case in your long document but totally discard it from the summary :

I am not an author of the ROBERT protocol or this document at all ;) . I am just a concerned citizen sharing my thoughts on this protocol, hoping it will be as secure and respectful of our privacy as possible, like you.

Before being accused of ideology, I would say that proposing a centralized solution first could also be a bias.

Unfortunately, a centralized solution is the only way to protect against users knowing the infection status of their contacts. In a decentralized solution, it is necessarily the infected user who alerts his contacts of his status. So it is quite easy for them to cross-reference this information with what they have (the date and time when they crossed paths with him) to get a pretty clear idea of his real identity if they know him. To me this is a much bigger problem than the potential for the server owner to get my real identity (and even with that information, he can't do much with it, because it would be quite hard for him to infer my social network or my infection status from the information he has). So assuming that a minimum of technical and legal guarantees are given to me that my privacy will be respected, I am personally much more inclined to use a centralized solution than a decentralized one.

superboum commented 4 years ago

Unfortunately, a centralized solution is the only way to protect against users knowing the infection status of their contacts. In a decentralized solution, it is necessarily the infected user who alerts his contacts of his status.

It is definitely possible to build decentralized solutions that do not reveal your identity to people that you may have infected. What you want is sender (and potentially receiver) anonymity. Maybe this introduction could be a good starting point.

But just to share the intuition: Alice sends her message to a first server, which relays it to a second one, which finally delivers it to Bob. By doing this, Bob does not know who sent the message. The server that delivered the message to Bob does not know who sent it. The first server, which Alice contacted, does not know to whom the message is sent. So no one knows that Alice and Bob communicated. This simple example can be modified to provide even more security properties while never requiring a trusted third party.
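The two-relay intuition can be sketched with layered wrapping. This is a toy model under loud assumptions: the XOR keystream below is NOT secure encryption, the keys `K1` and `K2` are assumed to be already shared between Alice and each relay, and every name is hypothetical. It only shows which hop learns which piece of information.

```python
import hashlib
import json


def keystream(key, n):
    """Toy keystream built from SHA-256 in counter mode. Illustration only, not secure."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]


def xor_crypt(key, data):
    """XOR data with the keystream; the same call encrypts and decrypts."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


def wrap(key, next_hop, payload):
    """Seal a layer that tells one relay where to forward the opaque payload."""
    layer = json.dumps({"next": next_hop, "payload": payload.hex()}).encode()
    return xor_crypt(key, layer)


def unwrap(key, blob):
    """Open one layer: returns the next hop and the still-wrapped payload."""
    layer = json.loads(xor_crypt(key, blob).decode())
    return layer["next"], bytes.fromhex(layer["payload"])


K1 = b"key shared by alice and relay1"  # hypothetical pre-shared keys
K2 = b"key shared by alice and relay2"
message = b"you were in contact with an infected person"

# Alice wraps the message for relay2 first, then wraps that for relay1
inner = wrap(K2, "bob", message)
outer = wrap(K1, "relay2", inner)

# relay1 knows the packet came from Alice but only learns "forward to relay2"
hop1_next, hop1_payload = unwrap(K1, outer)
# relay2 knows the packet came from relay1 and learns "deliver to bob"
hop2_next, hop2_payload = unwrap(K2, hop1_payload)
```

In this toy version the second relay can read the message itself; a real design would add an end-to-end layer that only Bob can open.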

ramsestom commented 4 years ago

Unfortunately, a centralized solution is the only way to protect against users knowing the infection status of their contacts. In a decentralized solution, it is necessarily the infected user who alerts his contacts of his status.

It is definitely possible to build decentralized solutions that do not reveal your identity to people that you may have infected. What you want is sender (and potentially receiver) anonymity. Maybe this introduction could be a good starting point.

But just to share the intuition: Alice sends her message to a first server, which relays it to a second one, which finally delivers it to Bob. By doing this, Bob does not know who sent the message. The server that delivered the message to Bob does not know who sent it. The first server, which Alice contacted, does not know to whom the message is sent. So no one knows that Alice and Bob communicated. This simple example can be modified to provide even more security properties while never requiring a trusted third party.

The problem is not who sent the message but the kind of messages that are exchanged between users. In a decentralized solution, you need the infected users to send all other users the information that they are infected (whether these messages are delivered directly to the other users or pass through some onion routes is not the issue here) so that other users can compare the "id tags" (EBIDs) of these infected users to their contact list to know if they are at risk. You could also have infected users send all other users the "id tags" of their contact list, but it would be far more inefficient and wouldn't prevent other users from easily identifying infected people (it would probably be even easier). Actually, as soon as you directly receive some "id tags" associated with an infected user (be it the "id tags" of this infected user or those of their contacts), all you have to do is cross-reference them with your local database of contact "id tags" to know on exactly what date(s) you met this infected person (and get a pretty clear idea of who they are, especially if you met them in a sparsely populated environment or more than once).

The centralized solution means users never have to contact each other, so their identity can be fully preserved (a user never asks for information about another user, such as their status or contact list; all they can do is request their own status). And once again, even if the centralized server were able to discover the real identity behind every ID (by associating the real identity behind the IP used during the registration of each user), it wouldn't really be able to do anything detrimental to these users' privacy since, given how the ROBERT protocol works, it won't have enough information to infer who is infected or not and who their contacts were.

Adrien-Luxey commented 4 years ago

Hi, I'm a PhD also working on privacy protocols.

The problem is not who sent the message but the kind of messages that are exchanged between users. In a decentralized solution, you need the infected users to send all other users the information that they are infected (whether these messages are delivered directly to the other users or pass through some onion routes is not the issue here) so that other users can compare the "id tags" (EBIDs) of these infected users to their contact list to know if they are at risk. You could also have infected users send all other users the "id tags" of their contact list, but it would be far more inefficient and wouldn't prevent other users from easily identifying infected people (it would probably be even easier). Actually, as soon as you directly receive some "id tags" associated with an infected user (be it the "id tags" of this infected user or those of their contacts), all you have to do is cross-reference them with your local database of contact "id tags" to know on exactly what date(s) you met this infected person (and get a pretty clear idea of who they are, especially if you met them in a sparsely populated environment or more than once).

You make wrong assumptions here. Two users never have to share their contact lists nor infection status in a decentralised environment, only their personal ID when they meet. Each user would store their contact history. The message Alice (newly infected) would send to Bob (with whom she had contact earlier) would only read "you were in contact with an infected person", and it does not have to be sent right after the contact. Bob does not need to know Alice's ID nor a precise timestamp. Using a sender-anonymous route (e.g. onion route), Bob can't match the message to Alice's network address either.

I guess you can still use a centralized authority (state-owned servers) to make the ID-to-netaddr mapping, and to prevent misbehaving users from forging messages and running havoc on the population. This role would still need to be cautiously specified, as it could very well end up the same way: the state learning Alice's social graph as she informs her contacts of her infection. But in this setup, privacy is at least possible.

Adrien-Luxey commented 4 years ago

And once again, even if the centralized server were able to discover the real identity behind every ID (by associating the real identity behind the IP used during the registration of each user), it wouldn't really be able to do anything detrimental to these users' privacy since, given how the ROBERT protocol works, [the centralized server] won't have enough information to infer who is infected or not and who their contacts were.

I read the English summary, which makes me disagree. Alice, being infected, is the only one sending her contact list to the server, so, from the server's perspective, it is trivial to learn that it is indeed Alice who is infected -- under the assumption that the server can map pseudonyms to IPs (which is guaranteed) to identity (which depends on your trust of the State).

ReichertL commented 4 years ago

Before being accused of ideology, I would say that proposing a centralized solution first could also be a bias. Let's take Tor as an example. While on battery, Tor would not run, to save battery life. Phones would only periodically generate identifiers that are public/private key pairs and exchange the public keys. Note that registering an Onion Service (previously named Hidden Service) over Tor is similar to registering a public key in a DHT. Once charging, a user's phone would try to access the Onion Services linked to the public keys it has gathered during the day. If I am sick, I simply ask my application to register an Onion Service for every key in the considered time interval....

If you now additionally used blind signatures for verifying infections without leaking unnecessary information to the health authorities, you get a pretty secure system. Our paper uses blind signatures and a DHT for contact tracing.
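For readers unfamiliar with blind signatures, here is a minimal sketch of the textbook RSA variant. It is not taken from the paper mentioned above, and the key is a toy one (61 × 53) that is far too small for any real use; the function names and the sample message are hypothetical. The point is only that the signer (for example a health authority) signs a value without seeing it, and the unblinded signature still verifies.

```python
import hashlib
from math import gcd

# Toy RSA key, far too small for real use (assumption for illustration only)
P, Q = 61, 53
N = P * Q                          # public modulus
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)


def digest(message):
    """Hash the message and reduce it into the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def blind(m, r):
    """User side: hide m with a random factor r that is coprime with N."""
    if gcd(r, N) != 1:
        raise ValueError("r must be coprime with N")
    return (m * pow(r, E, N)) % N


def sign_blinded(blinded):
    """Signer side: sign the blinded value without learning m."""
    return pow(blinded, D, N)


def unblind(blinded_signature, r):
    """User side: remove the blinding factor to get a signature on m."""
    return (blinded_signature * pow(r, -1, N)) % N


def verify(m, signature):
    """Anyone: check the signature with the public key."""
    return pow(signature, E, N) == m


m = digest(b"hypothetical infection report")
r = 7  # fixed here for reproducibility; it should be random in practice
signature = unblind(sign_blinded(blind(m, r)), r)
```

The signer only ever sees `blind(m, r)`, so it cannot later link the signature it issued to the report that is eventually published.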

ramsestom commented 4 years ago

Hi, I'm a PhD also working on privacy protocols.

You make wrong assumptions here. Two users never have to share their contact lists nor infection status in a decentralised environment, only their personal ID when they meet. Each user would store their contact history. The message Alice (newly infected) would send to Bob (with whom she had contact earlier) would only read "you were in contact with an infected person", and it does not have to be sent right after the contact. Bob does not need to know Alice's ID nor a precise timestamp. Using a sender-anonymous route (e.g. onion route), Bob can't match the message to Alice's network address either.

And how would Alice know how to contact Bob? In other words, how would the "you were in contact with an infected person" message from Alice be routed only to her contacts (and not to all the other users of the app)? So yes, Alice has to know the identity of her contacts somehow in a decentralized approach, which means a user has an easy way to trace back who among his contacts was infected.

I read the English summary, which makes me disagree. Alice, being infected, is the only one sending her contact list to the server, so, from the server's perspective, it is trivial to learn that it is indeed Alice who is infected -- under the assumption that the server can map pseudonyms to IPs (which is guaranteed) to identity (which depends on your trust of the State).

Among the different solutions envisioned for the upload mechanism described in the paper, one is that the proximityList is uploaded to a trusted server (hospital or health organization) that would thereafter mix this list with others and send it to the back-end server. So no, you can't say that Alice is the only one sending her contact list to the server. That would depend on the solution finally chosen by the authors of the ROBERT protocol for the upload mechanism.
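The trusted-server option can be sketched as a simple batch mix. This is only an illustration under assumptions: the function name, the example entries and the choice of a single shuffle are hypothetical and not taken from the ROBERT specification.

```python
import random


def mix_and_forward(proximity_lists, rng=None):
    """Merge the proximity lists of several infected users and shuffle them,
    so the back-end server cannot tell which entries came from the same user."""
    rng = rng or random.SystemRandom()
    batch = [entry for plist in proximity_lists for entry in plist]
    rng.shuffle(batch)
    return batch


# Hypothetical (HELLO, Time) pairs uploaded by three infected users
alice_list = [("hello_a1", 1000), ("hello_a2", 1015)]
carol_list = [("hello_c1", 1003)]
dave_list = [("hello_d1", 990), ("hello_d2", 1020), ("hello_d3", 1021)]

batch = mix_and_forward([alice_list, carol_list, dave_list])
```

The protection depends on the batch size: if only one infected user uploads in a given period, shuffling hides nothing, and the back-end server still sees that all entries arrived together.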

Adrien-Luxey commented 4 years ago

And how would Alice know how to contact Bob? In other words, how would the "you were in contact with an infected person" message from Alice be routed only to her contacts (and not to all the other users of the app)? So yes, Alice has to know the identity of her contacts somehow in a decentralized approach, which means a user has an easy way to trace back who among his contacts was infected.

I answered this question in the same comment : "I guess you can still use a centralized authority (state-owned servers) to make the ID-to-netaddr mapping", but I very much understand why this is not satisfactory. My solution still publishes the mapping.

If you want something that is sender-receiver anonymous (meaning that, when Alice contacts Bob, both Alice and Bob's addresses remain hidden), do use Tor's onion services as suggested by @superboum here. Let's describe something that would work:

When Bob registers to the service, he creates an onion service. Briefly: he creates an onion route from a random place in the Tor network (identified by a randomstring.onion) to him, that people can use to reach him without disclosing his IP. Whenever he contacts someone (with bluetooth), he provides his onion address along with his pseudonym. If Alice met Bob and is infected, she creates an onion route (anonymous) to randomstring.onion, which in turn sends the memo to Bob. In the process, nobody learnt Alice nor Bob's address, and randomstring.onion is only known to the people that physically met Bob. Tada! Sender-receiver anonymity :)

Also, with this approach, we can't "create our own tor-style network", we have to use the live one. Tor anonymizes connections because the network is not owned by only one entity. Tor is meant to be used by the most people, you wouldn't be a nuisance to their network (besides, we are talking about episodic traffic).

ramsestom commented 4 years ago

And how would Alice know how to contact Bob? In other words, how would the "you were in contact with an infected person" message from Alice be routed only to her contacts (and not to all the other users of the app)? So yes, Alice has to know the identity of her contacts somehow in a decentralized approach, which means a user has an easy way to trace back who among his contacts was infected.

I answered this question in the same comment : "I guess you can still use a centralized authority (state-owned servers) to make the ID-to-netaddr mapping", but I very much understand why this is not satisfactory. My solution still publishes the mapping.

Your "solution" is centralized. It won't solve the problem of the server being able to link an ID to an IP, so I don't see any benefit in it (other than complicating the system).

When Bob registers to the service, he creates an onion service. Briefly: he creates an onion route from a random place in the Tor network (identified by a randomstring.onion) to him, that people can use to reach him without disclosing his IP. Whenever he contacts someone (with bluetooth), he provides his onion address along with his pseudonym. If Alice met Bob and is infected, she creates an onion route (anonymous) to randomstring.onion, which in turn sends the memo to Bob. In the process, nobody learnt Alice nor Bob's address, and randomstring.onion is only known to the people that physically met Bob. Tada! Sender-receiver anonymity :)

It may work if users were servers with a fixed IP address or domain name. But on mobile devices your IP address is frequently changed by the ISP (or because you switched from LTE to a Wi-Fi connection). So when Alice wants to reach Bob to alert him that she is infected, he may never receive the message because his IP has changed (or a node is no longer available in his onion route), or someone else could receive it instead (if the IP were reassigned to another user), and Tada! your tracing app becomes useless. So Bob needs to give Alice a fixed identifier that allows her to reliably reach him (and potentially identify him at the same time).

Adrien-Luxey commented 4 years ago

Your "solution" is centralized.

The problem is not about centralization, it's about the central server being able to de-anonymize the users. You are right on the second part though: "May work if users were servers with a fixed IP address or domain name". To get the message to Bob regardless of connection changes and unavailability, a central server would certainly help.

Anyway, before we make further proposals, we need to know whether they have a chance of being taken into account, see #40.

PRIVATICS-Inria commented 4 years ago

There are many comments in this thread. Let’s talk about some of them.

1 About sniffing devices: just to clarify, the sniffing devices we are referring to are Bluetooth sniffing devices, and not devices capturing Internet traffic.

2 Regarding governmental surveillance, wide scale physical tracking, re-identification through IP addresses, etc. Yes, we agree there are risks. But as @ftaiani explains:

“[...] a robust solution to detecting potential contamination cases using cell phones might need more than just technical mechanisms, and need to encompass legal and organizational aspects.”

We fully agree. And as said by @ramsestom later on:

“So assuming that a minimum of technical and legal guarantees are given to me that my privacy will be respected, I am personally much more bound to use a centralized solution than a decentralized one.”

This is also our main motivation in proposing ROBERT: keep the most sensitive information in the server rather than broadcasting it in some way to all users. This topic (adversary model) could be discussed for hours, clearly. However, when looking at the “avis CNIL sur le projet d’application mobile StopCovid”, we have the feeling this is a reasonable assumption.

Ayms commented 4 years ago

I read everything again and the specs

The CNIL did not detect the obvious deanonymization means mentioned in this thread (maybe wrongly trusting "Anonymity of users from a central authority" in the paper)

Covid detection requires some efforts regarding privacy? Why not. But it is not serious that, whether with ROBERT or DP-3T (I don't get what is decentralized with this one, I prefer the ROBERT/PEPP-PT approach), users can be so trivially deanonymized while the requirement is "privacy-by-design".

Onion routing overloading the mobile app (??), decentralization via RDV points (why not, but not for tomorrow, the central authority is not really the issue here, the issue is the deanonymization), using the Tor network (no, for plenty of reasons)

I have briefly explained in #55 what could be a solution, this is not a remake of the Tor network

If you don't want to address the issue then you must remove the above sentence in the paper and clearly highlight the risks

Now, the issue of Bluetooth running in the background with Google and Apple will not be solved, so the suggestion would be to revert to the more standard geolocation (yes... I can argue). Even if anonymity is now supposed to be there, this implies changing the ID/key mechanisms so that the anonymous user's location can't be tracked by the authority, among other things, but I don't see any other alternative (and it is probably more precise than approximate Bluetooth signals).

I am not sure that the app has an interest from a medical standpoint, but I am quite sure that someone will do such app and people will go for it, probably another tracking gafa app if we (France) don't do it