corona-warn-app / cwa-documentation

Project overview, general documentation, and white papers. The CWA development ends on May 31, 2023. You can still warn other users until April 30, 2023. More information:
https://coronawarn.app/en/faq/#ramp_down
Apache License 2.0

Possible data privacy violation vector? #273

Closed: KaiRoesner closed this issue 4 years ago

KaiRoesner commented 4 years ago

In today's BNN ("Badische Neueste Nachrichten") paper - which is distributed in Karlsruhe city and region - there is an article titled "IT-Experte zu Corona-Warn-App: Die Sicherheit ist lückenhaft" ("IT expert on the Corona-Warn-App: security has gaps"). The article is also available online.

In the article an IT expert from KIT lays out a data privacy violation vector which he claims has been hushed up and kept under wraps. The vector is described in more (excessive ;-) detail here.

The gist of the described vector is that another app may listen in on the identifiers exchanged via BLE and record them together with timestamp and location information (the latter being available since the location service is required to be active). If this app is then also able to intercept the download of infection identifiers it may be able to correlate the presence of an infected person with a time and location and thereby narrow down if not identify the infected person.

Has this scenario been looked at? I don't know whether the prerequisites for this attack are attainable, but even so it would be good practice to have an additional line of defense and to mitigate this by design, if possible.

daimpi commented 4 years ago

There are more papers out there which discuss this and other possible attack vectors:

Here is a reply to a paper which also discusses this vector: https://threadreaderapp.com/thread/1271881305614057474.html.

I've read them, but I'm not an expert on this topic. IMHO it comes down to scalability and to the risk/reward tradeoff of each attack. In line with the reply linked above, I found that a proper analysis of this tradeoff was usually lacking. The papers often just stated that those vectors were "easily exploitable", which in principle can be fine for an academic paper, since it contributes by focusing on a specific aspect of a technology. Even so, I thought those papers would have benefited from stating more explicitly that the relevance of their conclusions depends on such a tradeoff analysis, which they seemed to wave away a bit prematurely.

geos-github commented 4 years ago

This is an inherent property of the architecture, not a weakness of the app. Suppose you can detect and record an RPI (Rolling Proximity Identifier, which is what the whole system is based on) and can attribute that RPI to a specific person, i.e. de-anonymize it, because you are standing next to that person at that moment and know them or have some other means of identifying them. If that person tests positive within the next 14 days and submits their diagnosis keys for publication, you can retrieve those keys, derive from them the RPIs that person broadcast, find the one you recorded, and conclude that the very person whose RPI you recorded at a specific time a few days ago has just tested positive.

Now compare this to the "traditional" way of contact tracing by local health authorities. If they identify you as a contact of a positive test case and inform you of that fact, chances are you might work out who that positive test case was (I don't know which details health authorities typically disclose in such a situation). Nevertheless, the only way to prevent such potential disclosure would be not to publicly disclose the diagnosis keys (i.e. no "decentralized" approach) but to have some central entity perform the matching, which inevitably would come with a wide range of severe privacy issues.
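The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the CWA or from the Exposure Notification framework: it assumes a hypothetical eavesdropper log of `Sighting` records (rolling proximity identifier, receive time, location) and a list of published `DiagnosisKey`s, and all names (`Sighting`, `DiagnosisKey`, `correlate`, `toy_derive_rpi`) are made up for this example. The per-interval RPI derivation is passed in as a function; `toy_derive_rpi` is a hash-based stand-in, NOT the real derivation (which uses HKDF and AES-128). The 10-minute interval and 144 intervals per key follow the Exposure Notification framework.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Iterable

INTERVAL_SECONDS = 600   # one Exposure Notification interval = 10 minutes
INTERVALS_PER_KEY = 144  # one Temporary Exposure Key covers 144 intervals (24 h)


@dataclass(frozen=True)
class Sighting:
    """One BLE identifier overheard by a (hypothetical) eavesdropping app."""
    rpi: bytes
    unix_time: int                 # when the identifier was received
    location: tuple[float, float]  # (lat, lon) from the location service


@dataclass(frozen=True)
class DiagnosisKey:
    """A published key of a user who reported a positive test."""
    tek: bytes
    rolling_start_interval: int    # first interval the key was valid for


def correlate(
    sightings: Iterable[Sighting],
    keys: Iterable[DiagnosisKey],
    derive_rpi: Callable[[bytes, int], bytes],
) -> list[tuple[DiagnosisKey, Sighting]]:
    """Return every (published key, recorded sighting) pair that matches."""
    # Index the recorded identifiers so each derived RPI is a dict lookup.
    by_rpi: dict[bytes, list[Sighting]] = {}
    for s in sightings:
        by_rpi.setdefault(s.rpi, []).append(s)

    matches = []
    for key in keys:
        start = key.rolling_start_interval
        for interval in range(start, start + INTERVALS_PER_KEY):
            rpi = derive_rpi(key.tek, interval)
            for s in by_rpi.get(rpi, []):
                # Simplification: the sighting must fall in exactly the
                # interval the RPI was generated for (no clock-skew tolerance).
                if s.unix_time // INTERVAL_SECONDS == interval:
                    matches.append((key, s))
    return matches


def toy_derive_rpi(tek: bytes, interval: int) -> bytes:
    """Toy stand-in only; NOT the real derivation (HKDF + AES-128)."""
    return hashlib.sha256(tek + interval.to_bytes(4, "little")).digest()[:16]
```

The point of the sketch is only that the attacker needs nothing beyond the published keys and their own recordings; each match directly yields a time and a location. A real matcher would plug in the actual derivation and would likely tolerate some clock skew.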

KaiRoesner commented 4 years ago

Nevertheless, the only way to prevent such potential disclosure would be not to publicly disclose the diagnosis keys (i.e. no "decentralized" approach) but to have some central entity perform the matching, which inevitably would come with a wide range of severe privacy issues.

Well, couldn't you also exchange the identifiers over an encrypted channel to prevent an attacker from listening in?

mh- commented 4 years ago

Well, couldn't you also exchange the identifiers over an encrypted channel to prevent an attacker from listening in?

Well, in all privacy-preserving "Corona Warning" concepts that I could think of, an attacker could always just "be" a legitimate user by whatever means the concept requires (e.g. run an app on a smartphone, simulate the same thing on a cheaper device, etc.), and then be warned, just like all other legitimate users. It's that simple.

KaiRoesner commented 4 years ago

Good point, @mh-. And then we are back to the risk/reward tradeoff analysis that @daimpi mentioned, where the reward is pretty meagre: a single "attacker" could at most identify, or narrow down, a few infected persons whom they have met themselves.

kbobrowski commented 4 years ago

We were discussing this some time ago at #76

There are some ways to deal with this, but they all seem to require a more centralized solution.

I think that a more centralized solution would not in principle be less privacy-preserving: it's possible to simply set up an auditing environment and make it accessible to everyone interested in checking that the server is not doing anything shady. And especially in Germany I really doubt that the authorities would be eager to exploit a centralized system; privacy-by-design, decentralized systems are perhaps more important in countries with authoritarian governments.

In the end the reasons seem to be partly ideological, as the general public might have rejected a centralized approach, as explained in the project FAQ:

In April 2020, a scientific debate emerged on the key concept "centralized versus decentralized data storage", which was conducted under the public eye. In addition to plausible arguments in terms of content, a partly ideological debate on centralized storage developed, which had the potential to jeopardize confidence in this technological approach.

But at this point nothing can be done about the attack vector that is the subject of this issue: Diagnosis Keys are easily available, any app with the Location permission (or any other device capable of it) can gather Bluetooth data, and the cryptographic algorithms in play are well documented. We just need to live with this, but it probably won't be a big issue after all.
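To illustrate why published keys are enough, the identifier derivation in the Google/Apple Exposure Notification cryptography specification can be summarized roughly as follows (worth checking against the published specification for exact byte layouts). Here $\mathrm{TEK}_i$ is a published diagnosis key, $t$ is Unix time in seconds, HKDF uses SHA-256, and $\mathrm{ENIN}_j$ is encoded as 4 bytes so that the padded block is 16 bytes long:

```latex
\begin{aligned}
\mathrm{ENIN}(t) &= \left\lfloor t / 600 \right\rfloor \\
\mathrm{RPIK}_i &= \mathrm{HKDF}\big(\mathrm{TEK}_i,\ \text{salt} = \varnothing,\ \text{info} = \texttt{"EN-RPIK"},\ 16\big) \\
\mathrm{PaddedData}_j &= \texttt{"EN-RPI"} \,\|\, \texttt{0x00}^{6} \,\|\, \mathrm{ENIN}_j \\
\mathrm{RPI}_{i,j} &= \mathrm{AES128}\big(\mathrm{RPIK}_i,\ \mathrm{PaddedData}_j\big)
\end{aligned}
```

No input in this chain is secret once $\mathrm{TEK}_i$ has been published, so anyone who recorded $\mathrm{RPI}_{i,j}$ at some time and place can recompute it and match it.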

KaiRoesner commented 4 years ago

Yes, I just noticed there have also been related discussions at #147 and #223. So I'm closing this.