Remote access is violating integrity, anonymity and data protection

TomTeeJay commented 4 years ago

The Scoping Document (only in German) https://github.com/corona-warn-app/cwa-documentation/blob/master/translations/scoping_document.de.md describes "supporting processes" in User Story ID E07.01.

The RKI wants to manipulate thresholds directly on the devices. Thus the API accesses and manipulates locally stored data directly on the device. This is intended to happen without any app updates (ref. to 2: "The adjustment is made on the end devices without requiring an update of the app.")

It is not clear when and under what circumstances this will be done. The context of these supporting processes is not described in detail. Whether and how users are informed about this access is also not specified.

User Story E10.01 describes further changes of dynamic contents that are to be controlled centrally. A content management system is therefore updating elements within the app, or the app is loading resources from the web. In both cases this makes everybody identifiable.

Conclusion:

At least some "supporting processes" violate the basic privacy goals of data integrity and anonymity, because certain thresholds and dynamic contents can be manipulated remotely.

s-martin commented 4 years ago

The RKI wants to manipulate thresholds directly on the devices. Thus the API accesses and manipulates locally stored data directly on the device. This is intended to happen without any app updates (ref. to 2: "The adjustment is made on the end devices without requiring an update of the app.")

But isn't that changing the parameters of the proximity algorithm and not modifying the data?

And this is not necessarily done by "remote access", the end user device could also pull the algorithm parameters periodically from a server.
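A minimal sketch of the pull model mentioned above, assuming a JSON settings file. The URL is the placeholder used elsewhere in this thread, and the class name, the `fetch` callback and the parameter name in the usage note are hypothetical. The device decides when to ask; the server never reaches into the device, and only algorithm parameters are replaced, not stored contact data.

```python
import json
import time
from typing import Callable, Optional

# Placeholder host name; not a real endpoint.
SETTINGS_URL = "https://your-rki-server.de/settings.json"


class ParameterPoller:
    """Pulls algorithm parameters on the device's own schedule (no server push)."""

    def __init__(self, fetch: Callable[[str], str], interval_seconds: float):
        self._fetch = fetch                # hypothetical: any function doing an HTTP GET
        self._interval = interval_seconds
        self._last_pull: Optional[float] = None
        self.parameters: dict = {}

    def maybe_pull(self, now: Optional[float] = None) -> bool:
        """Fetch parameters if the interval has elapsed; return True if a pull happened."""
        now = time.time() if now is None else now
        if self._last_pull is not None and now - self._last_pull < self._interval:
            return False
        # Only the parameter set is replaced; locally stored contact data is untouched.
        self.parameters = json.loads(self._fetch(SETTINGS_URL))
        self._last_pull = now
        return True
```

Called with a stub such as `ParameterPoller(lambda url: '{"attenuation_threshold": 50}', 3600)`, the first `maybe_pull()` loads the parameters and further calls within the hour do nothing. Note that every pull still reveals the device's IP address to whoever serves the file.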

tkowark commented 4 years ago

Thanks for pointing out this issue. We will provide more details on that process in upcoming documents.

MalteJ commented 4 years ago

For discussion of implementation details please wait for the release of our architecture documents and the app or backend code.

egandro commented 4 years ago

And this is not necessarily done by "remote access", the end user device could also pull the algorithm parameters periodically from a server.

Which violates privacy :) If you gain access to the IP + timestamp, you can easily estimate the location of the user. With 30-50 million phones, privacy is gone....

niklas2810 commented 4 years ago

Which violates privacy :) If you gain access to the IP + timestamp, you can easily estimate the location of the user. With 30-50 million phones, privacy is gone....

...which can be done by any app on your phone. If you are really that paranoid, simply use a VPN on your phone and the problem is gone. There has to be a way to update parameters, and Play Store/App Store updates are just way too slow (and not mandatory).

Edit: And as MalteJ and tkowark mentioned, implementation details can be discussed when the actual architecture is released.

TomTeeJay commented 4 years ago

In my understanding the app must not lazy-load any content or resources, as egandro correctly points out. These resources de-anonymize users. And in your supporting processes you even write of changes to certain thresholds. And what about users who use their devices offline, without internet access, or behind personal firewalls?

Well, I am looking forward with excitement to further code. But if the app is already broken by design and by default, no code will make it better.

egandro commented 4 years ago

In my understanding the app must not lazy-load any content or resources, as egandro correctly points out. These resources de-anonymize users. And in your supporting processes you even write of changes to certain thresholds. And what about users who use their devices offline, without internet access, or behind personal firewalls?

That's the point. A third party can easily map IP/timestamp to IMEI, address and clear names. This possibility should be avoided by design!

egandro commented 4 years ago

Edit: And as MalteJ and tkowark mentioned, implementation details can be discussed when the actual architecture is released.

Design flaws should be discussed ahead of time!

Giving third parties the possibility to easily map IPs to clear names should be avoided by all means!

MalteJ commented 4 years ago

@egandro I don't see where we give third parties the possibility to map IPs to clear names!?

TomTeeJay commented 4 years ago

@MalteJ : Well obviously one of us is missing something when your design document says, I quote:

(Translated from the German original:) As the RKI, I want to manage the contents of the application centrally, so that updates of texts, links, hotlines, etc. can be carried out once for all places in the app.

The content is divided into static and dynamic content according to technical feasibility.

In the first version, updates are delivered via an app update.

There is no English translation of this yet, but this describes centralized tracking that violates any decentralized approach.

egandro commented 4 years ago

@egandro I don't see where we give third parties the possibility to map IPs to clear names!?

Really :(((

$ wget https://your-rki-server.de/settings.json

The server will get the IPs of the devices. It creates a log file with timestamps. So 50 million users must trust YOU that third parties don't get access to this list of IPs...

Which they shouldn't!

Fun fact: I hope you don't use AWS / Azure Servers...

Find a better concept where this isn't possible by design! P2P, QR code, ... there are so many possibilities.

MalteJ commented 4 years ago

@TomTeeJay it looks like you are not familiar with the "decentralized contact tracing architecture". Until we have released our design docs, you may find more information in DP3T's documents https://github.com/DP-3T/documents

and from Apple: https://www.apple.com/covid19/contacttracing/

MalteJ commented 4 years ago

@egandro

The server will get the IPs of the devices. It creates a log file with timestamps. So 50 million users must trust YOU that third parties don't get access to this list of IPs...

True. But there are no clear names. Inherent to the internet design is the communication using IP addresses (pseudonyms). But we do not have any clear names and cannot provide any clear names to third parties.

MalteJ commented 4 years ago

By the way we will not persist the IP addresses. They are used temporarily for the time the connection is established.

egandro commented 4 years ago

@egandro True. But there are no clear names. Inherent to the internet design is the communication using IP addresses (pseudonyms). But we do not have any clear names and cannot provide any clear names to third parties.

A third party can do this :) Dial 112 and let them explain how easily a phone IP can be mapped to a name.

That's why nobody should trust this.

egandro commented 4 years ago

By the way we will not persist the IP addresses. They are used temporarily for the time the connection is established.

WHY should we trust this?

Get a better concept! Use P2P, use a QR-Code, use a Setting ID, ... there are so many many possibilities to solve this.

niklas2810 commented 4 years ago

@egandro

WHY should we trust this?

You can never be sure whether the server provider uses the data responsibly. However, there is not much data collected (except for the IP address and current time, which is stored for every single request you make on the web, so I don't really get your point why this is such a critical issue on this particular project). The same issue could come up when submitting a positive test result (or similar cases where you have to communicate with an external server).

Get a better concept! Use P2P, use a QR-Code, use a Setting ID, ... there are so many many possibilities to solve this.

To my mind (!), fetching the updated settings from the server is the best approach. P2P can be really slow and will not reach people who don't have contact with others (these devices will only update when they make contact again, which may be too late). Additionally, it would be much harder (and more time-consuming) to implement, could create new attack vectors (e.g. malformed data could corrupt the app, you could manipulate other users' settings, etc.) and would cause a lot of unnecessary data transfer (as all devices would have to constantly compare their settings versions). QR codes require manual user interaction, which will lead to many outdated client configurations.

TomTeeJay commented 4 years ago

Well, we are still talking about a health app, aren't we?

If your app uses any dynamic content that bypasses the common update mechanisms, then it simply cannot be considered safe.

They are used temporarily for the time the connection is established.

Well, whether this is true or not, nobody knows. And it does not hold for anybody in between. The IP address of a user is already personal data that makes him or her identifiable. For instance, as admin of a corporate network with hundreds of users, I only need to wait until my users transfer a static resource such as "I_am_infected.html" or certain .json strings through my proxies to know exactly who is infected.

QuadratClown commented 4 years ago

WHY should we trust this?

At some point you have to trust a server you connect to, and I mean any server. Especially if you worry about IP addresses. P2P or QR codes are no solution if you don't want really bad fragmentation of update status (caused either by requiring manual entry or by an inherently slow distribution protocol). So at that point it's either "app is deployed completely finished and immutable" or bust. I don't think that'll work.

egandro commented 4 years ago

To my mind (!), fetching the updated settings from the server is the best approach. P2P can be really slow and will not affect people who don't have contact to others (these devices will update

I am no SAP / Telekom architect :) I don't do their job. I would avoid web calls to any Server by all means!

The whole point of #13 is that there is a privacy issue. I tried to explain why this is a problem. Collecting IPs is always an issue: whether it is allowed or not, it will be done by third parties.

Here is a better approach. Add an "RKI Mirror Server URL", e.g. as Linux distributions do with their package management. Add a GPG signature and let users create their own mirrors. Then nobody can collect their device IPs.

In here you can have settings.json and warning.json.

People who trust the RKI can use this server.

Ryuno-Ki commented 4 years ago

GPG signature is a good point. What if the server from which the data gets loaded gets hacked? Is there a way to detect this on the client side?

I recall some Supply Chain Attack with Ukraine tax service servers or something like that …

egandro commented 4 years ago

GPG signature is a good point. What if the server from which the data gets loaded gets hacked? Is there a way to detect this on the client side?

I recall some Supply Chain Attack with Ukraine tax service servers or something like that …

This has worked for 20 years in Debian :) We have happy, consistent mirrors.

Just using this infrastructure instead of a REST-based server could speed up the project by weeks!

niklas2810 commented 4 years ago

Here is a better approach. Add an "RKI Mirror Server URL", e.g. as Linux distributions do with their package management. Add a GPG signature and let users create their own mirrors. Then nobody can collect their device IPs.

I understand the point you are trying to make, and I like this principle as well. But this cannot and will not work for the coronavirus tracing app. Why? An example: the settings.json file defines the minimum distance for a contact to be stored on the device. If there are 50 different mirrors out there, some of them may (let's be realistic: will) have different configurations. But that's not how the coronavirus works; we need common parameters to trace the infection chains in the most efficient way. And I don't even want to start on what happens when one of those mirror servers gets hacked; I prefer a single, but very secure, server.

This settings.json data might include (we don't know yet, because there is still a lot of work for the SAP employees to do, but I guess they will release it on GitHub so we can have a look at it):

Text and links (which could then be manipulated and may contain harmful content)
The tracing parameters (which need to be adjusted uniformly based on the latest scientific evidence)

Additionally, the tracing keys of infected citizens will be stored on a central server as well (AFAIK), so I don't get the point why this should be an issue for the app configuration.

Edit: Most of the issues mentioned above can be solved by verifying the central GPG signature. However, implementing such a system would take months, and the app should be operational as fast as possible.

egandro commented 4 years ago

I understand the point you are trying to make, and I like this principle as well. But this cannot and will not work for the coronavirus tracing app. Why? An example: the settings.json file defines the minimum distance for a contact to be stored on the device. If there are 50 different mirrors out there, some of them may (let's be realistic: will) have different configurations. But that's not how the

There are 51,000 Debian packages for 10+ Archs / CPUs.

So it can handle this :)

Edit:


settings.json {
  ttl: "2020-06-21T18:25:43-05:00",
  ...
}

Very very simple. If it's too old, your app gracefully tells you to update.

Debian mirrors already provide this: they have a "last updated" timestamp.

So everything is solved!
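The expiry check described above might look roughly like this. It is only a sketch: the `ttl` field name and the timestamp format are taken from the snippet in this comment, while the function name and the decision to reject timestamps without a UTC offset are assumptions.

```python
from datetime import datetime, timezone
from typing import Optional


def settings_expired(settings: dict, now: Optional[datetime] = None) -> bool:
    """Return True once the 'ttl' timestamp in the settings has been reached."""
    now = now or datetime.now(timezone.utc)
    expires_at = datetime.fromisoformat(settings["ttl"])
    if expires_at.tzinfo is None:
        # Assumption: a ttl without an offset is ambiguous and therefore rejected.
        raise ValueError("ttl must carry a UTC offset")
    return now >= expires_at


settings = {"ttl": "2020-06-21T18:25:43-05:00"}
if settings_expired(settings):
    print("Settings are outdated, please update from your mirror.")
```

A check like this only tells the user that the local copy is old; it does not by itself show that a mirror served the right content.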

egandro commented 4 years ago

Edit: Most of the issues mentioned above can be solved by verifying the central GPG signature. However, implementing such a system would take months, and the app should be operational as fast as possible.

SAP uses Sonatype Nexus anyway :) So a Debian repository can be created with 2 clicks. It can be mirrored to an HTTPS server :)

GPG + Signature is included with basic CLI tools. A Debian client including GPG is available for all platforms.

I see no point in letting them collect IPs and then "fixing this because there is no time" later... NO! Won't happen.

ghost commented 4 years ago

I just want to mention that side-loading dynamic content would also collide with issue #14, as there would be no way to verify that, for example, links in the app are not manipulated or do not contain tracking IDs.

CodeExplorer22 commented 4 years ago

@egandro I don't see where we give third parties the possibility to map IPs to clear names!?

It does via Bluetooth exploits. Try to fix that on old phones. Good luck. ;) Bluetooth was never secure, nor was it intended to be used for something like tracking.

egandro commented 4 years ago

It does via Bluetooth exploits. Try to fix that on old phones. Good luck. ;) Bluetooth was never secure, nor was it intended to be used for something like tracking.

This is all about a dedicated API.

So a third party might have direct access to the IP, which needs to be avoided by design.

The API for changing settings can be implemented without forcing users to trust a server.

TomTeeJay commented 4 years ago

In addition to the fact that the tracing app uses remote updates and dynamic content, the app is not suitable for certain user groups whose mobile phones have limited data plans.

MalteJ commented 4 years ago

I am not sure about the legislation in regards to "Netzneutralität", but I could imagine that the mobile providers support this application by not billing traffic to the Corona-Warn-App servers.

egandro commented 4 years ago

I am not sure about the legislation in regards to "Netzneutralität", but I could imagine that the mobile providers support this application by not billing traffic to the Corona-Warn-App servers.

Please don't mock us :(

We really, really want to help! The Debian-ish approach will cost you less development time and make the app trustworthy. Of course your SAP / Telekom genius architects might have better ideas!

A REST call to an AWS / Amazon / RKI server will be a showstopper. You lose trust here!

There are so many issues to solve; I haven't even found the headlines in your documents yet!

ad2003 commented 4 years ago

@egandro

The server will get the IPs of the devices. It creates a log file with timestamps. So 50 million users must trust YOU that third parties don't get access to this list of IPs...

True. But there are no clear names. Inherent to the internet design is the communication using IP addresses (pseudonyms). But we do not have any clear names and cannot provide any clear names to third parties.

If this app sees the user's IP address for even one millisecond, it is neither acceptable nor secure to use. Also, other people around you would be tracked, and so would their IP addresses.

Right now, it would be the bundestrojaner2go.

I am glad you put it online to have a discussion about it, and that there is an open source approach, but seriously: this app should not be created and released into the wild without being secure.

Here is a little thought experiment: Let's assume the app works. People are using it; they are traced and tracked, call it what you want. Now, one day, a person is infected and the app alert works: everyone else who was near this person's phone is now also seen as a "threat" to health and society. Now you would only need a judge to decide that this is dangerous for society and that there is imminent danger, and in no time all IP addresses could be traced back to the users to save society from those people, in the name of health. And then? People would be put in quarantine?

egandro commented 4 years ago

Let's assume, the app works. People are using it, they are traced and tracked -

Let's all be nice!

Please give us technical (not political) solutions here.

MalteJ commented 4 years ago

@egandro

I am not sure about the legislation in regards to "Netzneutralität", but I could imagine that the mobile providers support this application by not billing traffic to the Corona-Warn-App servers.

Please don't mock us :(

That was a response to the doubts in the post before that the app could consume too much traffic. So no mocking intended :)

s-martin commented 4 years ago

Right now, it would be the bundestrojaner2go.

That's a wild exaggeration.

It's an open source app, based on an open source protocol (please see https://github.com/DP-3T - many issues are already discussed there), and once the code is available here, everyone can review and improve it.

egandro commented 4 years ago

It's an open source app, based on an open source protocol (please see https://github.com/DP-3T - many issues are already discussed there), and once the code is available here, everyone can review and improve it.

True! But why should we trust a server that is collecting an IP + Timestamp?

A third party can easily map this to an address in real time :) That is the issue with remote access.

MaxFichtelmann commented 4 years ago

@egandro if I understand correctly, the issue would be addressed, if the user could configure an alternative url as the settings source - from a mirror the user trusts.

A checksum derived from the currently used settings could be displayed to enable the user to verify that the settings are up-to-date.
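One way such a displayed checksum could be derived, as a sketch only: the function name, the choice of SHA-256 and the shortened four-group display format are all assumptions, not anything specified for the app.

```python
import hashlib
import json


def settings_fingerprint(settings: dict) -> str:
    """Derive a short, human-comparable fingerprint from the active settings."""
    # Serialize canonically so that key order and whitespace do not change the result.
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(canonical).hexdigest()
    # Assumption: the first 16 hex characters, grouped in fours, are enough to compare by eye.
    return "-".join(digest[i:i + 4] for i in range(0, 16, 4))
```

Two users (or a user and an official announcement) could then compare the fingerprint shown in the app. A checksum like this shows that two copies are identical; it does not prove who published them, which is what the GPG signature discussed above would be for.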

Is there any intention to push different configurations to different users (e.g. based on their location, due to differences in infection activity)?

jeffreygroneberg commented 4 years ago

It might be worth having a look at: https://github.com/iCepa/Tor.framework

Tunneling HTTP through the TOR network avoids exposing the real IP addresses of users.

MalteJ commented 4 years ago

I am not sure if we want to add 40M users to TOR using mobile devices. The mobile providers would kill us and we would probably kill the TOR network :D

jeffreygroneberg commented 4 years ago

How often do you want to pull data from the server? If it's just a short connection at regular intervals, not much will happen. Get a connection. HTTP call. And we're off. ;)

seboslaw commented 4 years ago

@jeffreygroneberg the problem is that once you "get the connection" and do the HTTP call, you've already set up the entire TOR stack, which doesn't come for free and will by itself put load on TOR. Plus, setting up the stack takes time (usually ~5-10 secs if there's no load) and the data transmission is slooooow :) But maybe you could add it to the app in a future release as an onboarding option for people who desperately want it.

egandro commented 4 years ago

@egandro if I understand correctly, the issue would be addressed, if the user could configure an alternative url as the settings source - from a mirror the user trusts.

Yes! This works with Debian! 100% decentralized infrastructure.

In theory you can have even your FritzBox as mirror using a tiny App. With decentralized signed RKI files. It's very simple to setup.

I might consider doing a POC as soon as we have the source.

egandro commented 4 years ago

@jeffreygroneberg the problem is that once you "get the connection" and do the HTTP call, you've already set up the entire TOR stack, which doesn't come for free and will by itself put load on TOR.

TOR or whatever: anything is better than nothing. Add proxy support! Then we can hide our IPs.

This is a 90 minute job for an IT student in 2nd semester.
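As an illustration of how small client-side proxy support can be, here is a sketch using Python's standard library. The function name and the proxy address are hypothetical, and the real app would use its platform's own networking stack instead.

```python
import urllib.request
from typing import Optional


def make_opener(proxy_url: Optional[str] = None) -> urllib.request.OpenerDirector:
    """Build a URL opener that routes HTTP(S) requests through a user-chosen proxy."""
    if proxy_url:
        proxies = {"http": proxy_url, "https": proxy_url}
    else:
        # An empty mapping tells urllib not to pick up proxies from the environment.
        proxies = {}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


# Hypothetical local proxy address, e.g. one the user runs or trusts.
opener = make_opener("http://127.0.0.1:8118")
# opener.open("https://your-rki-server.de/settings.json") would now go via the proxy.
```

With a proxy, the settings server sees the proxy's address instead of the device's; the proxy operator, in turn, sees the device. It moves the trust rather than removing it.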

atdotde commented 4 years ago

It seems to me the suggested problem is that a central instance learns the IP addresses of devices that have the app installed. This sounds practically unavoidable to me as long as you also need to communicate (even by polling) which IDs should be considered infected; any TCP connection has this property. I don't think this is a problem, as the connection does not carry any health or contact information, which would be problematic.

egandro commented 4 years ago

It seems to me the suggested problem is that a central instance learns the IP addresses of devices that have the app installed. This sounds practically unavoidable to me as long as you also need to communicate (even by polling) which IDs should be considered infected,

NO :) Check this - http://ftp2.de.debian.org/debian/

This is the Debian mirror system. They have used it for more than 20 years now. We have a GPG-signed file.

We could have a warningid.json file with a timestamp + list of IDs that are infected. This will be distributed to a mirror network. There is no need to ever do any communication to a server.
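A rough sketch of how a device could use such a file. The file name comes from the comment above, while the JSON field names (`timestamp`, `ids`), the function name and the example IDs are assumptions. The comparison runs entirely on the device, so nothing about the user's own contacts leaves the phone.

```python
import json


def find_exposures(warning_file_text: str, observed_ids: set) -> set:
    """Return the locally observed IDs that appear in the downloaded warning list."""
    warning = json.loads(warning_file_text)
    infected = set(warning["ids"])   # assumed field: list of IDs reported as infected
    return observed_ids & infected


# Example content of a warningid.json fetched from any mirror (made-up IDs).
warning_text = '{"timestamp": "2020-06-21T18:25:43-05:00", "ids": ["a1f3", "9c2e", "77b0"]}'
# IDs this device has recorded from nearby phones (made-up values).
seen_nearby = {"0000", "9c2e"}

print(find_exposures(warning_text, seen_nearby))
```

Where the file is downloaded from does not matter for the matching step, which is what makes the mirror idea possible; the authenticity of the file would still have to be checked separately, for example with the GPG signature.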

lukasmasuch commented 4 years ago

DP-3T also has a statement on this topic (TOR, IP anonymization) in their FAQ. They evaluated options for anonymous communication systems, but decided against it for their current version. There are also a few related issues on that topic in the DP-3T repository.

MaxFichtelmann commented 4 years ago

There is no need to ever do any communication to a server.

You would still need to transmit the TAN in case of a verified infection.

egandro commented 4 years ago

DP-3T repository.

This document is soooo funny :)

"In future versions of the app, if an approppriate anonymous communication network appears, we may include the option of submitting data anonymously to the backend."

... the option ... :) facepalm

egandro commented 4 years ago

There is no need to ever do any communication to a server.

You would still need to transmit the TAN in case of a verified infection.

I left out the word "central". Sorry. You can easily distribute the list of infected IDs via a decentralized server ... and I enter the mirror URL https://mycastle/ in my app ....

jeffreygroneberg commented 4 years ago

DP-3T also has a statement on this topic (TOR, IP anonymization) in their FAQ. They evaluated options for anonymous communication systems, but decided against it for their current version. There are also a few related issues on that topic in the DP-3T repository.

Thank you @LukasMasuch for the link!

corona-warn-app / cwa-documentation

Remote access is violating integrity, anonymity and data protection #13
