DP-3T / documents

Decentralized Privacy-Preserving Proximity Tracing -- Documents

Google will register beacons and can trace contacts even of non-users #222

Open muehlhoff opened 4 years ago

muehlhoff commented 4 years ago

Some Android systems continuously listen for BTLE beacons and transmit them to a Google server if the user has activated the "Google Location History" function together with either "High accuracy location mode" or "Bluetooth scanning". This configuration is common among a large number of users.

Does this functionality also apply to the BTLE beacons sent out by the Corona app on other devices? If so, all Android phones configured as described above, even those that do not have the Corona app installed, can register the ephIDs of nearby Corona app users and send them to a Google server. This results in considerable data protection risks:

  1. Google records the contact history of even those users who do not use the Corona App themselves (henceforth: 'non-users'). The only requirement is that the user has activated the "Google Location History" function and at the same time switched on the "High accuracy location mode" or "Bluetooth Scanning".

  2. If users publish their ephIDs after testing positive, Google can use these records to determine which Android devices had contact with this infected user. Google is thus able to determine exposure events for a large number of devices at once, including those of non-users.

  3. By triangulating the registered BTLE beacons of several Android devices in close proximity, it can be assumed that Google can also map ephIDs to the sending devices, even before a user publishes his/her ephIDs upon testing positive.

  4. Google can combine this information with account information, including the names, phone numbers and email addresses of users and non-users.

  5. Auditing the DP3T system and user devices alone is not enough to mitigate this risk. All Android devices would have to be "audited", because any device might collect beacons and send them to Google. This also applies to users who do not update their Android devices.

Because the technical infrastructure for this attack already exists and in principle scales to the entirety of all Android users, the risk associated with this attack is very high. In particular, it should be pointed out that this vulnerability also includes collection of health related data of uninvolved parties, that is, users who do not install the Corona app themselves.

pzboyz commented 4 years ago

But Google is not getting the ephIDs; they are uploaded to your government health authority or similar.

cgawron commented 4 years ago

In DP-3T, the ephIDs are not uploaded to a central agency (that's PEPP-PT ...). An infected user uploads its private key (which can be used to calculate the ephIDs generated by the infected user). The contact history is kept locally.
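For illustration, the low-cost DP-3T design derives each day's EphIDs from a rotating secret day key, roughly as SK_t = H(SK_{t-1}) and EphID_1 || ... || EphID_n = PRG(PRF(SK_t, "broadcast key")). The whitepaper suggests HMAC-SHA256 as the PRF and AES in counter mode as the PRG. This minimal Python sketch assumes 15-minute epochs (96 per day) and, to stay within the standard library, replaces AES-CTR with a SHA-256 counter stream, so it is not the specified construction. It shows why publishing one key is enough for anyone holding recorded EphIDs to check for matches.

```python
import hashlib
import hmac

EPHID_LEN = 16        # bytes per EphID
EPOCHS_PER_DAY = 96   # assumption: 15-minute epochs


def next_day_key(sk: bytes) -> bytes:
    """Rotate the secret day key: SK_t = H(SK_{t-1}) with SHA-256."""
    return hashlib.sha256(sk).digest()


def ephids_for_day(sk: bytes, n: int = EPOCHS_PER_DAY) -> list[bytes]:
    """Derive n EphIDs from one day key.

    PRF: HMAC-SHA256 keyed with SK_t over the label "broadcast key".
    PRG: stand-in SHA-256 counter stream (the whitepaper suggests AES-CTR).
    """
    seed = hmac.new(sk, b"broadcast key", hashlib.sha256).digest()
    blocks_needed = (n * EPHID_LEN + 31) // 32
    stream = b"".join(
        hashlib.sha256(seed + ctr.to_bytes(4, "big")).digest()
        for ctr in range(blocks_needed)
    )
    return [stream[k * EPHID_LEN:(k + 1) * EPHID_LEN] for k in range(n)]


def matches(published_sk: bytes, observed: set[bytes]) -> set[bytes]:
    """Recompute a published day's EphIDs and intersect with recorded ones.

    Nothing here requires being an app user: any device that logged the
    broadcast EphIDs can run the same check.
    """
    return set(ephids_for_day(published_sk)) & observed
```

Keys for later days follow by applying `next_day_key` repeatedly, which is why a single uploaded key covers the whole contagious window.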

hitd010000 commented 4 years ago

"If users publish their ephIDs after testing positive, Google can use these records to determine ..."

At least with DP-3T design 2, there is no way for Google to do this. A contact event is stored as H(EphID || i), and the Cuckoo filter to be downloaded is built from these hashes. Google cannot break SHA-256, either now or in the coming decades.

a8x9 commented 4 years ago

@muehlhoff Thank you so much for pointing out this serious threat. On Android, to be able to scan for beacons, the ACCESS_FINE_LOCATION permission is needed. This means that High accuracy has to be enabled.

In addition, as you pointed out, any other Android device with High accuracy turned on, running the DP^3T app or not, will now act as a passive scanner siphoning all nearby EphIDs to Google.

At the risk of sounding like a broken record, I think my proposal in #66 would solve this issue. By making the information published in case of infection unlinkable to the information broadcasted via Bluetooth, this threat can be mitigated.

a8x9 commented 4 years ago

@hitd010000 Google (or any other passive listener) knows EphID and i, they can therefore compute H(EphID || i) and test it against the Cuckoo filter. No breaking of SHA256 is needed.
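To make this point concrete: in design 2 the value checked against the published filter is H(EphID || i), which anyone who recorded the EphID and knows the epoch i can compute. In this minimal sketch a Python set stands in for the Cuckoo filter (a real one stores short fingerprints and admits false positives), and encoding the epoch as a 4-byte big-endian integer is an assumption.

```python
import hashlib


def contact_hash(ephid: bytes, epoch: int) -> bytes:
    """H(EphID || i); the 4-byte big-endian epoch encoding is an assumption."""
    return hashlib.sha256(ephid + epoch.to_bytes(4, "big")).digest()


def build_published_set(infected: list[tuple[bytes, int]]) -> set[bytes]:
    """Stand-in for the published Cuckoo filter: a plain set of hashes."""
    return {contact_hash(ephid, epoch) for ephid, epoch in infected}


def exposed(recorded: list[tuple[bytes, int]],
            published: set[bytes]) -> list[tuple[bytes, int]]:
    """Return recorded (EphID, epoch) pairs whose hash is in the filter.

    The check is identical for an app user and for a passive scanner that
    merely logged (EphID, epoch) pairs it overheard.
    """
    return [(e, i) for e, i in recorded if contact_hash(e, i) in published]
```

Replacing the set with a real Cuckoo filter changes only how membership is tested, not who can perform the test.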

hitd010000 commented 4 years ago

No breaking of SHA256 is needed.

Right, but that is the same as breaking MD5, SHA-1 or SHA-256 by brute force. It works, sure, but what use is it to Google to find out in the year 2200?

Because Google is unable to recalculate Hash from Cuckoo-Filter, they have to test each recorded EphID against the filter. Good luck to them!

a8x9 commented 4 years ago

Because Google is unable to recalculate Hash from Cuckoo-Filter, they have to test each recorded EphID against the filter. Good luck to them!

How do you think the app users test if they are infected in design 2? They test all their recorded H(EphIDs || i) against the cuckoo filter. Anyone who recorded a broadcasted EphID can do exactly the same.

To give you a more precise idea, I've done some tests using a Python implementation of cuckoo filters, with 32 bits per entry. My test script uses only a single thread and is extremely slow compared to an implementation in a lower-level language. I've used the numbers provided in the whitepaper and FAQ for Europe: 30K new cases per day, 5 days of contagion window before upload, and I've chosen a 15-minute epoch.

Testing all EphIDs broadcast in Europe (assuming a population of 737M) during a day against the cuckoo filter takes ~18 hours on a six-year-old laptop. This can probably be reduced to a few minutes with an optimized, multithreaded implementation.
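The figures quoted above can be turned into a quick back-of-envelope estimate. This sketch uses the stated parameters and assumes that every person broadcasts one EphID per 15-minute epoch around the clock and that the filter holds one entry per epoch of each case's contagious window. It ignores the Cuckoo filter's load factor, so a real filter would be somewhat larger.

```python
CASES_PER_DAY = 30_000    # new cases per day in Europe (figure quoted above)
CONTAGIOUS_DAYS = 5       # days uploaded per case
EPOCH_MINUTES = 15        # chosen epoch length
POPULATION = 737_000_000  # assumption: everyone broadcasts all day
BITS_PER_ENTRY = 32       # fingerprint size used in the test above

epochs_per_day = 24 * 60 // EPOCH_MINUTES
filter_entries = CASES_PER_DAY * CONTAGIOUS_DAYS * epochs_per_day
# Lower bound on filter size: ignores the Cuckoo filter's load factor.
filter_megabytes = filter_entries * BITS_PER_ENTRY / 8 / 1_000_000
daily_broadcast_ephids = POPULATION * epochs_per_day

print(f"epochs per day:        {epochs_per_day}")
print(f"filter entries:        {filter_entries:,}")
print(f"filter size (min):     {filter_megabytes:.1f} MB")
print(f"EphIDs broadcast/day:  {daily_broadcast_ephids:,}")
```

Under these assumptions the daily filter has 14,400,000 entries (at least 57.6 MB), tested against about 70.75 billion broadcast EphIDs per day.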

Do you still think that Google (or in fact anyone with a Raspberry Pi) does not have the necessary resources to test all recorded EphIDs?

hitd010000 commented 4 years ago

Do you still think that Google (or in fact anyone with a Raspberry Pi) does not have the necessary resources to test all recorded EphIDs?

No, you are right. I had not considered that Google can also record a timestamp. That means they have to test even fewer EphIDs, because the contact has to have occurred about 5 days before the onset of illness and the positive test.

mapsguy commented 4 years ago

[Disclaimer: I work for Google and I am part of the engineering teams for Maps and Location History and created this new verified account to respond here.]

Hi folks - just saw this issue and wanted to clarify a couple of things: Google Location History previously only scanned for specific iBeacon and Eddystone (@google/eddystone) identifiers. Unrelated, we found BLE beacons weren't especially effective for the Google Maps Timeline feature, and hence BLE scans in Location History were disabled in Play Services version 20.09.14.

So, BLE scan data is not part of the current data collected for Location History. Also, note that Location History is a completely separate system, entirely opt in, and users can always edit, delete or turn it off at any time.

hitd010000 commented 4 years ago

system, entirely opt in, and users can always edit, delete or turn it off at any time.

Really? Many apps, such as McDonald's, ask you to enable location every time they start if you have opted out.

mapsguy commented 4 years ago

Sorry, I should have said Google Location History, since that's what I'm referring to.

So, what I am confirming is that Google Location History does not/will not look for any of the BLE beacons that would be exchanged by the DP-3T protocol or by the upcoming Google-Apple Contact Tracing Framework.
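What "only scanned for specific iBeacon and Eddystone identifiers" can mean in practice is sketched below: a scanner that classifies raw BLE advertising payloads and keeps only the formats it is looking for. The format constants are the published ones (iBeacon: manufacturer data with Apple company ID 0x004C followed by 0x02 0x15; Eddystone: 16-bit service UUID 0xFEAA; Google/Apple Exposure Notification: 16-bit service UUID 0xFD6F). The `SCANNED_FOR` policy is hypothetical, the sketch does not reflect Google's actual code, and it does not know any DP-3T prototype UUID, so such frames would simply fall into "other".

```python
AD_SERVICE_DATA_16 = 0x16        # "Service Data - 16-bit UUID" AD type
AD_MANUFACTURER_DATA = 0xFF      # "Manufacturer Specific Data" AD type
APPLE_COMPANY_ID = 0x004C
EDDYSTONE_UUID16 = 0xFEAA
EXPOSURE_NOTIFICATION_UUID16 = 0xFD6F

# Hypothetical scanner policy: only these formats are kept.
SCANNED_FOR = {"ibeacon", "eddystone"}


def parse_ad_structures(payload: bytes) -> list[tuple[int, bytes]]:
    """Split a legacy advertising payload into (AD type, data) pairs."""
    structures = []
    pos = 0
    while pos + 1 < len(payload):
        length = payload[pos]          # counts the type byte plus the data
        if length == 0:
            break                      # zero length marks early termination
        structures.append((payload[pos + 1], payload[pos + 2:pos + 1 + length]))
        pos += 1 + length
    return structures


def classify(payload: bytes) -> str:
    """Label a payload as ibeacon, eddystone, exposure_notification or other."""
    for ad_type, data in parse_ad_structures(payload):
        if ad_type == AD_MANUFACTURER_DATA and len(data) >= 4:
            company = int.from_bytes(data[:2], "little")
            if company == APPLE_COMPANY_ID and data[2:4] == b"\x02\x15":
                return "ibeacon"
        if ad_type == AD_SERVICE_DATA_16 and len(data) >= 2:
            uuid16 = int.from_bytes(data[:2], "little")
            if uuid16 == EDDYSTONE_UUID16:
                return "eddystone"
            if uuid16 == EXPOSURE_NOTIFICATION_UUID16:
                return "exposure_notification"
    return "other"


def keep(payload: bytes) -> bool:
    """True only for frames the hypothetical scanner is looking for."""
    return classify(payload) in SCANNED_FOR
```

With such a filter, an Exposure Notification frame is recognized but `keep` returns False for it; whether a real scanner behaves this way is exactly what an audit would have to verify.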

muehlhoff commented 4 years ago

Thanks a lot for your helpful clarification @mapsguy. I have two more questions:

  1. When you say Google abandoned BLE scanning for Location History, does this imply that it is also not used for "high accuracy location"?

  2. To what extent does this also hold for older versions of Android / Google Play that are still in use, for instance when customers don't (or can't) regularly update their systems? Is there an older version of Google Play Services xx << 20.09.14 that would scan for BLE beacons that are used by DP-3T? So, just to have an overview:

Is there such a version xx, and is this overview correct?

muehlhoff commented 4 years ago

Oh, and one more "quick" question to @mapsguy:

Will Google cooperate in an auditing that would allow for an external party to verify these claims and certify that no BLE beacons from the proximity tracing system will be transmitted from the client's device as part of any feature of the Android system or Google Service Framework?

OBIvision commented 4 years ago

The responses from Google in this thread are nowhere near solving the problems.

a) Google already does intensive location tracking for advertising.
b) On Android, Google will control the keys used to generate the UUIDs, and the channel.
c) Google prevents any secure alternative.

pdehaye commented 4 years ago

@mapsguy can you provide similar transparency on how Wi-Fi is currently used for Google Location History? I am asking because of re-identification attacks that are possible through inference, given the (proposed) simultaneous recycling of MAC addresses and EphIDs.
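A minimal sketch of why the rotation timing matters, using made-up identifiers: if a passive observer sees a device's MAC address and EphID together, and the two rotate at different moments, each overlap links the old and new values, and the sightings collapse into one long chain per device. Union-find is simply one convenient way to compute those chains; `link_observations` is a hypothetical helper, not taken from any DP-3T document.

```python
def link_observations(observations: list[tuple[str, str]]) -> list[set[str]]:
    """Group (MAC, EphID) sightings into chains of linkable identifiers.

    Each sighting means "this MAC and this EphID came from the same radio",
    so the two identifiers are merged (union-find with path halving).
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for mac, ephid in observations:
        union("mac:" + mac, "eph:" + ephid)

    chains: dict[str, set[str]] = {}
    for node in list(parent):
        chains.setdefault(find(node), set()).add(node)
    return list(chains.values())


# Both identifiers rotate at the same instant: each pair stays isolated.
synchronized = [("M1", "E1"), ("M1", "E1"), ("M2", "E2")]
# The MAC rotates partway through an EphID's lifetime: everything links up.
staggered = [("M1", "E1"), ("M1", "E2"), ("M2", "E2"), ("M2", "E3")]
```

Here `synchronized` yields two separate chains, while `staggered` yields a single chain containing all five identifiers.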

pzboyz commented 4 years ago

@muehlhoff

Is there such a version xx, and is this overview correct?

Google Play Services updates in the background and is normally kept up to date on all devices.

Spacefish commented 4 years ago

@mapsguy is this only true for "Location History", or is it true for the general location API as well? I get that you don't use BLE tokens for Location History, but are they used for determining location, like Wi-Fi is?

OBIvision commented 4 years ago


The fact is that it is virtually impossible for ordinary citizens to prevent location tracking, as Google does everything to prevent this. Google must at all times be assumed to know exactly which mobile phones are where from intra-phone tracking, and therefore also to be able to correlate BLE UUIDs from Android phones with citizens' identities.

Covid19Fighter commented 4 years ago

Hi,

Google is able to decipher the EphIDs, because the handling is done by the API and not by the government app. The API is closed source and it took some time to get this information, but you can read it here:

google/exposure-notifications-server#367

I opened an issue as well on the German app:

corona-warn-app/cwa-documentation#102

I also questioned the Bluetooth measurements, because Bluetooth has never been accurate enough for distance measurement at this level. I ran some tests myself; you don't need a Bundeswehr demonstration for this.

corona-warn-app/cwa-documentation#103
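The distance concern can be illustrated with the log-distance path-loss model, RSSI = P_1m - 10 * n * log10(d), which is one common way to turn signal strength into distance. The measured power at 1 m (-59 dBm) and the path-loss exponent (n = 2, free space) are assumed values; real phones differ in transmit power, antenna orientation and body shadowing, so this is a sketch of the sensitivity, not a calibrated model.

```python
def estimate_distance_m(rssi_dbm: float,
                        measured_power_dbm: float = -59.0,
                        path_loss_exponent: float = 2.0) -> float:
    """Invert RSSI = P_1m - 10 * n * log10(d) for d (in metres).

    measured_power_dbm (P_1m) and path_loss_exponent (n) are assumed values.
    """
    return 10 ** ((measured_power_dbm - rssi_dbm) / (10 * path_loss_exponent))


# A +/-5 dB swing around -69 dBm, e.g. from phone orientation or a body in
# the signal path, moves the estimate across a wide range of distances.
for rssi in (-64, -69, -74):
    print(f"{rssi} dBm -> {estimate_distance_m(rssi):.1f} m")
```

With these assumptions, -64, -69 and -74 dBm give roughly 1.8 m, 3.2 m and 5.6 m, so a 5 dB change in either direction moves the estimate from under 2 m to over 5 m.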

Summary:

Yes, it seems Google would be able to identify the infected devices, and Google has worked with some of the major players of DP-3T since at least 2018. I have warned the GDPR authorities that this is a real problem and asked them to stop the German government servers from broadcasting medical data that Google can decipher.

pdehaye commented 3 years ago

Hi all. A bit late to this thread, but I wanted to let you know that some court documents were unsealed today detailing how Bluetooth was used to improve location-based profiling in the Google/Android ecosystem. See https://www.azag.gov/sites/default/files/2021-05/Berlin_Exhibit_202.pdf (for some reason I need a US-based VPN to access it, so I have taken the liberty of attaching a copy below): Berlin_Exhibit_202.pdf