DP-3T / documents

Decentralized Privacy-Preserving Proximity Tracing -- Documents
How to prevent preexisting Bluetooth tracking #69

Open noci2012 opened 4 years ago

noci2012 commented 4 years ago

Many people keep BT turned off to prevent tracking in shopping malls, shops, etc. So the BT requirement thwarts their protection against BT tracking.

The only valid way I see is if there were a way to send BT packets (they are broadcast) WITHOUT a source BT address (or with everyone worldwide using the same address...).

Although that would mean any BT tracking device needs a small update to also handle the DP-3T protocol well enough to at least record EBIDs.

dalf commented 4 years ago

Some related publications:

jorants commented 4 years ago

More generally: if any identifying information is leaked while the app is broadcasting, then the whole randomization is broken. As the above papers describe, MAC address randomization is crucial, and should be re-performed at the same time as a UUID change. Any other information leak (even from other signals such as WiFi) would also be problematic.

bdsl commented 4 years ago

In the context of a government promoting an app like this to deal with the pandemic, it might be feasible to introduce emergency legislation temporarily banning that sort of commercial tracking if it would encourage more people to use the app.

jorants commented 4 years ago

How would it be possible to enforce this? There is no way to detect this tracking as far as I know. Also, this does not prevent malicious parties from tracking.

pdehaye commented 4 years ago

On top of that, the legal argument used in the paper is invalidated by this (preexisting) tracking, as I try to explain in #9 (see also #44), so quite a bit of the publicity edifice built around PEPP-PT falls apart. Governments would have to say this collects personal data, conduct a Data Protection Impact Assessment, etc. Not a bad thing in itself; it only becomes a bad outcome if it is realized too late, affecting trust, or itself becomes weaponized (see #45).

noci2012 commented 4 years ago

IMHO the only MAC source address to be used for this protocol should be C0:F1:D1:90:00:00 [or 00:00:00:00:00:00]; every system MUST use the same address. Also, the base for the EBIDs should be changed every x hours, just to prevent longer-term tracking. In case of infection it doesn't matter much whether 1, 10 or 100 IDs need to be published. Any IDs from before any possible infection obviously need not be published.

x should be chosen so that the base changes at least 3-4 times a day.

jorants commented 4 years ago

I do not think that the mobile operating systems allow you to set a spoofed MAC. You can let the chip operate in a random MAC address mode, as specified by the BLE standard, but setting it to a specific value probably requires root access or even a firmware update (if possible at all). Furthermore, if many BT devices operate on the same MAC then this could lead to many unforeseen problems; MACs are unique by design.

noci2012 commented 4 years ago

@jorants, MACs are unique by design FOR IDENTIFICATION purposes, to be able to send answers. That is orthogonal to what DP-3T / PEPP-PT try to accomplish: anonymous participation. This proposed protocol doesn't need any answers, so no such requirement to identify exists in this case.

So the hardware reality is killing the basic premise needed to base all this participation tracking on. Better save a lot of money and quit a futile attempt to implement something that can never work.

FrankGrimm commented 4 years ago

Spoofing MACs is usually an OS-level task that vendors have been actively working on for years at this point [3, 4]. I believe it's crucial not to look at the issues discussed here in isolation if the concern is re-identification, though, especially since that randomization is limited in practice if you want to retain your users' ability to still use BT for other purposes. Most of those efforts seem to assume that MAC address randomization is sufficient, while it's more likely to be a layer of obscurity. Even if you accomplish the task of mitigating information disclosure w.r.t. your Bluetooth hardware, a regular smartphone installation usually sends out much more information that can easily be combined by interested actors. Motivated parties could still use other factors, such as WiFi SSID broadcasts, to form a clearer picture of devices in the vicinity and track them across geolocations [1] (or [2] for the DEF CON talk on that). It's not too unreasonable to think that maps like [0] will appear for these profiles, or at least be held by private parties.

[0] http://wigle.net/
[1] https://www.cnet.com/news/what-a-security-researcher-learned-from-monitoring-traffic-at-defcon/
[2] https://www.defcon.org/html/defcon-27/dc-27-speakers.html#d4rkm4tter
[3] https://source.android.com/devices/tech/connect/wifi-mac-randomization
[4] https://www.turais.de/mac-address-randomization-on-ios-12/

noci2012 commented 4 years ago

@FrankGrimm: so the requirement would be NO randomisation, one fixed address for all. This issue should be held firm IMO. That OTHERS lack any scrutiny shouldn't mean this product should lack scrutiny. Raising awareness about this might help get others in line as well, or make people more aware of the issues. And you are right: all Google phones very regularly consult semanticlocation-pa.googleapis.com and provide Google with a lot of environmental data to help guess where you are.

My current use state for my phone is:

- BT is turned OFF, except in the car for hands-free use.
- WiFi is OFF, except for a few known locations.
- All network activity runs through a VPN.
- Access to sites like semanticlocation-pa.googleapis.com is blocked.

Due to corona delivery of my next phone has been delayed for a few months. That phone won't have iOS/Android anyway.

Even using random addresses should not pose a problem to BLE: https://www.bluetooth.com/blog/bluetooth-technology-protecting-your-privacy/ For Wifi it is a shame the real physical address is used when association is done, it should not be needed.

FroehlichMarcel commented 4 years ago

On existing bluetooth tracking check #43. Transforming a tracked, geo-located and identified device (the average smart phone) into an anonymous tracing node is really difficult. I'd deactivate WLAN and location services, while being outside and in active BT tracing mode. But that is likely not enough. Commercial tracking should be temporarily outlawed during the active contact tracing period, i.e. at least until the end of 2020. I know this may be unrealistic, but it should be the most effective measure to ensure safe contact tracing. Mind, we do not have to rely on technology only. The legislator has the power to change the rules altogether, if there is consensus on what needs to be done. Not considering the existing adtech tracking infrastructure around mobile devices is naive.

noci2012 commented 4 years ago

If technology leaves no trails then anything else is largely irrelevant. If some trails are left, legislation (with sufficiently severe penalties) and other measures are needed to settle what may be done with those trails. The problem is that listening in on devices is hard to detect when it matters, and there is no opt-out when it is too late.

mrseeker commented 4 years ago

Based on a thought question from a friend of mine, here is a possible real-life example of Bluetooth tracking:

"Bluetooth scanners installed across the study area continuously scan for other devices. When a mobile device is detected, an 'in' registration with its MAC-address and a timestamp gets registered. Then, when this device is not detected anymore for at least 10.24 seconds (the duration of a scan cycle), an 'out' registration with another timestamp gets registered. In this way, it is possible to trace individual mobile devices because it is known where (location of the scanner) and when ('in' until 'out') a certain mobile device (MAC-address) was." Source: Versichele, Mathias & Delafontaine, Matthias & Neutens, Tijs & Van de Weghe, Nico. (2010). Potential and Implications of Bluetooth Proximity- Based Tracking in Moving Object Research. CEUR Workshop Proceedings. 652.

Contact tracing might quickly lead to "device fingerprinting". Since the person in question agrees to emit Bluetooth signals for identification, other devices can scan and store these "fingerprints". By using a mesh network of passive listeners, it might be possible to trace a person (if randomisation is not strong enough or if the seed is known). Another issue might be "combining" sources: if the person reveals themselves as being "infected", the seed (and their whole history of EphIDs) will be made visible to the passive listeners. With a network of passive scanners and cameras, you will be able to trace a person to a specific region or even a particular store. The whitepaper tries to prevent this by claiming that the database stores its information based on "a coarse timestamp". Even then, it will be ineffective against fingerprinting if there are multiple data points that, when combined, can uniquely identify a person based on a shared "seed".
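The "combining sources" concern above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DP-3T derivation: `derive_ephids` is a hypothetical stand-in that expands a seed with HMAC-SHA256, the figure of 96 identifiers per day is assumed, and the log format `(ephid, location, timestamp)` is made up for the example. The point is only that any public, deterministic derivation lets a listener re-link its old sightings once a seed is published.

```python
import hashlib
import hmac


def derive_ephids(seed, n):
    """Hypothetical stand-in: expand a seed into n 16-byte identifiers.

    This is NOT the DP-3T construction; it only needs to be deterministic
    and publicly computable for the linking argument to hold.
    """
    return [
        hmac.new(seed, i.to_bytes(4, "big"), hashlib.sha256).digest()[:16]
        for i in range(n)
    ]


def link_sightings(log, published_seeds, ephids_per_day=96):
    """Group a passive listener's log by the published seed that explains it.

    log: iterable of (ephid, location, timestamp) tuples (assumed format).
    Returns {seed: [(timestamp, location), ...]} sorted by timestamp.
    """
    owner = {}
    for seed in published_seeds:
        for ephid in derive_ephids(seed, ephids_per_day):
            owner[ephid] = seed

    trails = {}
    for ephid, location, timestamp in log:
        seed = owner.get(ephid)
        if seed is not None:  # sightings of unpublished seeds stay unlinked
            trails.setdefault(seed, []).append((timestamp, location))
    return {seed: sorted(trail) for seed, trail in trails.items()}
```

Sightings that belong to seeds that were never published stay unlinked, which is why the exposure in this sketch is limited to people who report an infection.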

lbarman commented 4 years ago

Thank you all for your valid inputs. It is indeed a concern.

The most immediate way to improve the situation is to make sure EphIDs are rotated at the same time as the BLE MAC address; this removes the tracking using 2 desynchronized identifiers (as mentioned in the PETS paper cited by @dalf). If done correctly, this solves this particular problem without the need to spoof the MAC address.

Naturally, the fact that Bluetooth is broadcasting enables some fingerprinting attacks (notably, the 2nd paper mentioned by @dalf, but there are many of these). They are not specific to our design, but compared to a situation where BLE would be turned off, our design requires you to turn BLE advertising on, hence facilitating some attacks. This will be the case for all Bluetooth-based contact tracing. In general this is both a valid concern and a hard problem to solve, which depends on many technologies.

Does this answer the question?

noci2012 commented 4 years ago

IMO, if ALL covid advertisements have the SAME source MAC this is a non-issue: all passive tracking will then report only one MAC address, and all occurrences within an area come from different devices. Hence there is anonymity. Anything that has some unique aspects wrt. the device will be traceable. Best use 00:00:00:00:00:00 as the sender's source MAC address; otherwise, if firmware/hardware replaces all zeroes with the real address, use the address C0:F1:D1:90:00:00.

FroehlichMarcel commented 4 years ago

Please check this thread, too. https://twitter.com/moxie/status/1248707315626201088?s=20

Why should we not talk about the BT ad infrastructure and request legislative action to enable safer tracing? Disabling it temporarily would be way less harsh than closing shops.

A trap that many engineers fall into is to believe that every problem must be tackled with a technical answer. At the edge between system and environment, there often is no tech solution within the system alone.

jorants commented 4 years ago

To add to @mrseeker's remark, imagine a couple: Person A works during the day and hence has to leave the house. Person B stays at home. By leaving an old phone on inside the house, A could track whether a BLE signal is present and hence when B leaves the house. Viewed in the light of domestic abuse this can be very worrying.

burdges commented 4 years ago

Could we trigger ephid changes based on changes in the BLE rotating MAC address? If not, then we'll leak (a) that we're running this app, which marks someone, and (b) occasionally link ephids and/or MAC addresses.

Also https://qz.com/1169760/phone-data/

lbarman commented 4 years ago

@burdges It seems that this is the approach that Apple/Google will take in the end, so at least through their API it should be possible.

@jorants absolutely, as was acknowledged in my post! Using Bluetooth contact tracing will force you to use Bluetooth + broadcast (random) identifiers, which can be a privacy risk.

@FroehlichMarcel legislation against device tracking/fingerprinting could be a solution (and hopefully this would not be limited to the duration of this pandemic ;) ).

As I see people still have inputs, I will keep this thread open. So far we have listed the following possible mitigations:

pdehaye commented 4 years ago

These matters have an impact on the legal analysis as well.

FroehlichMarcel commented 4 years ago

@pdehaye Right. Saying there is no personal data in this scenario is wrong IMHO.

mrseeker commented 4 years ago

I think a fixed mac address using an iBeacon BTLE frame would certainly help. Basically turn the phone into an anonymous beacon transmitter. Both iPhone and Android support iBeacon, and I think this could also be used for apps as long as they can spoof the MAC address somehow.

burdges commented 4 years ago

You need the ephid to change exactly when the MAC address changes. If hardware does not permit this, then you must rotate the ephid fast enough so that few MAC addresses get linked by ephids.
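Why desynchronized rotation matters can be illustrated with a small sketch. It assumes a passive observer that logs `(mac, ephid)` pairs and merges any two identifiers that ever appeared in the same advertisement; the function name and input format are hypothetical. If the two identifiers rotate at different moments, every rotation is bridged by the identifier that did not change, and the whole chain collapses into one device.

```python
def link_identifiers(observations):
    """Merge identifiers that were ever seen together (union-find).

    observations: iterable of (mac, ephid) pairs from overheard advertisements.
    Returns a list of sets; each set is one chain of linked identifiers.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for mac, ephid in observations:
        root_mac, root_ephid = find(("mac", mac)), find(("ephid", ephid))
        if root_mac != root_ephid:
            parent[root_mac] = root_ephid

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Desynchronized: the MAC rotates while the ephid stays, and vice versa.
desynchronized = [("M1", "E1"), ("M2", "E1"), ("M2", "E2"), ("M3", "E2")]
# Synchronized: both identifiers change in the same instant.
synchronized = [("M1", "E1"), ("M2", "E2"), ("M3", "E3")]
```

With the desynchronized log all five identifiers end up in a single group, while the synchronized log yields three unrelated groups. Rotating the ephid faster than the MAC only shortens the chains; it does not remove the bridge at each MAC change.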

lbarman commented 4 years ago

Hi all, to consolidate the discussion, the following inputs were made:

thaidn commented 4 years ago

It seems that some Android manufacturers from China always broadcast the fixed Bluetooth Classic MAC address when Bluetooth is turned on: https://github.com/BluezoneGlobal/react-native-bluetooth-scan/issues/4#issuecomment-620741016.

@lbarman do you think DP3T should recommend implementations checking and turning off discoverability whenever possible?

lbarman commented 4 years ago

Hi @thaidn, thanks for your message!

> @lbarman do you think DP3T should recommend implementations checking and turning off discoverability whenever possible?

I agree that discoverability being always on is not something I'd like on my phone :) So in theory, yes, however if the OS itself forces discoverability whenever Bluetooth is on, I don't see how the app could do anything.

thaidn commented 4 years ago

Hi @lbarman, it seems possible, as advertised by this app: https://play.google.com/store/apps/details?id=com.minol.miuibluetoothfix. Contact tracing apps can offer the same functionality.

I'm from Google, working on the CT API. This is not a promise, but we're thinking maybe we can turn discoverability off when our API is enabled.

lbarman commented 4 years ago

hi @thaidn, this would be great ! (I confirmed this with some people internally too) Let me know if you need something from us :)

pdehaye commented 4 years ago

Note: at the bottom of #43, @oseiskar shared a PoC of an attack leveraging existing beacons.

antonioparraga commented 4 years ago

Hi,

I think I have a poor man's solution for that. As we all know, we can't synchronise the ephID and MAC rotation because, basically, MAC rotation can't be controlled from the app. So instead of having a given number of daily ephIDs that rotate continuously, why not change the protocol so that we never share the same ephID twice? Then it doesn't matter whether the MAC is the same or has been rotated, because no one will see the same ephID on both sides of a MAC rotation. This could be done by generating a completely different ephID for each client that discovers our BTLE service, but it would require some small changes in the way the ephID is generated and shared.
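A minimal sketch of the "never share the same ephID twice" idea, under assumptions that are not part of DP-3T or of any released protocol: the class name, the per-day key, and the HMAC-over-counter derivation are all hypothetical. Each discovering client gets the next counter value, so no identifier is ever handed out twice.

```python
import hashlib
import hmac
import os


class OneTimeEphIdSource:
    """Hands out a fresh 16-byte identifier per request (hypothetical scheme)."""

    def __init__(self, day_key=None):
        # Assumption: one secret key per day, random unless supplied.
        self.day_key = day_key if day_key is not None else os.urandom(32)
        self.counter = 0

    def _derive(self, index):
        message = index.to_bytes(8, "big")
        return hmac.new(self.day_key, message, hashlib.sha256).digest()[:16]

    def next_ephid(self):
        """Return an identifier that has not been handed out before."""
        ephid = self._derive(self.counter)
        self.counter += 1
        return ephid

    def ids_to_publish(self):
        """Re-derive every identifier handed out so far, e.g. after a positive test."""
        return [self._derive(i) for i in range(self.counter)]
```

One cost of this design is that identifiers can no longer be put in a connectionless broadcast: a fresh value per client implies some per-client exchange. Everything handed out also remains derivable from the key and counter, so it can still be matched against recordings after publication.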

Right now it's possible to build a small app that traces a given device running DP-3T even if the MAC changes, simply because the ephID does not change at the same time. So the protocol currently "disables" the security mechanism that Bluetooth manufacturers provide in their chips and exposes the phone to complete traceability, just as if the MAC never changed, right?

What do you think? Does it make sense?

ozppupbg commented 4 years ago

Hello,

I think this issue is related to the discussion here: https://github.com/DP-3T/dp3t-sdk-android/issues/111 As far as I understand, it has not yet been tested on the GAEN implementation however.

romeokienzler commented 4 years ago

@ozppupbg this is true. But I'm happy to do that test. In fact I'm in the process of releasing the system as open source (based on a Raspberry Pi Zero W) so that everybody can do the test themselves. Is the GAEN implementation in a state where it can be tested? Did Google release a beta-version of the API we can use for testing?

noci2012 commented 4 years ago

@romeokienzler GAEN has been released already. (You can find it in Settings, below the Google settings, on an Android phone.) It has been force-fed during the last week. AFAICT from the documentation, only whitelisted applications (by signature) can use the API. So this will be very hard to test.

@ozppupbg Yes. Although EphIDs are randomized over time, when someone tests positive they can be matched if one has several recordings. That does depend on the popularity of the location where EphIDs get recorded and on the number of positive tests being relatively low. So massive recording in many locations may produce visits that can be correlated.

antonioparraga commented 4 years ago

In that case, my group and I have recently released a new protocol inspired by DP-3T and opened the code on GitHub. It takes the Bluetooth address swapping into account to avoid this issue with the EphID. What it does is generate a completely different key every time it exchanges a key with another device, so no one ever has the same key twice. It bypasses the issue with the Bluetooth address synchronisation, so now it doesn't matter when the address changes.

The solution is at https://github.com/open-coronavirus/open-coronavirus

GAEN is really great, but in Spain there is a great number of devices that will never be compatible with GAEN, so maybe a hybrid solution with GAEN for compatible mobiles and a conventional BTLE protocol is a win, not sure if you know what I mean.

noci2012 commented 4 years ago

That still allows the linking of events (with recorded locations and timestamps, and possibly other means) after keys are released.

romeokienzler commented 4 years ago

> so maybe a hybrid solution with GAEN for compatible mobiles and a conventional BTLE protocol is a win, not sure if you know what I mean.

Yeah. I was wondering if we can't create wearables, e.g. running on an ESP32 or Raspberry Pi Zero W, which simulate the GAEN API: basically sending compatible advertisements, capturing the received ones, and using the backend API to do exactly what the apps are doing. Very interesting for elderly people. Does anybody have thoughts on this? Switzerland seems to use the DP-3T backend if I'm not mistaken...

jasisz commented 4 years ago

I was playing with the Google solution with a couple of Android phones around and:

antonioparraga commented 4 years ago

In that case our protocol doesn't have this issue, because it just generates a different key every time, so it doesn't matter when the MAC changes or how long you are close to each other. I don't understand why GAEN doesn't do the same thing ...

jasisz commented 4 years ago

One more thing - it seems that this new MAC with old data usually lasts about 3 minutes.

romeokienzler commented 4 years ago

> I don't understand why GAEN doesn't do the same thing ...

It is really sad that

am I getting something wrong here?

romeokienzler commented 4 years ago

Oh, so far so good: I tested the official Swiss Covid app today on a Fairphone 3 and they seem to have fixed the issue wrt. device fingerprinting. In other words, even if you know the random MAC address, it is no longer possible to query the device's offered services by making a connection to it. This is afaict an improvement over the original DP-3T implementation using the original Bluetooth stack, where this was still possible (as I've reported in DP-3T/dp3t-sdk-android#111).

Here are some traces:

    sudo blescan -s -65
    Scanning for devices...
        Device (new): 4b:d1:d2:b3:44:59 (random), -59 dBm (not connectable)
            Complete 16b Services: <0000fd6f-0000-1000-8000-00805f9b34fb>
            16b Service Data: <6ffd7ad0fe6b0d71291b7d416c8a06db339837794d22>

    sudo blescan -s -65
    Scanning for devices...
        Device (new): 4b:d1:d2:b3:44:59 (random), -58 dBm (not connectable)
            Complete 16b Services: <0000fd6f-0000-1000-8000-00805f9b34fb>
            16b Service Data: <6ffd7ad0fe6b0d71291b7d416c8a06db339837794d22>
        Device (new): 49:95:b6:30:79:37 (random), -59 dBm (not connectable)
            Complete 16b Services: <0000fd6f-0000-1000-8000-00805f9b34fb>
            16b Service Data: <6ffd927e08f32f7c7cc8809b27162924891964429c5a>

    sudo blescan -s -65 |grep -B 1 -A 1 0000fd6f-0000-1000-8000-00805f9b34fb
        Device (new): 49:95:b6:30:79:37 (random), -55 dBm (not connectable)
            Complete 16b Services: <0000fd6f-0000-1000-8000-00805f9b34fb>
            16b Service Data: <6ffd927e08f32f7c7cc8809b27162924891964429c5a>

In the second trace, for some reason I see my device twice (pretty sure it is my device based on RSSI). But at least in trace 3 both MAC and EphID changed.

Then I've asked a friend to install the app on an iPhone and I get this trace then

    sudo blescan -s -65 |grep -B 1 -A 1 0000fd6f-0000-1000-8000-00805f9b34fb
        Device (new): 40:e1:04:e9:e1:73 (random), -53 dBm (not connectable)
            Complete 16b Services: <0000fd6f-0000-1000-8000-00805f9b34fb>
            16b Service Data: <6ffd62d4c0629d393c9d1f9057a43250618f5a48ce9e>

            Flags: <1a>
            Complete 16b Services: <0000fd6f-0000-1000-8000-00805f9b34fb>
            16b Service Data: <6ffd66bb45e67b124ce4e5a86716975a5af6a6cd6d16>
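The `16b Service Data` values in these traces can be split into their fields with a short Python helper. The sketch assumes the layout from the Exposure Notification Bluetooth Specification: a 2-byte service UUID (0xFD6F, little-endian on the wire), a 16-byte Rolling Proximity Identifier (the role the thread calls the EphID) and 4 bytes of Associated Encrypted Metadata. The function name and the returned dictionary keys are made up for illustration.

```python
EN_SERVICE_UUID16 = 0xFD6F  # Exposure Notification service UUID


def parse_en_service_data(hex_payload):
    """Split a blescan '16b Service Data' hex string into its fields.

    Assumed layout: 2-byte UUID (little-endian), 16-byte rolling
    identifier, 4-byte encrypted metadata.
    """
    raw = bytes.fromhex(hex_payload)
    if len(raw) != 22:
        raise ValueError("expected 22 bytes, got %d" % len(raw))
    uuid16 = int.from_bytes(raw[:2], "little")
    if uuid16 != EN_SERVICE_UUID16:
        raise ValueError("not an Exposure Notification payload")
    return {
        "uuid16": uuid16,
        "rpi": raw[2:18].hex(),   # rolling identifier (EphID role)
        "aem": raw[18:22].hex(),  # encrypted metadata
    }


# Payloads copied from the first and third Android traces above.
first = parse_en_service_data("6ffd7ad0fe6b0d71291b7d416c8a06db339837794d22")
third = parse_en_service_data("6ffd927e08f32f7c7cc8809b27162924891964429c5a")
identifier_changed = first["rpi"] != third["rpi"]
```

Comparing the `rpi` field across scans, together with the reported MAC, is enough to check whether both values rotated between two traces.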

So it's too bad that we have to rely on GAEN's API for key management; otherwise we could randomly change the EphID update interval. Also, I have to stress again that key management should ALWAYS be open source. I'm not happy that we are forced to trust Apple and Google not to send the key generators somewhere or accidentally introduce a privacy flaw.

I've raised the issue (for random update intervals of EphIDs) with the Swiss government https://www.melani.admin.ch/melani/en/home/public-security-test/reporting_form.html

They said: Thank you for your report. We thank you for your support and your valuable contribution to the security of the proximity tracing system.