DP-3T / documents

Decentralized Privacy-Preserving Proximity Tracing -- Documents

Are you reinventing a secure communication protocol? #162

Closed: adewes closed this issue 4 years ago

adewes commented 4 years ago

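After thinking about the contact tracing problem for a while, it seems to me that it is not very different from the problem of secure, anonymous communication:

  • As an individual, I add other individuals that are in close proximity to me to my contact list.
  • If I get infected, I want to communicate this to my contact list.
  • I don't want the government or any third party to be able to read my communication with my contacts.
  • I want to remain as anonymous as possible to my contacts.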

There are many existing approaches/protocols for decentralized end-to-end encryption that provide secure authentication, advanced trust models, end-to-end encryption with perfect forward secrecy and robustness against MITM attacks.

Reading through the issues and the whitepaper it seems you're in the process of re-inventing a secure, end-to-end encrypted communication protocol. Would it make sense to rely on an existing secure communication protocol instead and build on top of that? I think a secure protocol for contact tracing could be built upon an existing secure communication protocol by simply extending it with an anonymous identity concept (i.e. presenting a different identity to every new communication partner to prevent linking of identities).
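The "different identity for every communication partner" idea can be roughly illustrated with the minimal Python sketch below. `PseudonymWallet` and its methods are hypothetical names and are not part of Matrix or DP-3T. The sketch assumes identities are derived from a single device secret with HMAC-SHA256. Under that assumption the device can regenerate them, while contacts cannot link them without the secret. A real system would also have to bind each identity to its own key material.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class PseudonymWallet:
    """Hypothetical helper: a fresh, unlinkable identifier per contact."""

    def __init__(self, device_secret: Optional[bytes] = None) -> None:
        # One long-term secret that never leaves the device.
        self._secret = device_secret or secrets.token_bytes(32)
        self._counter = 0

    def new_identity(self) -> bytes:
        # Each new contact gets the next counter value, so every contact sees
        # a different 16-byte pseudonym; without the secret they look random.
        msg = b"contact-identity" + self._counter.to_bytes(8, "big")
        self._counter += 1
        return hmac.new(self._secret, msg, hashlib.sha256).digest()[:16]


wallet = PseudonymWallet()
shown_to_alice = wallet.new_identity()
shown_to_bob = wallet.new_identity()  # unrelated-looking to Alice's identifier
```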

Using such a protocol, contacts could automatically establish a secure communication channel over an insecure Bluetooth connection via a suitable key exchange mechanism. This end-to-end encrypted and authenticated channel could then be used at a later point in time to exchange information about the infection status. How this information is structured (e.g. as a cryptographically authenticated message from a public authority) is independent of the underlying communication protocol and can be implemented on a higher level.
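The key exchange step can be sketched with textbook Diffie-Hellman. This is a toy built on explicit assumptions. The group (the Mersenne prime 2**127 - 1 with base 3) and the label `contact-channel-v0` are illustrative choices and are far too weak for real use, where a vetted curve such as X25519 would be expected. Plain Diffie-Hellman is also unauthenticated, so a man-in-the-middle on the Bluetooth link could substitute the public values. Closing that gap is what an existing authenticated protocol would contribute.

```python
import hashlib
import hmac
import secrets

# Toy parameters for illustration only; NOT secure.
P = 2**127 - 1  # Mersenne prime
G = 3


def keypair() -> tuple[int, int]:
    """Return (private exponent in [2, P - 2], public value G^private mod P)."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def session_key(my_private: int, their_public: int) -> bytes:
    """Both sides compute the same group element and hash it into a key."""
    shared = pow(their_public, my_private, P)
    return hmac.new(b"contact-channel-v0", shared.to_bytes(16, "big"),
                    hashlib.sha256).digest()


alice_private, alice_public = keypair()
bob_private, bob_public = keypair()
# Only the public values cross the (insecure) Bluetooth link.
assert session_key(alice_private, bob_public) == session_key(bob_private, alice_public)
```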

The advantage would be that existing security mechanisms implemented in end-to-end encrypted communication platforms can be used to ensure desirable properties like authenticity, perfect forward secrecy and secure key exchange over an untrusted channel. Also, since the secure communication channel can transport arbitrary information, the system would provide much greater flexibility for implementing e.g. country-specific requirements. Finally, existing client/server functionality and open-source software could be used and modified to implement a contact tracing system. For example, the Matrix protocol provides open-source clients for most mobile platforms as well as open-source servers (https://github.com/matrix-org/) that can be deployed in a federated fashion and have well-audited and working code bases. A contact tracing app could probably be implemented on top of these existing apps with slight modifications.

The generation of statistical information for epidemiologic use is not covered by this, but it can be implemented independently. I think a separate implementation is advisable in general. The two data uses are not related and, in my opinion, should not be based on the same mechanism as currently proposed, since this will often lead to compromises in terms of privacy and security. I would therefore propose an independent, randomization-based mechanism to release statistical data to the research community, as this will provide stronger anonymity and plausible deniability for individuals. In the current scheme, anonymity is still conditional on the secrecy of the individuals' secret keys. According to my interpretation of the GDPR, this does not necessarily qualify as anonymous, because individuals can be re-identified if a secret key gets exposed. Also, even assuming the secret keys stay secret, the current scheme will probably not provide strongly anonymous data: social/interaction graph data is extremely high-dimensional, and multiple approaches have been published to de-anonymize individuals in such graphs. I would therefore strongly advise publishing only low-dimensional data (i.e. answers to specific queries that consist of at most a few numerical or binary values). Such releases should not rely only on de-identification/pseudonymization but should use a modern randomization approach that creates plausible deniability at the level of a single individual. Given the number of individuals who will use such a system, it should still be possible to obtain relevant and reliable statistical information with such a randomized mechanism. How the mechanism can be implemented depends on the exact queries.
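For the randomization-based mechanism, classic randomized response is one well-known option. It is named here for illustration only; the thread does not specify a technique. Each person reports the truth with probability `p_truth` and otherwise reports a fair coin flip, so any single answer is deniable. The population rate can still be estimated by inverting E[observed] = p_truth * rate + (1 - p_truth) / 2. The function names and parameters below are hypothetical.

```python
import random


def randomized_answer(truth: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the truth with probability p_truth, otherwise a fair coin flip."""
    if rng.random() < p_truth:
        return truth
    return rng.random() < 0.5


def estimate_rate(answers: list[bool], p_truth: float) -> float:
    """Unbiased estimate of the true 'yes' rate from randomized answers.

    E[observed] = p_truth * rate + (1 - p_truth) / 2, solved for rate.
    """
    observed = sum(answers) / len(answers)
    return (observed - (1 - p_truth) / 2) / p_truth


rng = random.Random(42)
truths = [rng.random() < 0.1 for _ in range(100_000)]  # hypothetical 10% rate
reports = [randomized_answer(t, 0.5, rng) for t in truths]
estimate = estimate_rate(reports, 0.5)  # close to sum(truths) / len(truths)
```

With `p_truth = 0.5`, a true "no" still produces "yes" a quarter of the time, compared with three quarters of the time for a true "yes". A single answer therefore shifts the odds by at most a factor of 3, which corresponds to local differential privacy with ε = ln 3.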

Looking forward to your feedback on this, happy to discuss!

gardners commented 4 years ago

What is being proposed is less than a full secure communications protocol in the normal sense, for several reasons:

  1. It doesn't need all the capabilities and modes of a normal protocol. In fact, to protect privacy it doesn't WANT such capabilities.
  2. It is bandwidth optimised in a way that a general protocol is not, and designed to scale and be stateless to support this.

In other words, it is a protocol for sharing and digesting purposely limited information.
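To make "stateless" and "bandwidth optimised" concrete, here is a sketch of the key schedule from the DP-3T white paper's low-cost design, paraphrased here; the paper remains the normative source. Each day key is SK_t = H(SK_{t-1}), and that day's ephemeral IDs are expanded from PRF(SK_t, "broadcast key"). One substitution needs flagging: the paper uses AES in counter mode as the PRG, which Python's standard library lacks. This sketch uses HMAC-SHA256 over a counter instead, so its outputs will not match real DP-3T EphIDs. The value `epochs_per_day=96` is an assumption corresponding to 15-minute epochs.

```python
import hashlib
import hmac

EPHID_LEN = 16  # bytes per ephemeral ID (EphID)


def next_day_key(sk: bytes) -> bytes:
    """SK_t = H(SK_{t-1}), with H = SHA-256."""
    return hashlib.sha256(sk).digest()


def ephids_for_day(sk: bytes, epochs_per_day: int) -> list[bytes]:
    """Expand one day key into that day's EphIDs, one per epoch."""
    # Per-day broadcast key: PRF(SK_t, "broadcast key") with PRF = HMAC-SHA256.
    broadcast_key = hmac.new(sk, b"broadcast key", hashlib.sha256).digest()
    # Stand-in PRG (the white paper uses AES-CTR): HMAC-SHA256 over a counter,
    # 32 bytes per block, until there is enough output for all epochs.
    blocks = (epochs_per_day * EPHID_LEN + 31) // 32
    stream = b"".join(
        hmac.new(broadcast_key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        for i in range(blocks)
    )
    return [stream[i * EPHID_LEN:(i + 1) * EPHID_LEN] for i in range(epochs_per_day)]


sk_day1 = next_day_key(bytes(32))  # demo only: a real seed would be random
day1_ephids = ephids_for_day(sk_day1, epochs_per_day=96)
```

Publishing one 32-byte day key lets every phone regenerate that day's 96 EphIDs (1,536 bytes). Phones can also derive all later day keys by hashing forward. Earlier days stay hidden because SHA-256 is one-way.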

galadran commented 4 years ago

Hi adewes,

I agree there are similarities with anonymous messaging systems but there are also important additional constraints. In particular, contact tracing only involves sending a single message, with fixed content and only if a third party (the medical authority) authorises the transmission.
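The "single message, sent only if the authority authorises it" constraint can be pictured as a gate on the backend. This is a hypothetical sketch, not DP-3T's specified mechanism. `UploadAuthorizer`, the one-time code format, and the 32-byte payload size are all assumptions.

```python
import secrets


class UploadAuthorizer:
    """Hypothetical backend check: one fixed-format upload per issued code."""

    def __init__(self) -> None:
        self._unused_codes: set[str] = set()

    def issue_code(self) -> str:
        # Called on behalf of the medical authority after a confirmed diagnosis.
        code = secrets.token_hex(6)
        self._unused_codes.add(code)
        return code

    def accept_upload(self, code: str, day_key: bytes) -> bool:
        # Fixed content (one 32-byte key) and a code that is consumed on first
        # successful use, so the same authorisation cannot be reused.
        if len(day_key) != 32 or code not in self._unused_codes:
            return False
        self._unused_codes.discard(code)
        return True


authorizer = UploadAuthorizer()
code = authorizer.issue_code()
first = authorizer.accept_upload(code, bytes(32))   # accepted
second = authorizer.accept_upload(code, bytes(32))  # rejected: code already used
```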

With those restrictions in mind, you could argue that the DP-3T design is a form of anonymous messaging, optimised for scalability and security. However, retrofitting existing secure messengers such as Matrix to support anonymity (as well as the constraints mentioned above) is more work than you might think. Additionally, the security properties Matrix provides (perfect forward secrecy and post-compromise security, PFS and PCS) are not particularly useful in this setting (one-time, fixed messages).

You might find this white paper from another project of interest; it presents an alternative design that uses an existing anonymous messaging system for contact tracing.

gardners commented 4 years ago

In fact, what DP-3T is trying to do is create a shared, eventually-consistent(ish) database to which participants willingly contribute, in a manner that seeks to preserve their privacy. This is also why, with a little care, it can be adapted to mesh network applications, such as for disaster zones or refugee camps, e.g., as described in #126
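One way to picture a "shared eventually-consistent(ish) database" is a grow-only set (G-Set), a standard conflict-free replicated data type. Replicas, including phones syncing over a mesh, merge by set union and converge regardless of message order or duplication. The class below is a hypothetical sketch. A real deployment would also need to expire old keys, which a pure G-Set cannot express.

```python
class PublishedKeySet:
    """Grow-only set (G-Set CRDT) of published keys; illustrative sketch."""

    def __init__(self) -> None:
        self.keys: set[bytes] = set()

    def publish(self, key: bytes) -> None:
        self.keys.add(key)

    def merge(self, other: "PublishedKeySet") -> None:
        # Union is commutative, associative and idempotent, so replicas that
        # exchange state in any order, any number of times, converge.
        self.keys |= other.keys


phone_a, phone_b = PublishedKeySet(), PublishedKeySet()
phone_a.publish(b"key-1")
phone_b.publish(b"key-2")
phone_a.merge(phone_b)
phone_b.merge(phone_a)
assert phone_a.keys == phone_b.keys == {b"key-1", b"key-2"}
```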

adewes commented 4 years ago

Thanks, I understand that. I would actually see authorization as a different problem as well, and would argue that there are also standardized and secure mechanisms available for it 😄.

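I think it's of course fine to invent a new protocol for this, and I'm sure you have the necessary expertise to make it secure and scalable. Personally, I just try to follow the rule "don't roll your own crypto" whenever possible, as getting a new cryptographic protocol right on the first attempt and under time pressure can be quite challenging (though again, I'm sure you're up to the task). I see contact tracing as an application that requires three building blocks:

  • A way to create a contact network and securely communicate with those contacts using a predefined set of messages.
  • A way to authorize messages (e.g. positive test results) to prevent fraud / abuse.
  • A way to generate meaningful and accurate statistical information for health authorities.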

Personally, I think it would make more sense to disentangle these three blocks as much as possible. The current protocol does not seem to do this: e.g. the statistical information provided to health authorities is directly coupled to user FKs, and the messaging protocol seems to implement its own authorization methods. Again, I'm sure you know what you're doing, but to me the design seems to go against most principles of privacy engineering that I know (keep components small and interchangeable, define clear system boundaries, separate mechanisms by data use, design for adaptability and resilience).

How, e.g., will the system deal with a compromised signing key (I assume a diagnosis would be authorized using cryptographic signing)? How do trusted certificates get distributed? It seems you would have to re-implement certificate revocation, trust chains and signing mechanisms within your protocol as well (at least in my limited understanding). What about replay attacks? Currently the messages sent by individual phones are not authenticated (as far as I understand) and could be replayed by adversaries. What if you discover that you do, after all, need to transmit additional information via the system (e.g. to share more details about an infection)? In my understanding this would require a new revision of the protocol, which would force all implementors to rewrite their applications as well.

Again, you're the experts and you probably know how to do this best, from my experience I just think that a system which is built on top of proven, existing technologies with the least amount of modification has the highest chance of viability. Just my 2c.
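On the replay question, one commonly discussed mitigation is to bind each EphID to the epoch in which it was broadcast and to ignore sightings heard outside that window. The function below is a hypothetical sketch of such a check on the receiving phone, not DP-3T's specified behaviour. The default `tolerance` of one epoch is an assumed parameter. The check limits replays made later in time, but it does not stop a real-time relay within the same epoch.

```python
def count_matches(observed: list[tuple[bytes, int]],
                  published_ephids: list[bytes],
                  tolerance: int = 1) -> int:
    """Count sightings of published EphIDs heard close to their own epoch.

    observed: (ephid, epoch index at which this phone heard it)
    published_ephids: EphIDs regenerated from a published key; position i is
    the EphID that was broadcast during epoch i.
    """
    epoch_of = {ephid: i for i, ephid in enumerate(published_ephids)}
    matches = 0
    for ephid, heard_at in observed:
        sent_at = epoch_of.get(ephid)
        # Skip unknown IDs, and IDs heard far from their broadcast epoch,
        # which is what a later replay would look like.
        if sent_at is not None and abs(heard_at - sent_at) <= tolerance:
            matches += 1
    return matches


published = [bytes([i]) * 16 for i in range(4)]  # toy EphIDs for epochs 0..3
heard = [(published[1], 1), (published[2], 3), (published[0], 3), (b"\xff" * 16, 2)]
matches = count_matches(heard, published)  # 2: the late published[0] is ignored
```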

gardners commented 4 years ago

There is a corollary to "don't roll your own crypto", which is "don't use a crypto system for a purpose for which it wasn't designed, or for which the design assumptions/threat models don't hold". That is the problem here: existing crypto systems don't address the particular issues that broad contact tracing introduces.

On Tue, 14 Apr 2020 at 07:59, Andreas Dewes notifications@github.com wrote:

I think it's of course fine to invent a new protocol for this and I'm sure you have the necessary expertise to make it secure and scalable. Personally I just try to follow the rule "don't roll your own crypto" whenever possible, as getting a new cryptographic protocol right on the first trial and under time pressure can be quite challenging (though again I'm sure you're up to the task). I see contact tracing as an application that requires three building blocks:

  • A way to create a contact network and securely communicate with those contacts using a predefined set of messages.
  • A way to authorize messages (e.g. positive test results) to prevent fraud / abuse.
  • A way to generate meaningful and accurate statistical information for health authorities.

jaromil commented 4 years ago

There is no session layer: DP-3T only goes up to ISO/OSI layer 4, I'd say.

DaveMath commented 4 years ago

You don’t want to send payloads over BLE. Just collect and beacon your randomized ID.

You also won’t need contact lists, but you do need to know the 14-day window in which you were last near a non-contact.

Polling these nearby nodes and using SSL to periodically send a payload of who is nearby to a server is fine; the server only needs the first-seen and last-seen timestamps.

My company NewAer has a BLE and WiFi scanner with a UDID device hash that renders mobile phone MACs pseudo-anonymous. We have been building this since 2009 and have been focusing on automated kiosks for travel running on iOS.

Our SDK for Android needs to be updated for the Bluetooth capabilities of later Android versions. iOS is more robust, as the hardware is more standardized.

There is a scanner that collects other pseudo-anonymous IDs and keeps a local table, or could push them to the cloud.

Once the data reaches that database, the database can stitch together the signals of other devices that are no longer present and compare them with other, unknown devices. This data can expire users at 15-day intervals.

It can also rotate your ID at other intervals, but you’d need a history of that old ID compared with other nearby IDs over 15-day intervals.
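The local table described above, which holds pseudonymous IDs with first-seen and last-seen times and expires them after 15 days, might look roughly like the following. This is a hypothetical sketch, not the NewAer SDK's API. `SightingTable`, `device_hash`, and the use of Unix timestamps in seconds are assumptions.

```python
from dataclasses import dataclass

RETENTION_SECONDS = 15 * 24 * 3600  # the 15-day window mentioned above


@dataclass
class Sighting:
    first_seen: float
    last_seen: float


class SightingTable:
    """Local table of pseudonymous IDs heard nearby (illustrative sketch)."""

    def __init__(self) -> None:
        self.rows: dict[str, Sighting] = {}

    def record(self, device_hash: str, now: float) -> None:
        # Keep only the first and last time each pseudonymous ID was heard.
        row = self.rows.get(device_hash)
        if row is None:
            self.rows[device_hash] = Sighting(now, now)
        else:
            row.last_seen = max(row.last_seen, now)

    def expire(self, now: float) -> None:
        # Drop IDs not heard within the retention window.
        self.rows = {h: s for h, s in self.rows.items()
                     if now - s.last_seen <= RETENTION_SECONDS}


table = SightingTable()
table.record("hash-a", 0.0)
table.record("hash-a", 100.0)
table.record("hash-b", 50.0)
table.expire(100.0 + RETENTION_SECONDS)  # keeps hash-a, drops hash-b
```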

If you later tell the system that “I have been tested to be infected”, your device would push that ID to the server, which could then push a notification down to the other formerly proximal devices carrying the same client hash.
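The push-down flow can be sketched as a central index that maps each ID to the devices that reported seeing it. `ProximityServer` and its methods are hypothetical names. In this sketch the server necessarily holds who was near whom, which is the kind of information DP-3T's decentralised design is meant to keep off the server.

```python
from collections import defaultdict


class ProximityServer:
    """Hypothetical central server for the report-and-notify flow."""

    def __init__(self) -> None:
        # seen_by[x] = client hashes of devices that reported being near x
        self.seen_by: defaultdict[str, set[str]] = defaultdict(set)

    def upload_nearby(self, reporter: str, nearby: list[str]) -> None:
        # Periodic upload: "these IDs were near me".
        for other in nearby:
            self.seen_by[other].add(reporter)

    def report_infected(self, infected: str) -> set[str]:
        # Devices to notify with a push message (copy, so callers can't mutate state).
        return set(self.seen_by.get(infected, set()))


server = ProximityServer()
server.upload_nearby("phone-1", ["id-x", "id-y"])
server.upload_nearby("phone-2", ["id-x"])
to_notify = server.report_infected("id-x")  # {"phone-1", "phone-2"}
```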

If you delete the app and reinstall, or throw away the phone the hash is recreated and the proximity data is restarted.

GitHub isn’t resolving their website right now, so I can’t post our repos. You can download the SDK and read the docs at ProximityPlatform com, or get the compiled iPad demo at NewAer com/Kiosk.

We are looking to find groups for collaboration and we will offer our SDK free for non-commercial and non-tracking or advertising use.

Dave Mathews - Dave at NewAer

On Mon, Apr 13, 2020 at 2:10 PM Andreas Dewes notifications@github.com wrote:

After thinking about the contact tracing problem for a while it seems to me that it is not very different from the problem of secure, anonymous communication:

  • As an individual, I add other individuals that are in close proximity to me to my contact list.
  • If I get infected, I want to communicate this to my contact list.
  • I don't want the government or any third party to be able to read my communication with my contacts.
  • I want to remain as anonymous as possible to my contacts.

adewes commented 4 years ago

Thanks for the feedback @gardners, I guess I just see this differently. It's of course possible to see contact tracing as an entirely new type of problem that cannot be solved with any of the existing communication protocols. Personally, though, I think it's very similar to, e.g., many decentralized or collaborative IoT / mesh applications, so I don't think an entirely new protocol stack is necessary. I understand that from a research standpoint it's probably more interesting and rewarding to come up with a new protocol. It just seems the timeframe is rather short to get such a protocol entirely right before it gets deployed to potentially hundreds of millions of devices.

As it seems you're determined to go with a custom protocol feel free to close this issue, I'm happy to discuss nevertheless and thanks for providing your opinion on the matter.

cluck commented 4 years ago

The Internet Protocol works by sending packets representing information over an indefinite path while openly declaring their source and target locations. So anybody along the path can intercept a packet, learn that the information existed at a certain point in time, and learn the locations related to it, all without disrupting the communication.

But representing and leaking proximity information indirectly, as a delta of points in time and space, is exactly what DP-3T tries to avoid in the first place. It tries to forget time and location information as soon as it occurs, and it relies on a protocol that is physically bound to a local loop/path.

This pretty much categorically excludes any protocol stacked on top of the Internet.

kennypaterson commented 4 years ago

This has been a very interesting discussion, but I think the point made by the original poster has been thoroughly explored now. Our requirements are different from those of a secure, anonymous communications system. We are also forced by necessity to use only lightweight, low-bandwidth cryptographic techniques, which precludes many interesting features that a fully-developed secure, anonymous communications system would enjoy.