Open carrollgt91 opened 4 years ago
I would suggest we keep everything encrypted. What this will ensure:
Let's put in place a facility for the user to generate a public/private key pair. Specifically, let's use BIP-39. This lets us generate a random seed phrase for the user, which we can combine with BIP-32 to derive the master public/private key pair (m/0'/0'/0'):
The user will encrypt the data client side before sending it to us.
If we integrate directly with a medical records provider, we can send the data to the user for encryption before storing it.
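The seed-phrase-to-key step above can be sketched roughly as follows. This is a minimal illustration, not a vetted implementation: it assumes the seed phrase already exists (generating one needs the BIP-39 word list, which is omitted here), it derives only the private key (the matching public key needs secp256k1 point multiplication, also omitted), and the function names are hypothetical. A real client should use an audited BIP-32/BIP-39 library.

```python
import hashlib
import hmac
import unicodedata
from typing import Sequence, Tuple

# Order of the secp256k1 group, which BIP-32 uses for private-key arithmetic.
SECP256K1_N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
HARDENED = 0x80000000  # offset BIP-32 adds to hardened child indexes (the ' in 0')


def seed_from_phrase(phrase: str, passphrase: str = "") -> bytes:
    """BIP-39: stretch a seed phrase into a 64-byte seed (PBKDF2-HMAC-SHA512)."""
    password = unicodedata.normalize("NFKD", phrase).encode("utf-8")
    salt = unicodedata.normalize("NFKD", "mnemonic" + passphrase).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha512", password, salt, 2048, dklen=64)


def master_key(seed: bytes) -> Tuple[int, bytes]:
    """BIP-32: turn the seed into the master private key and chain code."""
    digest = hmac.new(b"Bitcoin seed", seed, hashlib.sha512).digest()
    key = int.from_bytes(digest[:32], "big")
    if key == 0 or key >= SECP256K1_N:
        raise ValueError("seed yields an invalid master key")
    return key, digest[32:]


def hardened_child(key: int, chain_code: bytes, index: int) -> Tuple[int, bytes]:
    """BIP-32: derive the hardened child private key at the given index."""
    data = b"\x00" + key.to_bytes(32, "big") + (index + HARDENED).to_bytes(4, "big")
    digest = hmac.new(chain_code, data, hashlib.sha512).digest()
    tweak = int.from_bytes(digest[:32], "big")
    child = (tweak + key) % SECP256K1_N
    if tweak >= SECP256K1_N or child == 0:
        raise ValueError("invalid child key; try the next index")
    return child, digest[32:]


def derive_private_key(phrase: str, path: Sequence[int] = (0, 0, 0)) -> bytes:
    """Walk a fully hardened path such as m/0'/0'/0' and return 32 key bytes."""
    key, chain_code = master_key(seed_from_phrase(phrase))
    for index in path:
        key, chain_code = hardened_child(key, chain_code, index)
    return key.to_bytes(32, "big")
```

The same phrase always yields the same key for a given path, so the user only has to back up the phrase. Changing the last index, for example `derive_private_key(phrase, (0, 0, 1))`, gives an unrelated key.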
From here we can either:
Provide the data, still encrypted and with the user's consent, so that data consumers can learn on it using OpenMined infrastructure.
Provide a way for the data consumer to decrypt the data, as follows: when a data consumer comes in, the user generates the next public/private key pair (m/0'/0'/1'), encrypts the data locally, sends the newly encrypted data to us, and sends the public/private key pair to the data consumer (at a URL which we verified during data consumer onboarding). From there the data consumer can download and decrypt the data.
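To make "generate the next key pair" concrete, here is a small sketch of how a client might track which derivation path belongs to which data consumer. It covers path allocation only; key derivation, encryption, and delivery are not shown. The class name, the consumer IDs, and the choice to reserve index 0 for the user's own key are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConsumerKeyPaths:
    """Hands out one hardened BIP-32 path per data consumer (hypothetical helper)."""

    account_prefix: str = "m/0'/0'"
    next_index: int = 1  # assumption: index 0 (m/0'/0'/0') is the user's own key
    assigned: Dict[str, int] = field(default_factory=dict)

    def path_for(self, consumer_id: str) -> str:
        # Reuse the consumer's existing index, or allocate the next free one.
        if consumer_id not in self.assigned:
            self.assigned[consumer_id] = self.next_index
            self.next_index += 1
        return f"{self.account_prefix}/{self.assigned[consumer_id]}'"


paths = ConsumerKeyPaths()
first = paths.path_for("consumer-a")   # first consumer gets index 1
second = paths.path_for("consumer-b")  # second consumer gets index 2
again = paths.path_for("consumer-a")   # repeat requests reuse the same path
```

Giving each consumer its own path means a key handed to one consumer reveals nothing about data encrypted for another, and the user can re-derive every key from the seed phrase alone.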
Consequence:
@PlamenHristov Although I thought about a different approach, your approach with encryption also sounds good! :) Picking up on your first option in the section "Data consumer integration", I thought about using the OpenMined PySyft and PyGrid libraries (including for encryption, etc.).
As discussed on the Slack channel, also with @carrollgt91, I understood the goal of this team (this repo) to be building a server that acts as the middle-man between the sensitive data of a user who wants to be automatically authenticated and a data consumer who wants an authentication or some data (e.g. another app that wants to train on some sensitive data). So, based on the blog entry, I imagined the SSO credentials not being stored on the PIS but rather being part of the client-side data scraping directly on the user's device (also leveraging the possibility that the user is still signed in to different apps, as @carrollgt91 suggested in the #covid_mobile_data_collection channel), which is then sent to the PIS on demand. The PIS (our work) should then:
If my description of the goal of this specific repo is correct, I thought about using the Public or Private Grid Platform, via the PyGrid library, to make the exchange of sensitive data from service endpoints or training data possible.
If I understood our goal correctly, in both scenarios no direct channel would need to be established between the user and the data consumer, because either the data consumer would train their model using PyGrid (the second use case) or the data is only provided to the SSI team, which would then do the issuing, validation, etc. of credentials for authentication with the data consumer.
I may have too limited knowledge about the detailed working of PyGrid and the specific data needs of the SSI team, but potentially this could help us use much of the already existing code from other OpenMined projects.
@PlamenHristov Great thoughts here - I think some form of user-managed encryption scheme does solve a lot of the issues here, this one and the security breach piece, which is great. Just to make sure I understand your proposal, it seems that
Assuming those assumptions are correct...
One thing I really like about this proposal is how easy the UX is for the user to share data with data consumers when they're on the same device that has the key pair on it. It's not meaningfully different from an SSO handshake where the app you're signing into requests certain data from the sign-on provider, e.g. "Sign in with Facebook" -> provide your name and profile photos.
However, there are some additional challenges we'd need to overcome with this strategy. I'm not as familiar with what we'll need to do to hook into the rest of OpenMined infrastructure (i.e. PySyft), so I'm not going to comment much on that piece, and instead I'll focus on the data consumer use case.
The key exchange would need to be implemented in a way that allows the data consumer to use the data immediately. I think we'd want to supply client libraries that make this process very easy, similar to how there are tons of off-the-shelf client libraries for the OAuth and OpenID protocols. Ease of integration for the data consumer is really important, and if we have to ask them to implement custom decryption, I think that will reduce the number of applications willing to integrate. The more we can lean on existing libraries for this, the better - there's a lot to like about the BIP-based crypto you linked to. There are a good number of libraries for it in different ecosystems. However, I think it's worth examining alternative encryption schemes to find the one that would be easiest for the data consumer to integrate with.
We'll definitely need to have more robust client-side applications built to generate/manage these keys, as well as house the user's sensitive data. Here are a few things that we'd need:
In addition to being an attractive target for hackers, storing these SSO credentials presents an interesting trust problem from the perspective of more privacy-conscious users.
We are just committing to the user that we are not storing their information. We are not providing strong guarantees, cryptographic or otherwise, that we will not use this information for our own gain.
Especially for more powerful SSO integrations, such as bank accounts, it might be hard to convince folks to trust us.