Open gadenbuie opened 5 years ago
Also the rOpenSci package sodium, which has a pretty decent overview of how encryption can be handled in R:
library(sodium)
# Bob's keypair:
bob_key <- keygen()
bob_pubkey <- pubkey(bob_key)
# Alice's keypair:
alice_key <- keygen()
alice_pubkey <- pubkey(alice_key)
# Bob sends encrypted message for Alice:
msg <- charToRaw("TTIP is evil")
ciphertext <- auth_encrypt(msg, bob_key, alice_pubkey)
# Alice verifies and decrypts with her key
out <- auth_decrypt(ciphertext, alice_key, bob_pubkey)
stopifnot(identical(out, msg))
# Alice sends encrypted message for Bob
msg <- charToRaw("Let's protest")
ciphertext <- auth_encrypt(msg, alice_key, bob_pubkey)
# Bob verifies and decrypts with his key
out <- auth_decrypt(ciphertext, bob_key, alice_pubkey)
stopifnot(identical(out, msg))
The main idea behind the private key, pubkey pair is that users share their public keys with others. Data is encrypted for a particular person by using their public key (and your private key). They can then decrypt using the reverse keys, i.e. their private key and your public key.
The main objective is that you need a public and private key pair to decrypt the data, and in all cases the private key should not be transmitted, moved, or sent.
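To make that concrete, here is a minimal sketch of what happens when a third party tries to read a message that was encrypted for someone else. It reuses the sodium functions from the example above; the third party "Eve" and the variable names are assumptions added purely for illustration.

```r
library(sodium)

# Bob (sender), Alice (recipient) and Eve (hypothetical third party)
bob_key <- keygen()
bob_pubkey <- pubkey(bob_key)
alice_key <- keygen()
alice_pubkey <- pubkey(alice_key)
eve_key <- keygen()

# Bob encrypts for Alice using his private key and her public key
msg <- charToRaw("for Alice only")
ciphertext <- auth_encrypt(msg, bob_key, alice_pubkey)

# Eve knows both public keys but holds only her own private key,
# so authenticated decryption fails with an error
eve_result <- tryCatch(
  auth_decrypt(ciphertext, eve_key, bob_pubkey),
  error = function(e) NULL
)

# Alice holds the matching private key, so decryption succeeds
alice_result <- auth_decrypt(ciphertext, alice_key, bob_pubkey)
```

Here `eve_result` ends up `NULL` because the error is caught, while `alice_result` holds the original bytes. Public keys can be passed around freely; only the private keys need protecting.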
So when @tgerke and I talked about this originally, we thought we could later provide keys to the end user to let them decrypt data they have. This probably wouldn't be a good idea from a security perspective.
What we could do instead would be to initially deliver data encrypted using the owner's private/public keys, knowing that it will not be decryptable to anyone else. If at a later point the user is granted access, we could
In both cases end users can use/manipulate/etc the unencrypted data as they see fit. In the first case, the regenerated data might be updated, contain more records, etc. but would hopefully be the same shape. The second case could be used for any derivative data or for situations where the source data may have changed but the user only has access to the version they received.
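One possible reading of the "regenerate" case can be sketched with sodium as follows. This is only a sketch under assumptions: the `owner_*` and `user_*` names and the toy data frame are hypothetical, and the user is assumed to generate their own keypair and share only the public half.

```r
library(sodium)

# Owner's keypair, and a keypair the user generates and keeps themselves
owner_key <- keygen()
owner_pubkey <- pubkey(owner_key)
user_key <- keygen()
user_pubkey <- pubkey(user_key)   # the only thing the user hands over

# Toy data standing in for the real data set (hypothetical)
shared_df <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))
data_raw <- serialize(shared_df, NULL)

# Initial delivery: encrypted with the owner's own private/public keys,
# so nobody without the owner's private key can read it
locked <- auth_encrypt(data_raw, owner_key, owner_pubkey)
user_attempt <- tryCatch(
  auth_decrypt(locked, user_key, owner_pubkey),
  error = function(e) NULL
)

# Access granted later: the owner regenerates the data and encrypts it
# for this specific user; no private key is ever transmitted
for_user <- auth_encrypt(data_raw, owner_key, user_pubkey)

# The user decrypts with their private key and the owner's public key
recovered <- unserialize(auth_decrypt(for_user, user_key, owner_pubkey))
```

The user's first attempt fails (`user_attempt` is `NULL`), and only the copy encrypted for them can be opened. Granting access means producing a new ciphertext rather than handing out a key.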
Good find re: not providing keys later. Does the "Providing a public key" section of https://github.com/SurgicalInformatics/encryptr help? TBH I don't think I fully understand how that's different than the initial solution, but it must be since it's got a section of its own.
I'm not sure I fully understand either, so I think that's where the blog post can go: walking through a scenario with multiple collaborators sharing data.
My current understanding is that putting the Owner's (or data pool's) shared key would handle the first arrow above in terms of the "User" getting the Owner's pub key. But I still think the data needs to be encrypted for someone specific, otherwise anyone with the data pool public key could just decrypt the data.
encryptr is interesting and allows you to encrypt specific columns, such as postcode and telephone, enabling the data to be shared without the risk of exposing PHI. encryptr uses RSA, so it has a similar authentication model to ssh, except it seems that the private key is required for decryption.
The package README really doesn't spend much time explaining how to use and share keys with others.
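As a rough illustration of the column-level idea, here is a sketch that encrypts two columns with a recipient's public key so that only the holder of the private key can read them. It does not use encryptr's API or RSA; it swaps in sodium's sealed boxes (`simple_encrypt()` / `simple_decrypt()`). The helpers `encrypt_column()` and `decrypt_column()` and the toy `patients` data are hypothetical names made up for this example.

```r
library(sodium)

# Hypothetical helper: encrypt each value with the recipient's public key
# and store the ciphertext as a hex string so it fits in a character column
encrypt_column <- function(x, pubkey) {
  vapply(
    as.character(x),
    function(value) bin2hex(simple_encrypt(charToRaw(value), pubkey)),
    character(1),
    USE.NAMES = FALSE
  )
}

# Hypothetical helper: reverse the above using the recipient's private key
decrypt_column <- function(x, key) {
  vapply(
    x,
    function(value) rawToChar(simple_decrypt(hex2bin(value), key)),
    character(1),
    USE.NAMES = FALSE
  )
}

recipient_key <- keygen()
recipient_pubkey <- pubkey(recipient_key)

# Toy data (made up for illustration)
patients <- data.frame(
  id = 1:2,
  postcode = c("AB1 2CD", "EF3 4GH"),
  telephone = c("0123 456789", "0987 654321"),
  stringsAsFactors = FALSE
)

# Anyone with the public key can encrypt the sensitive columns
shared <- patients
shared$postcode <- encrypt_column(patients$postcode, recipient_pubkey)
shared$telephone <- encrypt_column(patients$telephone, recipient_pubkey)

# Only the holder of the private key can restore them
restored <- shared
restored$postcode <- decrypt_column(shared$postcode, recipient_key)
restored$telephone <- decrypt_column(shared$telephone, recipient_key)
```

The non-sensitive `id` column stays readable in `shared`, which is the appeal of column-specific encryption over encrypting the whole file.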
From How does RSA work?
A blog post could explore an example with more details about key generation, key sharing, etc.
Also there is a related rOpenSci package, cyphr, which seems to be more oriented towards encrypting files. This might be a better package choice (better community support, etc.) but there is a gap in the README in terms of column-specific encryption.
Finally, another interesting package for secret sharing is secret by Gabor Csardi et al. This package is oriented towards sharing API keys but the UseR! 2017 presentation about secret could provide a good starting point for sketching out the ideal key-sharing workflow.