OpenTechStrategies / torque-devenv

Set up a development environment for torque-sites

Handling GPG decryption #3

Closed slifty closed 3 years ago

slifty commented 3 years ago

I'm currently working on setting up the ETL pipeline (which is arguably more than what #1 calls for and warrants its own issue, but who's counting).

The problem I face is that ETL requires access to sensitive data. The data is encrypted via GPG, which is good. The challenge is how to decrypt it, especially given that vagrant is non-interactive.

Here are the considerations:

  1. Decryption happens AFTER a lot of setup but BEFORE final setup. Our current installation instructions expect a few things about file directory structure, which means that the actual downloading of the data is best performed inside the guest machine. This is a long way of saying that data download needs to be part of a provision script.

  2. Decryption requires secrets that only live on the host machine, and those secrets are very sensitive / should not be stored in plain text (e.g. GPG keys).

Ultimately we have to figure out whether decryption can somehow be invoked on the host machine, or whether those secrets can somehow be securely passed to vagrant so the decryption can run on the guest machine.

Possible Solutions

Multiple provision scripts / steps

Vagrant allows you to define multiple provision scripts. We could separate provisioning into two steps. The first would set up the code and download the data; the second would invoke ETL. Between the first and second steps we would expect the user to run the decryption scripts.

The setup story might look like this:

$ vagrant up
$ ./decrypt
$ vagrant provision --provision-with etl

This is more steps than a simple vagrant up, which means more room for error. However, there may be value in having an explicit etl provision step anyway, since etl / deploy is run more than once.

The much more significant concern with this approach, however, is that it makes more assumptions about the host machine. In particular, ./decrypt will either work or it won't work on the host machine. If the decryption step happens inside of the guest machine then we're in complete control of the environment.

This is arguably OK, since users on unsupported platforms can still decrypt manually; they just don't benefit from the automation.
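
For illustration, the three-step flow could be wrapped in a small host-side helper that stops at the first failing step. This is only a sketch: the function name and the DECRYPT_CMD override are made up here, and it assumes the Vagrantfile defines a provisioner named etl that does not run during a plain vagrant up (for example, one marked run: "never").

```shell
# Sketch only: setup_with_host_decrypt and DECRYPT_CMD are hypothetical names.
# DECRYPT_CMD defaults to the ./decrypt script mentioned above.
setup_with_host_decrypt() {
  local decrypt_cmd="${DECRYPT_CMD:-./decrypt}"

  # Step 1: bring up the guest; this sets up the code and downloads
  # the (still encrypted) data.
  vagrant up || { echo "vagrant up failed" >&2; return 1; }

  # Step 2: decrypt on the host, where the GPG secrets live.
  "$decrypt_cmd" || {
    echo "host-side decryption failed; decrypt manually, then rerun the etl step" >&2
    return 2
  }

  # Step 3: run only the provisioner named "etl".
  vagrant provision --provision-with etl || {
    echo "etl provisioning failed" >&2
    return 3
  }
}
```

The distinct return codes make it easy to tell which step failed, which matters because the middle step is the one most likely to break on an unsupported host.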

Importing GPG credentials

Vagrant has several ways of passing secrets to the provisioning scripts -- indeed, vagrant assumes secrets are being passed to it since there is no interactive mode -- and so the GPG key + various passphrases could be passed in one of those ways.

I'm not going to go too deep into this solution because we feel this signals bad practices in a way we're unwilling to pursue. This approach would mean (A) temporarily storing passphrases and private keys in insecure locations and (B) advising other people to do this... NORMALIZING this kind of behavior. WE ARE NOT PREPARED FOR THE KIND OF JUDGMENT AND SUFFERING THAT WOULD PRECIPITATE.

Exposing the host's gpg-agent

It looks like we might be able to configure vagrant to point to the host's gpg-agent. There is more detail in this post.

Importing short-lived subkeys

I'm not sure if this is better than the "Importing GPG credentials" option, but it may be more palatable if the subkeys could have an explicitly short expiration time. More info on subkeys.

slifty commented 3 years ago

Update on this: Apparently my understanding of decryption was wrong!

Thank you @frankduncan for talking me through this.

Decryption does NOT require a GPG key; it just requires a passphrase. The passphrase is secret, but in a similar way to normal vagrant secrets, meaning it can be passed via ENV, which is just a thing people do in this world and it's not so bad.

Getting that passphrase requires use of opass and gpg, but that is more due to OTS internal practices and not so much related to this project directly. Documenting and automating opass setup is in the scope of opass itself. I don't think we need opass to be part of vagrant (if the dev chooses to set up opass, that's up to them).
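
As a rough sketch of what passphrase-only decryption could look like inside a provision script: the function below assumes the data is symmetrically encrypted and that the passphrase arrives in an environment variable. The names decrypt_file and BIGDATA_PASSPHRASE are invented for this example, and the gpg flags assume GnuPG 2.1 or later.

```shell
# Sketch only: decrypt_file and BIGDATA_PASSPHRASE are hypothetical names.
# Usage: decrypt_file <encrypted-file> <output-file>
decrypt_file() {
  local src="$1" dest="$2"

  # Fail early with a clear message instead of letting gpg try to prompt.
  if [ -z "${BIGDATA_PASSPHRASE:-}" ]; then
    echo "BIGDATA_PASSPHRASE is not set" >&2
    return 1
  fi

  # Feed the passphrase on stdin so it does not appear in the process list.
  # --batch and --pinentry-mode loopback keep gpg from asking interactively.
  printf '%s\n' "$BIGDATA_PASSPHRASE" |
    gpg --batch --yes --pinentry-mode loopback --passphrase-fd 0 \
        --output "$dest" --decrypt "$src"
}
```

No private key is imported anywhere in this flow, which is the point: the guest only ever sees the passphrase.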

slifty commented 3 years ago

This is now resolved!

The GPG passphrase to decrypt bigdata secrets must be passed in via .env -- opass is not involved, and the user's GPG key is never added to the guest machine.
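
Since a missing passphrase would only surface partway through provisioning, a host-side check before vagrant up may save some time. The sketch below is an assumption about how one might do that; env_has_key is a made-up helper, and the actual variable name expected in .env is not stated in this thread.

```shell
# Sketch only: env_has_key is a hypothetical helper.
# Succeeds if FILE contains a line of the form KEY=<non-empty value>.
# Usage: env_has_key <file> <key>
env_has_key() {
  local file="$1" key="$2"
  [ -f "$file" ] || return 1
  grep -Eq "^${key}=.+" "$file"
}
```

For example, `env_has_key .env SOME_PASSPHRASE_VAR || echo "add the passphrase to .env first"` could run before vagrant up, with SOME_PASSPHRASE_VAR replaced by whatever name the provision scripts actually read.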