NDAR / nda-tools

Python package for interacting with NDA web services. Used to validate, submit, and download data to and from NDA.
MIT License
48 stars 22 forks source link

must use a real method to handle secrets #28

Closed GodloveD closed 8 months ago

GodloveD commented 3 years ago

I have declined to install this tool at the NIH HPC on behalf of a user because it does not handle secrets appropriately. In particular it saves usernames and passwords in a plain text file in the user's home directory. If these data are unencrypted at rest I have no reason to believe they are encrypted properly in transit either. Please change to a well-established key, token, or password model for secret management.

obenshaindw commented 3 years ago

Similar to issue as #20

obenshaindw commented 3 years ago

@GodloveD and @yarikoptic you both have identified that there is room for improvement here. We have a few options for how to resolve this: encrypt the secrets, make the settings.cfg file only accessible to the user, or do away with persisting secrets and instead always ask for them and/or look for them from local environment variables.

  1. Encryption: If the encryption key is on the same file system as the encrypted settings.cfg file, which the tool would need to access the values in the file, this may not represent much of an improvement.
  2. Using Permissions: commandline tools like, the AWS command-line tool make the ~/.aws/credentials file accessible only to the user and checks that the permissions are appropriately set. This is probably the least amount of work to implement.
  3. Using temporary environment variables: we could check system environment variables instead of a text file, which would only be accessible by the user and session.
  4. Ask at runtime: if values are not present in environment variables, we could prompt the user to enter them; we do this already if they are not present in the config.

@GodloveD and @yarikoptic would option 2 be acceptable for more securely handling secrets?

yarikoptic commented 3 years ago

Since many existing tools (as you mentioned aws but also git and others) follow that approach, I guess 2. is generally considered Ok.

FWIW it could as well be combined with 1., i.e. they would not be stored plain text but encrypted (or anyhow obfuscated) with a secret known only to nda-tools (I believe git doesn't store them plain as well). It is more of an extra measure to cause a bit more extra effort/awareness from the malicious virus/user crawling victim's $HOME in search for secrets.

3. could be used in addition to facilitate "one time use" so no secrets stored if user desires so, or for testing (e.g. by nda-tools developers who also might be nda users).
4. is a good UX for the initial case where credential is not stored or provided via env var (3.)

In DataLad and DANDI we pretty much "do all of the above", we use https://pypi.org/project/keyring/ to store credentials in a key store configured/native/appropriate for the system (if available or create a dedicated one if not), we warrant env vars to avoid querying keystore (might need a password to unlock), and ask for credential if not known (and store in keyring for reuse). The difficulty is that system keystore might not be available, e.g. when you ssh into a Linux server with no X desktop running. Then you might like to establish a dedicated keyring. See e.g. https://github.com/dandi/dandi-cli/blob/HEAD/dandi/dandiapi.py#L244 and used there keyring_lookup https://github.com/dandi/dandi-cli/blob/HEAD/dandi/girder.py#L917 for an implementation we have in dandi-cli.