dirac-institute / trailblazer

An open data repository for astronomical data products affected by satellites
MIT License
8 stars 2 forks source link

Add a more flexible and secure authentication and security critical configuration handling. #2

Closed DinoBektesevic closed 3 years ago

DinoBektesevic commented 3 years ago

Added a Config class that reads things from a YAML config file. This is more flexible and more practical going forward.

Should be improved perhaps to better mask security critical information. The problem is that it's long lived and globally accessible. Will do for now.

spenczar commented 3 years ago

Just a drive-by comment: idiomatic Django would be to put all configuration in trail/settings.py. You can do your own thing, but you may have trouble using off-the-shelf Django extensions, which will certainly all expect you to put things in settings.py.

What is the advantage of putting config into a yaml file?

DinoBektesevic commented 3 years ago

I can't exactly put PostgreSQL connection details into settings.py without manually sanitizing the file before every commit. Same goes for Django's SECRET_KEY This invites problems, no?

The config class is there to punt out the security critical information out of the regularly committed code and into an external, not committed, file.

As for why YAML: I could continue with having my postgresql data in one config (I don't know, maybe even the .pgpass files would work, haven't tried)l, then the Django secret key in another, then probably we will want to record some name of a bucket or a tidbit or two of S3 data somewhere and YAML makes that quick and easy to do from a single file while, down the line, leaving the option of splitting this up into multiple files if I want.

spenczar commented 3 years ago

Okay, got it - that makes sense. A few suggestions.

I think it makes sense to rename this to something like secrets.yaml, and only put secret values into it. That will help clarify intent, and help ensure that settings appear in fewer places. The current code includes setting stuff like "static_root: ~/trail/static" which seem superfluous; you may as well put that in settings.py.

A separate question is whether yaml files are the right way to store secrets. I think this is way looser, but I can just post some options for you to consider.

  1. Environment variables - this is probably the most common way people do this in Django. It's how a system like Heroku would expect to work, it's how Github Actions CI would like to pass secrets in, it's easy to do in containers, etc. Your server's run script needs to set the environment variables, then.
  2. Secrets managers - this is a little fancier but can be nice: you can store secrets in something like AWS secrets manager, and then retrieve them at runtime. For example, scimma-admin defines a get_secret function which is used to get the DB password
  3. Some people make yet another .py file with production-specific settings, and then do from prod_settings import *; the import statement overrides names with production versions. Then, prod_settings is a file that you don't commit - it only exists in your deployed server. I kinda hate this, I think that import has too much confusing behavior. But it's common.

Reading secrets from a file, though, is just fine too if that's what you prefer.

DinoBektesevic commented 3 years ago

Hello @spenczar - sorry for the long delay, between various obligations I quite literally could not make any significant progress in last two weeks before this weekend.

I took to heart your recommendations and renamed the class, I added, and tested in deployment, functionality for secrets.
I have not added env var overrides today because I wanted to focus more on the processing part, but it should be trivial to add specific class overrides now too. Hope this ticks off at least some your checkboxes.

mrawls commented 3 years ago

Let's close this out so we're not blocking #3 anymore