datamade / how-to

📚 Doing all sorts of things, the DataMade way
MIT License

Create way to programmatically provision AWS S3 buckets / policies / users #311

Closed smcalilly closed 1 year ago

smcalilly commented 1 year ago

Background

Related to #275 — the culmination of that work would be a programmatic way to provision AWS resources for Django apps. This would follow our preferred pattern of infrastructure as code. Provisioning that stuff is hard to do when following a written document, and creating a script or automated process would make it easier and less error-prone.

I have a very poor attempt at doing this in https://github.com/datamade/s3-access-wizard

Proposal

We need to come up with a better way that either uses the AWS CLI to do this (which is what my bad attempt is doing), or is managed by a tool like Terraform or the serverless framework.
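For reference, here's a minimal sketch of what the Terraform route might look like: one bucket, one dedicated IAM user, and an inline policy scoped to that bucket. All resource and bucket names below are hypothetical placeholders, not our actual naming scheme, and the action list would need review before real use.

```hcl
# Hypothetical names throughout — adjust to the project's conventions.
resource "aws_s3_bucket" "app_uploads" {
  bucket = "example-app-uploads"
}

resource "aws_iam_user" "app_s3_user" {
  name = "example-app-s3-user"
}

# Inline policy limiting the user to this one bucket.
resource "aws_iam_user_policy" "app_s3_policy" {
  name = "example-app-s3-access"
  user = aws_iam_user.app_s3_user.name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
      Resource = [
        aws_s3_bucket.app_uploads.arn,
        "${aws_s3_bucket.app_uploads.arn}/*",
      ]
    }]
  })
}
```

A nice property of this shape is that the bucket, user, and policy are created and torn down together, so a project's S3 footprint lives in one reviewable file.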

Deliverables

An easy-to-use devops tool that can help us provision our AWS S3 buckets / policies / users.

Timeline

1-2 investment days.

smcalilly commented 1 year ago

I'm curious to try serverless — I'm familiar with it, and it's all YAML, so our team already knows the syntax. We could even bake it into our cookiecutter, though then our naming scheme would be public. I don't think this is a huge security risk, since that information is already publicly available in our HTML, but I'm curious to hear other opinions about that. Here's a snippet from the IL NWSS site with the bucket name and the public AWSAccessKeyId:

```html
<img class='w-100' src="https://il-nwss-dashboard-staging.s3.amazonaws.com/images/wastewater-lab3.original.png?AWSAccessKeyId=AKIAQUHZIOEIGGAGKU4S&amp;Signature=BIMoGUtxMZVZm5ZfItfTceAQCc8%3D&amp;Expires=1674241103">
```

If we baked it into our cookiecutter, our policies would also be public. Thinking about the threat model, I'd assume a capable attacker could reasonably guess a policy's access level whether it's published or not. Still, that exposure is the strongest case against including it in our cookiecutter.

smcalilly commented 1 year ago

One way we could do this:

This setup could be nice because it would:

@hancush curious to hear your thoughts about this idea?

hancush commented 1 year ago

@smcalilly I love this! I also love that it's centralized, as opposed to our old server config repos, which had to be created individually for every project. Are there any downsides to this approach?

smcalilly commented 1 year ago

> Are there any downsides to this approach?

@hancush I've been thinking about this question. This is the most I can think of right now:

  1. We'd have to invest time and effort into learning a new tool.
  2. The GitHub Action would have access to our AWS account's API, at least as much access as we grant the token that it would use. We could figure out a smart IAM setup for the token so we can limit the attack surface. And I'm pretty sure that CI/CD for provisioning AWS resources is a common pattern, so we'd just need to spend time learning the best practices here (e.g.: https://skundunotes.com/2023/03/07/ci-cd-with-terraform-and-github-actions-to-deploy-to-aws/)
  3. Existing resources wouldn't be controlled this way, unless we spent the time to add everything retroactively. Having a split between an old way and a new way bothers the part of me that likes to be organized.
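On point 2, here's a sketch of how we might scope the CI token's policy so a compromised Action can only touch buckets under a project prefix. The prefix convention and the action list are illustrative assumptions, not a vetted policy:

```python
import json


def ci_token_policy(project_prefix: str) -> dict:
    """Build a least-privilege IAM policy document restricting a CI
    token to S3 buckets matching a project prefix.

    The prefix convention and action list here are illustrative
    assumptions — a real policy would need security review.
    """
    bucket_arn = f"arn:aws:s3:::{project_prefix}-*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions (create, configure, list).
                "Effect": "Allow",
                "Action": ["s3:CreateBucket", "s3:PutBucketPolicy", "s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Object-level actions apply to keys inside the buckets.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


policy = ci_token_policy("example-project")
print(json.dumps(policy, indent=2))
```

The idea is that even if the Action's token leaks, it can't touch buckets outside the project's prefix.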

fgregg commented 1 year ago

it might also be worth looking at: https://github.com/simonw/s3-credentials

smcalilly commented 1 year ago

the s3-credentials tool is the way! it fits our use case exactly. it uses boto3 and it's somewhat flexible — much better than my confusing makefile + aws CLI hack!

i'm updating the google doc guide to use that.
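for anyone landing here later, the basic workflow looks roughly like this (bucket name is a placeholder, and flags are from my reading of the tool's README — double-check against the current docs, and note these commands need AWS credentials configured locally):

```shell
pip install s3-credentials

# Create the bucket (if it doesn't exist), a dedicated IAM user,
# and scoped access keys in one step:
s3-credentials create example-app-staging --create-bucket

# Or mint read-only credentials for an existing bucket:
s3-credentials create example-app-staging --read-only
```

the command prints the new access key pair as JSON, which you'd then drop into the app's environment config.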

smcalilly commented 1 year ago

Updated the google doc guide here: https://docs.google.com/document/d/1IzP3zfbr6-zYNtzWuMs4t3AVdSfi-LtXfXR0hszez94/edit?usp=sharing

Very simple!