datamade / how-to

📚 Doing all sorts of things, the DataMade way

R&D: Deploying Django and Postgres on Heroku #9

Closed · jeancochrane closed this issue 5 years ago

jeancochrane commented 5 years ago

@jeancochrane commented on Thu May 02 2019

Motivation

In DataMade's DevOps endeavors, we currently face three related problems:

  1. We spend more time than we would like on unpleasant server maintenance
  2. Our zero-downtime deployment strategy is mature but brittle, which indicates to us that perhaps we're overengineering our solution
  3. We would like to containerize our production apps, but we don't know where to start, and we've had bad experiences with trying to orchestrate containers on AWS ECS and on bare EC2 instances

These are three problems that the Heroku Platform promises to help solve. The Heroku Platform includes three services that I'd like to investigate: Runtime, Postgres, and Flow.

The Heroku Runtime service is supposed to build, deploy, and orchestrate application containers (Heroku calls them "dynos") from source code or from Dockerfiles with minimal configuration. The Heroku Postgres service is supposed to provide a managed database experience similar to AWS RDS that integrates with application dynos. The Heroku Flow service is supposed to provide a CI/CD solution that runs tests and builds preview apps using dynos for each PR.

The marketing materials for these services paint a picture of one possible future for our server-bound deployments: ephemeral, container-based, rebuilt on every push, with a GitHub collaboration experience similar to Netlify. However, since this is basically ad copy, I don't feel like I have enough information to make an informed decision about whether we should pilot deploying with Heroku.

Proposal

I'd like to stand up a simple app on the Heroku Platform in order to test out the services I listed above. Key questions will include:

I'd like to use Just Spaces as a toy app to stand up the testing stack. Just Spaces is a good candidate for a toy project because it incorporates most of the key elements of our stack -- a Django backend, a frontend served with Django over Nginx, and a Postgres database -- while still being simple enough to stand up quickly.

Deliverables

Timeline

I plan to start and end this project on Friday 5/10.

CC @hancush @fgregg


@hancush commented on Thu May 09 2019

this is great, @jeancochrane – thank you! i think that having firsthand experience with an alternative approach will really enrich our upcoming discussion of deployment dreams (https://github.com/datamade/devops/issues/90). it's a bonus that this approach seems to tick a lot of desirable boxes, namely ephemerality. we're pretty intimately familiar with the challenges of our current approach, and it will be nice to have that grounding context for a leading alternative, too.


@fgregg commented on Wed May 22 2019

R&D issues are going to be moved to the new R&D repo.

jeancochrane commented 5 years ago

Heroku R&D notes

Answers to questions

The biggest conceptual leap for me was having to containerize the application. In theory, Heroku should allow us to deploy non-containerized applications if we use buildpacks (which are basically container images for common application types that Heroku manages), but for most uses I suspect we'll have to use Docker instead. As an example, the Django buildpack doesn't come with GeoDjango support, so I wasn't able to test the buildpack for my R&D.

Depending on how you look at it, this could be either an advantage or a disadvantage of the platform. We've been talking about wanting to invest more in containers, and if we do that, Heroku will make a lot of sense. If, however, we decide we want to continue developing with local dependency management, then Heroku likely wouldn't be a great fit.
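For what it's worth, opting into Docker builds is a one-line change against the app plus a heroku.yml in the repo. The app name below is just an example and this is a sketch, not exactly what I ran:

heroku stack:set container --app just-spaces    # build from heroku.yml/Dockerfile instead of a buildpack

With the container stack set, Heroku looks for a heroku.yml at the repo root that points at your Dockerfile and declares the process to run.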

Outside of app containerization, the Heroku platform feels like a natural extension of our practices. Heroku Pipelines provide you with review apps that get built on every PR, as well as a staging and production environment, both of which can be triggered by GitHub pushes (and which can be configured to wait until tests pass before building). The workflow feels a lot like Netlify, if Netlify were deploying containers for us. The interface is about a thousand times more friendly than AWS, too.
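For reference, the pipeline scaffolding can also be set up from the CLI (the names here are hypothetical; review apps and the GitHub connection get enabled per-pipeline in the dashboard):

heroku pipelines:create just-spaces --app just-spaces-staging --stage staging        # create a pipeline with a staging app
heroku pipelines:add just-spaces --app just-spaces-production --stage production     # attach the production app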

Heroku gives you shell access to services through the Heroku CLI. Heroku manages authentication for the CLI: when you run heroku login, the CLI opens a browser and asks you to authenticate with Heroku. You then get shell access to any services for which you're a Collaborator on Heroku.
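For example (this works the same way for any account with Collaborator access):

heroku login    # opens a browser window to authenticate
heroku apps     # lists the apps your account can access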

You can use heroku run to run one-off commands against running services -- e.g. here's how I opened up a shell to create a superuser for my app:

heroku run --app just-spaces bash
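You can also skip the interactive shell and run a one-off command directly; assuming a standard Django manage.py, the equivalent one-liner would be something like:

heroku run --app just-spaces python manage.py createsuperuser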

Heroku Postgres exposes its own CLI subcommand that you can use to interact with the database. Here's how you open up a psql session:

heroku pg:psql
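(The command picks up the app from the local git remote; from outside the project directory you can pass --app explicitly, and pg:info gives a quick look at the plan, connection count, and row usage.)

heroku pg:psql --app just-spaces
heroku pg:info --app just-spaces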

Beyond shell access, you can also monitor service health and behavior through the Heroku console. Each app (like review, staging, or production) has a metrics dashboard showing memory usage, response time, and throughput, as well as the amount of load on the dynos (the Heroku containers).

We do indeed get fully ephemeral applications out of the box, but at a cost. On Heroku you pay per usage, so the more dynos you run, the more you have to pay. In practice, however, Heroku has a free tier for both apps and Postgres that met my needs perfectly fine for review and staging apps.

The CI costs $10/mo, but I found that you don't actually need it in order to tie CI to CD -- you can configure a Heroku app to only deploy once tests have passed, even if those tests are run by an external provider.

The integration is really smooth, and is by and large abstracted away from the developer. You can check a Heroku config file into your repo to specify that your app requires a database, and then when Heroku spins up review apps it'll automatically create and connect a database, updating your app's config variables with the database connection URL. (For staging and production apps you have to create the Postgres database manually, but this is as simple as clicking one button in the console and choosing your database size.)
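For staging and production, the manual step boils down to a couple of CLI commands (or the console button). The plan name below is just an example, and I believe the review-app equivalent lives in the repo's app.json:

heroku addons:create heroku-postgresql:hobby-dev --app just-spaces    # provision a database and attach it to the app
heroku config:get DATABASE_URL --app just-spaces                      # confirm the connection URL Heroku injected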

The main downside of Heroku Postgres is that the app and the database don't live in the same VPC. Instead, the database is exposed to the public Internet, which is how your application authenticates with it (and how you authenticate with it over the CLI). There's nothing strictly insecure about this method, but database administrators generally advise you not to expose your database to the public Internet, because it means that if there's a security flaw in the way the database is configured then attackers can get direct access to your database without having to crack into your application first.

Another downside is that Heroku Postgres is pretty expensive. There's a "hobby" tier that gives you 10,000 rows for free, and you can bump it up to 1,000,000 rows for $9/mo, but these databases are colocated with other customers' databases and don't come with uptime guarantees or advanced monitoring. (You can also psql into your database and see other people's databases, which is kind of creepy! I poked around and it seems like they've configured the permissions pretty well, i.e. you can't access any other users' data, but it just goes to show that the cheapo tier is basically just creating a database inside a big shared Postgres cluster.) The cheapest option that isn't colocated is $50/mo. This is actually pretty comparable with AWS RDS pricing, but it's a lot more expensive than our current practice of colocating our Postgres and app installs on the same EC2 instance.

Like Travis CI, Heroku lets you configure secret environment variables for each app through its console (or its CLI). Those environment variables then get threaded into the application at build time and at runtime. Public environment variables can be configured in the Heroku configuration files that you keep in your repo.
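For example (the variable name is just a placeholder):

heroku config:set DJANGO_SECRET_KEY=changeme --app just-spaces    # set a secret config var
heroku config --app just-spaces                                   # list current config vars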

This sort of paradigm works a lot better with containerized applications, where you can define the environment through an .env file or environment attribute in a Docker Compose file and then pass those into the container. I could see a simple secrets management solution where we have an encrypted .env file for local development (or just download it from S3), and then for review/staging/production we configure secret environment variables in the Heroku console.
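As a sketch of that workflow, the CLI can dump an app's config vars in dotenv format, which a Docker Compose env_file could consume directly:

heroku config --shell --app just-spaces > .env    # write KEY=value lines to a local .env file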

DNS delegation is a little bit different than Netlify because Heroku doesn't have a full DNS service the way that Netlify does. On Heroku, if you want to put your app behind a custom domain, you have to create an ALIAS or CNAME record in your DNS provider that delegates DNS to a Heroku hostname.

This pattern works perfectly fine for subdomains, but for root domains it's a little tricky because our domain registrar of choice (NameCheap) doesn't currently offer ALIAS or ANAME records for root domains. To get custom root domains to work with Heroku, we'd need to use a different domain registrar. (Luckily, this is a pretty common feature these days, and I'm surprised NameCheap doesn't offer it.)
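The Heroku side of the setup is a single command; it prints the DNS target to point your CNAME or ALIAS record at (example.com is a placeholder here):

heroku domains:add www.example.com --app just-spaces    # register the custom domain with the app
heroku domains --app just-spaces                        # list domains and their DNS targets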

Dynos are comparably priced to EC2 instances: there's a free tier that works nicely for review and staging apps, a super-cheap "hobby" version that's $7/mo but doesn't come with horizontal scaling or long-term metrics, and then a "standard" version that's $25/mo. The hobby and standard versions both come with 512 MB of RAM, which is basically half of a t3.micro EC2 instance. Beyond "standard" you can pay a lot more for more RAM, but those options seem out of our price range.

The biggest difference between EC2 and Heroku pricing is that an EC2 instance can run a bunch of different processes, while a dyno can only run one (since it's basically a container). In practice this means that Heroku will be more expensive than EC2, since for any given application we'll need to pay for at least two services: a production application dyno plus a production Postgres add-on.

However, the $7/mo offering seems really appealing to me, and for low-traffic sites I can't figure out a reason why it wouldn't work. In this case, Heroku would probably be cheaper than EC2. I'd love for someone else to check my reasoning on this.
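For reference, scaling out and changing dyno tiers are each one CLI command (the dyno type names are the hobby/standard tiers mentioned above; horizontal scaling requires standard or higher):

heroku ps:scale web=2 --app just-spaces         # run two web dynos
heroku ps:type standard-1x --app just-spaces    # switch the dyno tier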

For more on dyno pricing, see the pricing page.

Feature parity

These are the tasks that I completed to ensure feature parity with Just Spaces:

You can see my test app at https://heroku.jeancochrane.com!

Pros/cons compared to other options

Currently, the main alternative to Heroku is our stack as documented in deploy-a-site: EC2 instances with Postgres and the application installed, deployed using our zero-downtime deployment framework.

In general, I think Heroku would be a good choice in cases where:

Pros

Cons

Next steps

If we want to consider pursuing Heroku, I think a good next step would be trying to deploy a slightly more complicated application, e.g. one that uses Solr for searching and executes background tasks with Celery or a similar service. I also think a good next step would be to write up some documentation for how to deploy apps on Heroku; the Heroku docs are very thorough but there are a lot of them, and I think Heroku would benefit from the deploy-a-site treatment. At this point if we were to decide to continue pursuing Heroku I would feel comfortable piloting it on a client project.

Lunch&Learn outline

Raw notes

These notes are pretty raw, and just represent things I wanted to keep for later. I present them for posterity but I don't necessarily recommend you read them.

jeancochrane commented 5 years ago

@hancush is leading this now in the context of deploying containerized applications. We're going to evaluate ECS as well. Work is being captured in #19.