danielballan opened this issue 6 years ago
Somewhat tangentially, we should also factor lifecycle management of the S3 buckets into our decisions about operating costs.
Note we have no AWS credits at the moment 😬, so we need to evaluate dollars on hand before concluding:
> we are more crunched for developer-time than AWS credit at this moment
Otherwise I’d very much agree :)
Either way, though, I think our Redis usage (for caching or queueing), when correctly sized, is not large (we are currently somewhat over-provisioned, with an xlarge for the cache). At the level we actually need, the cost difference may not be enough to matter anyway (EC2 vs. Elasticache is something like 1.25x–1.5x).
Also, there’s an important differentiation to make here:
Also also: I'm not at all hot on managing Postgres ourselves; automatic snapshots, backups, and so on are worth so much. I don't feel like Elasticache offers nearly as much that is special about Redis, though (so how I'd weigh the costs is different for Postgres vs. Redis).
> Note we have no AWS credits at the moment
Goes to show that I only partially keep up with Slack, and apparently with the passage of time!
> automatic snapshots, backups, and so on are worth so much. I don't feel like Elasticache offers nearly as much that is special about Redis
Fair point. A documented and separately automated process (Ansible?) for deploying Redis on EC2 would be satisfactory, I think, for both the queue and the cache.
> A documented and separately automated process (Ansible?) for deploying Redis on EC2 would be satisfactory, I think, for both the queue and the cache.
Well, I do think it’s still worth talking through whether Elasticache makes sense. I think I just laid out a bunch of pro and con points in a super disorganized way. I’m not actually sure whether I think we should be using it or not (while I am sure we should be using RDS).
But either way! We should probably make an issue to A) document all our non-Kubernetes resources & deployments and B) automate them (whether with Ansible, Fabric, Terraform, or straight-up Python scripts). The -kube repo should probably become the -deployment or -ops or -sre repo.
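For a sense of what the "straight-up Python scripts" option could look like, here's a minimal boto3 sketch that launches an EC2 instance and installs Redis via cloud-init. All the IDs and names in it are placeholders, not our actual config:

```python
# Rough sketch of the "straight-up Python script" option, using boto3.
# Every ID/name below is a placeholder, not our real config.
import boto3

ec2 = boto3.resource("ec2", region_name="us-west-2")

# cloud-init script that installs and starts Redis on first boot
user_data = """#!/bin/bash
apt-get update -y
apt-get install -y redis-server
systemctl enable --now redis-server
"""

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder Ubuntu AMI
    InstanceType="t3.medium",         # size TBD once we know real usage
    MinCount=1,
    MaxCount=1,
    KeyName="ops-keypair",            # placeholder key pair
    UserData=user_data,
    TagSpecifications=[
        {"ResourceType": "instance",
         "Tags": [{"Key": "Name", "Value": "redis-cache"}]},
    ],
)
print("launched", instances[0].id)
```

Whether this lives in a script, a Fabric task, or an Ansible playbook matters less than having it written down and checked in somewhere.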
Summary of conversation on call: Given current cost constraints, stick with manually-managed Redis but document the process. Maybe someday move that process into Ansible. Stick with RDS.
Update: the credits issue is resolved. Here's my proposal for now:
Unlike the cache, we do want some resiliency guarantees for the queues, and offloading that guarantee to AWS is nice (the queues aren't critical, but durability would be a big bonus).
I can see eventually moving the Redis queues to a hand-managed instance, but by the time we get there we will hopefully have rewritten the DB, and our queue management story will probably be pretty different anyhow.
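For context on the resiliency point, a minimal sketch of the kind of list-based Redis work queue we're talking about, using redis-py; the host and key names are illustrative and not necessarily how our queue is actually implemented:

```python
# Minimal sketch of a list-based Redis work queue (redis-py).
# Host and key names are illustrative, not our actual setup.
import json
import redis

r = redis.Redis(host="queue.internal.example", port=6379)

def enqueue(job: dict) -> None:
    # push new work onto the head of the list
    r.lpush("jobs", json.dumps(job))

def work_forever() -> None:
    while True:
        # block until a job is available, then pop from the tail (FIFO)
        _key, raw = r.brpop("jobs")
        print("processing", json.loads(raw))

# The resiliency point: if the Redis instance dies without persistence
# (RDB snapshots / AOF), whatever is still sitting in "jobs" is lost.
# Elasticache handles snapshots and failover for us; on a hand-managed
# instance we'd have to configure and monitor that ourselves.
```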
Side note: I need to re-deploy the cache machine in us-west-2. It’s currently in us-east-1 because that's where the API service was when it was Heroku-based. (#119)
Things we are trying to keep in mind:

1. State should live outside the Kubernetes cluster.
2. We should avoid lock-in: anything we depend on should be an open standard we could run anywhere.
3. Operating cost.

We have two kinds of state: the database (Postgres) and the cache (Redis).
For the database, we currently use RDS. This satisfies (1) because it is outside the cluster, and it satisfies (2) because although RDS is proprietary, Postgres itself is an open standard available anywhere. For the cache, we use a Redis image deployed inside the cluster. This violates (1). We could instead use Elasticache, which meets (1) and (2), analogous to our use of RDS. However, this strategy is not optimal with respect to (3): both RDS and Elasticache are more expensive than hand-deployed Postgres and Redis.
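To illustrate the portability point in (2): the application only sees connection URLs, so pointing it at RDS/Elasticache versus instances we manage ourselves is purely a configuration change. (The libraries and environment variable names below are illustrative, not necessarily what we use.)

```python
# Sketch only: the app talks to Postgres and Redis through connection URLs,
# so RDS/Elasticache vs. hand-deployed EC2 instances is a config change.
# Env var names and libraries here are illustrative.
import os

import redis
import sqlalchemy

# e.g. postgresql://user:pass@mydb.abc123.us-west-2.rds.amazonaws.com/app
#  or  postgresql://user:pass@10.0.1.5/app  (hand-managed EC2)
engine = sqlalchemy.create_engine(os.environ["DATABASE_URL"])

# e.g. redis://mycache.abc123.usw2.cache.amazonaws.com:6379/0
#  or  redis://10.0.1.6:6379/0  (hand-managed EC2)
cache = redis.Redis.from_url(os.environ["REDIS_URL"])
```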
My personal opinion is that we are more crunched for developer-time than AWS credit at this moment, so we should use RDS and Elasticache for now and keep an eye on the operating costs. We know it is technically straightforward to hand-deploy Postgres and Redis on EC2 instances outside the cluster. Once we have a good handle on the operating costs, we can judge whether the potential cost savings justify doing so.
Thoughts?