Rungutan / sentry-fargate-cf-stack

AWS CloudFormation template to launch a highly-available Sentry 20 stack through ECS Fargate at the minimum cost possible
Apache License 2.0

RFE: make ElastiCache and RDS highly available #16

Closed: nodomain closed this issue 3 years ago

nodomain commented 3 years ago

While all of the other components are set up in a fail-safe, redundant way, both ElastiCache and RDS use only one instance.

Since these are important components in the stack’s architecture, I propose adding an option to the template to set them up redundantly, e.g. a Redis cluster and a PostgreSQL instance with failover. The default setup could still be the “cheaper” one without redundancy.

mariusmitrofan commented 3 years ago

AFAIK, an ElastiCache cluster is not supported by Sentry.

As for PostgreSQL, the only way this would work with Sentry is to use Aurora PostgreSQL (serverless or not) instead. The reason is that we can later add automatic scaling of instances while Sentry still accesses the database through a single endpoint. I guess we can offer the option to choose one or the other via parameters.

What do you think?

nodomain commented 3 years ago

In our other projects we always use a "Redis replication group with Cluster mode disabled", which looks like a single-node Redis instance to the application, so the application does not need to support Redis clusters. Nevertheless, you can put a node in every AZ and get high availability that way. See https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/Replication.Redis-RedisCluster.html.

I like your suggestion regarding PostgreSQL.

mariusmitrofan commented 3 years ago

Looks like there is a CF resource for this which also supports encryption at rest (for Redis).

So 2 birds with 1 stone I guess - https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-elasticache-replicationgroup.html
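
For illustration only, here is a minimal sketch of such a replication group (cluster mode disabled, one primary plus one replica, Multi-AZ failover, encryption at rest), written with the AWS CLI rather than the CloudFormation resource linked above. The group ID, description, node type and the DRY_RUN switch are placeholder assumptions, not values from the template. By default the function only prints the command.

```shell
#!/usr/bin/env bash
# Sketch only: the names below are assumptions, not taken from the template.
set -eu

GROUP_ID="${GROUP_ID:-sentry-redis}"        # hypothetical replication group ID
NODE_TYPE="${NODE_TYPE:-cache.t3.small}"    # hypothetical node type
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print the command, 0 = run it

create_redis_group() {
  # --num-cache-clusters 2 gives one primary and one replica in a single shard,
  # so cluster mode stays disabled and clients still use one primary endpoint.
  set -- aws elasticache create-replication-group \
    --replication-group-id "$GROUP_ID" \
    --replication-group-description "sentry-redis-ha" \
    --engine redis \
    --cache-node-type "$NODE_TYPE" \
    --num-cache-clusters 2 \
    --automatic-failover-enabled \
    --multi-az-enabled \
    --at-rest-encryption-enabled
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

A real setup would also need the networking options (subnet group, security groups), which are left out here.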

nodomain commented 3 years ago

Are you going to take care of the Aurora PostgreSQL part then? I have no idea what I need to do to migrate the database contents from the existing one-instance setup to an HA setup.

mariusmitrofan commented 3 years ago

That's the thing... We can't migrate it in an automated way as part of the CloudFormation process.

All we could do to support such a change for existing users (such as yourself) is to write a bash shell script to be executed as follows:

  • export (before stack deployment) through pg_dump
  • import (after stack deployment) through psql

I'm fine with writing it myself; I just need feedback from your side on whether that's an acceptable way of doing it.

If you have any other ideas though, I'm all ears.

mariusmitrofan commented 3 years ago

In the meantime, I'll prepare the template changes @nodomain

nodomain commented 3 years ago

All we could do to support such a change for existing users (such as yourself) is to write a bash shell script to be executed as follows:

  • export (before stack deployment) through pg_dump
  • import (after stack deployment) through psql

I'm fine with writing it myself; I just need feedback from your side on whether that's an acceptable way of doing it.

If you have any other ideas though, I'm all ears.

That sounds perfect. Thanks for the script - I am a pgsql noob :)

nodomain commented 3 years ago

What about making ClickHouse also HA? From my understanding this is also an important component of the system from a data processing/storage perspective, right?

mariusmitrofan commented 3 years ago

Unfortunately, it's not that simple :)

Two things would need to be modified: ClickHouse and Snuba.

ClickHouse

Doing it in code isn't that hard for us, as ClickHouse does support clustered mode and all we would have to do is tweak the OpsWorks Chef cookbook -> https://clickhouse.tech/docs/en/sql-reference/table-functions/cluster/

Snuba

As mentioned by lynnagara on the Sentry official forum, Snuba does not support clustered installations of ClickHouse.

In theory, seungjinlee managed to do it (see the comment below lynnagara's), but from my point of view it's just a hack that might not be efficient or stable in the long run.

That's why I was thinking of just leaving it as is and waiting for Snuba to fully support clustered installations: migrating from single-instance to cluster mode in ClickHouse is indeed possible, but the other way around (in case of a rollback) is not...

nodomain commented 3 years ago

That sounds reasonable. Thx.

mariusmitrofan commented 3 years ago

@nodomain / @eduardopuente :

The Aurora RDS change has been released in version 1.10.0.

Please check the upgrade guide in the README and/or the release notes ->

https://github.com/Rungutan/sentry-fargate-cf-stack#migrationg-from-any-version-before-1100-to-1100-or-newer

In version 1.10.0 the database engine has changed from AWS RDS (Native) PostgreSQL to AWS RDS Aurora PostgreSQL.

In order to correctly perform the migration, you would have to:

* Make a database backup (see utilities/dump.sh script)
* Run the CloudFormation stack update
* Import the database dump (see utilities/import.sh script)
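
As a rough sketch of what the export and import steps boil down to (this is not the repository's utilities/dump.sh or utilities/import.sh, whose contents are not shown in this thread), something along these lines should work. The database name, user, dump file name and the DRY_RUN switch are assumptions for illustration; pg_dump and psql pick up the password from PGPASSWORD or ~/.pgpass.

```shell
#!/usr/bin/env bash
# Sketch only: not the project's dump.sh / import.sh. All defaults are assumptions.
set -eu

DB_NAME="${DB_NAME:-sentry}"               # hypothetical database name
DB_USER="${DB_USER:-sentry}"               # hypothetical database user
DUMP_FILE="${DUMP_FILE:-sentry-dump.sql}"  # hypothetical dump file
DRY_RUN="${DRY_RUN:-1}"                    # 1 = print the command, 0 = run it

run() {
  # Print the command in dry-run mode, otherwise execute it as given.
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

dump_db() {
  # $1 = hostname of the old RDS PostgreSQL instance (before the stack update)
  run pg_dump --host="$1" --username="$DB_USER" --dbname="$DB_NAME" \
    --file="$DUMP_FILE"
}

import_db() {
  # $1 = endpoint of the new Aurora PostgreSQL cluster (after the stack update)
  # ON_ERROR_STOP makes psql abort on the first failed statement.
  run psql --host="$1" --username="$DB_USER" --dbname="$DB_NAME" \
    --set=ON_ERROR_STOP=1 --file="$DUMP_FILE"
}
```

Call dump_db with the old instance's hostname before the stack update, and import_db with the Aurora endpoint afterwards. Set DRY_RUN=0 to actually execute the commands instead of printing them.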