LamaAni / postgres-xl-helm

A postgres-xl chart for helm
MIT License

WAL Restore from buckets - discussion #52

Open LamaAni opened 4 years ago

LamaAni commented 4 years ago

What:

Use a bucket (either GCS or AWS S3) to store WAL backups.

(MORE DEFINITION TO COME)

Why

Though a bit out of scope, since this chart is intended for cloud deployments, using a remote bucket to store the WAL would be very useful.
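
For example, the archiving itself could be as simple as the following (just a sketch; the bucket name and the availability of gsutil in the node image are assumptions, not decisions):

# Sketch only: enable WAL archiving to a (hypothetical) GCS bucket.
# Assumes gsutil is installed in the image and the pod has write access to the bucket.
cat >> "${PGDATA}/postgresql.conf" <<'EOF'
archive_mode = on
archive_command = 'gsutil cp %p gs://pgxl-wal-archive/%f'
EOF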

Acceptance Criteria

(MORE TO COME)

sstubbs commented 4 years ago

This would be an amazing feature.

In my Docker Swarm tests I set up slaves. I had an environment variable that was set for the GTM, coordinators and datanodes, and when it was removed it would convert them to masters. It worked well in the limited testing I did, but it means managing twice the number of persistent volumes and containers. It also won't allow point-in-time recovery.

I also had a container which would take daily pg_dumps and keep them for a limited amount of time. However, this is completely impractical for larger databases, so it's definitely not a good option.

When I was using standard postgres before moving to postgres-xl, archiving WAL to S3 worked well, so I really like this idea. I will also look into it after the benchmarking.

The one thing I would suggest, though, is taking a base_backup of the nodes and restarting the archiving at definable intervals, i.e. every month or two, so that in the event of a failure there isn't a massive amount of downtime while the WAL is replayed. Maybe slaves should also be added at some point for failover, as a separate issue.

In my production non-containerised postgres-xl deployments I use slaves for HA and pg_dump at intervals, which can be very time consuming for large databases. I was always a bit scared of nodes not matching when recovering from a base_backup using WAL replay.

Anyway, this is a great start conceptually. I will definitely help where I can, as I have worked a fair amount with this.

LamaAni commented 4 years ago

@sstubbs I am starting to think this feature should not be in this repo. If it is possible to separate this enhancement into a different chart, it might be usable with other postgres helm implementations.

Are you familiar with the kubedb package? It does allow this implementation (and also implements pg-bouncer). I am not sure we can build on that package, since it does not seem to be a standalone chart, but we can definitely take some insights from it.

This is why I was thinking of delaying this for a bit while I look around. I would love it if you could take a look.

sstubbs commented 4 years ago

Just looked at it now briefly. You're right, this is probably best added to a separate chart, as it is such a broad topic. I will look around and see what I find too.

sstubbs commented 4 years ago

kubedb does look very impressive.

LamaAni commented 4 years ago

I checked; they don't have a pgxl/distributed option. They also add some... configuration to postgres. I was thinking of maybe contributing there - the only thing is I don't want to lose control of the project at this point. I think I am going to open an issue with them at some point and see what they think about the project itself, and get some feedback.

sstubbs commented 4 years ago

Good idea. Postgres-XL is a pain on the backup side for sure, as you want to ensure cluster-wide integrity, which isn't a problem for more traditional DBs. I will definitely keep looking too.

LamaAni commented 4 years ago

I actually did that just now; let's see if we can use their charts to do it. See here: https://github.com/kubedb/project/issues/679

sstubbs commented 4 years ago

ok great

LamaAni commented 4 years ago

After taking a closer look, I am starting to think kubedb adds a lot of functionality that is a bit questionable. The chart (or rather the Kubernetes deploy type) is heavy and requires a lot of configuration. @sstubbs What do you think?

LamaAni commented 4 years ago

Also, have you seen https://github.com/appscode/charts/tree/master/stable? It has a vault chart.

sstubbs commented 4 years ago

In terms of appscode, I'm actually using https://jenkins-x.io/, as we need a solution that also works on-premises and I like the workflow. I've been testing it for a few months and it's working well so far. It also has Vault integrated.

For kubedb, I think you're right: it adds a lot that isn't needed, as it is designed as a more generic solution for multiple databases.

Jenkins X has https://velero.io/ integrated, which I will be testing. However, in the past, when looking at filesystem backup solutions in other postgres scenarios, the recommendation has always been to use native postgres backup tools instead.

In the Docker Swarm version I was testing, adding slaves and HA was simple. I just added a PG_MASTER_HOST env variable if it was a slave.

for coordinators and datanodes I had the following:

if [ ${PG_MASTER_HOST+"false"} ]
then
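    # Slave branch: ${PG_MASTER_HOST+"false"} expands to a non-empty string when PG_MASTER_HOST is set,
    # so take a base backup from the master and write the standby configuration.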
    echo "starting basebackup"
    pg_basebackup -d "postgresql://${REPLICATION_USERNAME}:${REPLICATION_USER_PASSWORD}@${PG_MASTER_HOST}:${PG_PORT}" -D "${PGDATA}" -P --wal-method=stream

    echo "postgresql: adding slave config"

    echo "standby_mode          = 'on'" > "${PGDATA}/recovery.conf"
    echo "primary_conninfo      = 'host=${PG_MASTER_HOST} port=${PG_PORT} user=${REPLICATION_USERNAME} password=${REPLICATION_USER_PASSWORD}'" >> "${PGDATA}/recovery.conf"
    echo "trigger_file = '/tmp/MasterNow'" >> "${PGDATA}/recovery.conf"
    echo "hot_standby = on" >> "${PGDATA}/postgresql.conf"
else
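    # Master branch: PG_MASTER_HOST is unset, so remove any leftover standby configuration.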
    rm -rf "$PGDATA/recovery.conf"
    sed -i '/hot_standby = on/d' "${PGDATA}/postgresql.conf"
fi

and this for the gtm

if [ ${PG_MASTER_HOST+"false"} ]
then
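        # Standby GTM: initialise the data directory if needed, then point gtm.conf at the active GTM.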
        test -d "$PGDATA" || docker-init-gtm
        sed -i "s|#startup = ACT|startup = STANDBY|" "${PGDATA}/gtm.conf"
        sed -i "s|#active_host = ''|active_host = '${PG_MASTER_HOST}'|" "${PGDATA}/gtm.conf"
        sed -i "s|#active_port =|active_port = 6666|" "${PGDATA}/gtm.conf"
else
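        # Active GTM: revert any standby settings in gtm.conf.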
        test -d "$PGDATA" || docker-init-gtm
        sed -i "s|startup = STANDBY|#startup = ACT|" "${PGDATA}/gtm.conf"
        sed -i "s|active_host = '.*'|#active_host = ''|" "${PGDATA}/gtm.conf"
        sed -i "s|active_port = 6666|#active_port =|" "${PGDATA}/gtm.conf"
fi

This is not exactly what I had, as it was using slightly modified scripts from https://github.com/pavouk-0/postgres-xl-docker, but it gives you an idea.

What do you think of perhaps integrating something like this into the chart? I'm happy to do it once I've finished the testing and benchmarks. At least HA would be covered then, and we can look at backup afterwards, once we've tested some more options.

LamaAni commented 4 years ago

I am still not sure about the right way to integrate this. I am still thinking about the least pervasive method to add backup and failover.

I am not happy with the kubedb way of doing this, since the replication is integrated into the chart deployment, which adds complexity. I would prefer an external solution that treats the service endpoint (the Kubernetes service configuration) as an "any deployment" postgres DB. In that case there would be no need to augment the chart, which would reduce complexity.

Sadly, for WAL it seems at first glance that this cannot be done. That said, I need to check all the WAL backup and WAL replication parameters to allow for a minimally intrusive connection. Once that is done, other "customized" integrations can be made, e.g. saving to GCS/AWS buckets (sidecar containers, init containers, or a parallel chart deployment using helmfile).
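
For example, a sidecar could be as simple as a loop that ships locally archived WAL segments to a bucket (a sketch only; the /wal-archive path, the bucket name and the presence of gsutil are all assumptions):

# Hypothetical sidecar: periodically sync locally archived WAL segments to a bucket.
# Assumes archive_command writes segments to /wal-archive and gsutil is installed.
while true; do
    gsutil -m rsync -r /wal-archive gs://pgxl-wal-archive/ \
        || echo "WAL sync failed, retrying on next cycle" >&2
    sleep 60
done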

sstubbs commented 4 years ago

Ok, yes. I'm not sure how WAL will work for the GTM in terms of a backup. I have Velero installed and working, so I can let you know how that goes in a couple of weeks.

I have only been able to get slaves and pg_dump working in the past. I did try logical replication from the coordinator a while back, as I thought this would finally be the answer to keeping a consistent copy of the whole cluster on a standard postgres instance which could be restored from if needed.

Unfortunately postgres xl doesn't support logical replication.

sstubbs commented 4 years ago

In terms of a separate service connecting to the service endpoint, the only ways that would work, in my mind, are pg_dump and logical replication.
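
Something along these lines is what I mean for the pg_dump route (just a sketch; the service name, credentials and bucket are placeholders):

# Hypothetical scheduled job: dump through the chart's coordinator service endpoint
# and push the compressed dump to a bucket. Names and credentials are placeholders.
pg_dump "postgresql://${PGUSER}:${PGPASSWORD}@postgres-xl-coordinator:5432/${PGDATABASE}" \
    | gzip \
    | gsutil cp - "gs://pgxl-dumps/$(date +%Y%m%d-%H%M%S).sql.gz"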

LamaAni commented 4 years ago

Yeah, I think you are right about keeping it separate. Still, I am looking to take a very minimal approach to this. Let's see what I can come up with.

Also, there needs to be a failover example.

sstubbs commented 4 years ago

Ok great I will keep looking too and let you know if I find anything.

LamaAni commented 4 years ago

Good article: https://hackernoon.com/postgresql-cluster-into-kubernetes-cluster-f353cde212de

LamaAni commented 4 years ago

I must say, I find the postgres-xl documentation a bit obtuse. When I define replication, do I need to define it for each node? Is it only on the clusters? How is the definition applied?

I am also considering pg-pool to do this part. In that case, as far as I understand it, the configuration would be entirely external to the chart, which is excellent. If pg-pool is also its own helm chart, it would be possible to externalize a multi-pod pg-pool for high availability, replication and failover. Maybe this issue should be external.

I think my next approach would be to check out pg-pool and see if I can create a manual deployment (in experimental) using this methodology. If that works, I would create a helm chart for pgpool and an appropriate example for it. Notably, this could also be used for other pg databases (not just pgxl).

sstubbs commented 4 years ago

Yes their documentation definitely needs work. That article is interesting.

All nodes need to be replicated.

Datanodes and coordinators are set up and promoted in the same way as standard postgres. The GTM, on the other hand, has different config. Examples for setting up slaves and promotion are in the scripts I posted above.

For connection pooling, pgpool is definitely more comprehensive if you need some of the features it has; however, pgbouncer is way more performant and deals with connections better. I have had issues in the past with pgpool hanging with too many connections and using a large amount of resources. I have run them both in production for years, so I can't say pgpool is bad by any means, but pgbouncer is definitely better for connection pooling.

Features of pgpool in our context.

I'm sure a pgpool chart would help people and would definitely be usable. I wouldn't use it personally though.

What do you think of perhaps two different charts, for repmgr and pgbouncer? This could handle the same sort of use cases but in a better way, in my opinion.

LamaAni commented 4 years ago

Since you have experience, if you could do an example, that would help me a lot.

I am not sold on pgpool2 since, well, I don't know much about it. If you would not use pgpool, that would mean I don't have a second production tester, which would be a very big problem.

So in this case I would prefer the charts for repmgr and pgbouncer. For this solution the requirements (as I see them - feel free to add or comment) are:

  1. Allow for replication.
  2. Allow for backup.
  3. Allow for failover to slave (connection to slave if master fails).
  4. Allow multiple connections.
  5. Allow for multiple nodes running pgbouncer and repmgr in a StatefulSet or Deployment. (High availability on the connection, backup and restore).

If we can have these (which is what pg_pool is claiming to do), I think we can proceed. In that case we can do two more charts and add these to the main chart repository in helm. (Links to the other charts and examples in each).
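
To make requirements 4 and 5 a bit more concrete, a minimal pgbouncer setup in front of the coordinator service could look roughly like this (the service name, auth file and pool mode here are assumptions, not decisions):

# Hypothetical pgbouncer bootstrap: pool client connections in front of the coordinator service.
cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
; "postgres-xl-coordinator" is a placeholder for the chart's coordinator service name
* = host=postgres-xl-coordinator port=5432

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
EOF
exec pgbouncer /etc/pgbouncer/pgbouncer.ini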

More and more, I am seeing that this feature should be enabled by this chart rather than implemented by it. In that case, an example would allow me to add the necessary plugins to the chart to allow the connections. I think that might be the best approach here. What do you think?

sstubbs commented 4 years ago

Ok, great, that would be ideal. I'll put something together as a starting point, just based on docker images and bash, and then we can see. This would be very useful for me too, so I'll start it tomorrow and should have something for you a day or two after.

sstubbs commented 4 years ago

Do you by any chance have a sidecar example? Resources on this seem a bit scarce. My idea is to install a sidecar in each pod that runs a client which modifies the config files of each node and watches them. repmgr doesn't deal with the GTM and uses SSH, as do most failover options; SSH is firstly insecure and secondly not recommended, from what I've seen online. I will make this client generic so it works with postgres, but with a GTM-enable argument for postgres-xl.

sstubbs commented 4 years ago

I was thinking of doing it in plain bash, like I did for the swarm tests, but that would need to be integrated into the chart, which is not what we want.

sstubbs commented 4 years ago

Ideally I would also like to be able to inject sidecar containers into pods that need postgres credentials from Vault, so I don't need to repeat that in deployments.

sstubbs commented 4 years ago

We could still use repmgr for the datanodes and coordinators and just a custom client for the GTM, but I will try this first, as it seems a bit redundant to install repmgr if it can't handle the GTM.

LamaAni commented 4 years ago

Sorry for the late response. Things are a bit crazy over here, so it might take a day or two. If you could give me the base stuff I'll try and work from that. Apologies.

LamaAni commented 4 years ago

@sstubbs Hi, I have not gotten around to this yet. What is the urgency on your end?

LamaAni commented 4 years ago

@sstubbs In the meanwhile, I added an option to back up the WAL to a PVC. This allows a restore in the case of catastrophic failure, until we get a more automated process in place. Notice the version argument.

https://github.com/LamaAni/postgres-xl-helm/pull/63
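
As a rough illustration, a restore would replay the archived WAL from that PVC after a base backup is restored (the /wal-archive mount path below is an assumption, not necessarily what the PR uses):

# Sketch only: after restoring a base backup, replay archived WAL from the backup PVC.
# Assumes the PVC is mounted at /wal-archive (illustrative path).
cat >> "${PGDATA}/recovery.conf" <<'EOF'
restore_command = 'cp /wal-archive/%f "%p"'
EOF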

sstubbs commented 4 years ago

Ok, that makes sense. Still working on it. I've decided to do a standard postgres example first, to try and think through the flow properly, as I'm having a bit of a hard time getting it working for postgres-xl straight away.

Basically I have a manager webapp and an agent webapp that runs on each postgres/postgres-xl node.

When a node starts up, an init container posts to the manager and waits for a response.

The operator can then query the manager to see the available clusters, which shows all clusters that have posted to the manager and are uninitialised, and then initialise them as masters or slaves.

The manager runs health checks against each cluster.

Pgbouncer curls to an endpoint and adjusts based on which master is healthy.

If a cluster is unhealthy, failover occurs: the manager posts to the agents and changes the config.

Hopefully I'll have it done tomorrow.

I've basically copied the idea of repmgr, but made it a bit more automated and added postgres-xl support. Obviously it's not nearly as sophisticated as repmgr, but it does what I need it to do.

I'm using vertx.io which does support clustering for ha.
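
The init container step is basically just this kind of loop (the manager URL and endpoint paths are illustrative, nothing is fixed yet):

# Hypothetical init container: register this node with the manager, then wait
# until the operator has initialised it. URL and paths are placeholders.
curl -sf -X POST "http://cluster-manager:8080/register" -d "node=${HOSTNAME}"
until curl -sf "http://cluster-manager:8080/initialised/${HOSTNAME}" > /dev/null; do
    echo "waiting for the manager to initialise ${HOSTNAME}"
    sleep 5
done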

sstubbs commented 4 years ago

I'm trying to get the agent working in a separate container so there is no need to modify the postgres/postgres-xl container.

LamaAni commented 4 years ago

That sounds amazing actually!

I am working on a different approach, just in case. I will try to use a more conventional Kubernetes approach. Let's see if I can get that to work.

sstubbs commented 4 years ago

Hi,

Sorry it's taken me a long time to get back about this. I've tried various implementations, both in pods and as bash scripts using kubectl externally. Nothing allows as much automation as I would like in a clean way. I've decided to do an operator based on your chart at https://github.com/sstubbs/postgres-xl-operator. It's still a work in progress, but it should be done in the next few days. I'm trying to keep it as close to the helm chart as possible. The only changes I've had to make, to keep it compatible with Rust, are changing all the variable names to snake case where you had some in camel case, i.e. [STS].addVolumeClaims is now [STS].add_volume_claims.
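
As a rough illustration of the rename involved (GNU sed; it would also touch camelCase strings that are not keys, so it is only a starting point, not the actual migration):

# Hypothetical helper: convert camelCase keys such as addVolumeClaims to add_volume_claims.
sed -E 's/([a-z0-9])([A-Z])/\1_\l\2/g' values.yaml > values_snake_case.yaml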

  1. The PostgresXlCluster resource replicates the helm chart, so you can create and delete clusters. I will have this done in the next day. I'm busy copying your scripts and templates into it and making sure they work in the same way as in helm.
  2. I will add a new replication value which determines whether nodes are masters or slaves and which masters they replicate from.
  3. I will set up a PostgresXlFailover resource which will change slaves to masters.
  4. I will implement the health check logic used in https://github.com/pavouk-0/postgres-xl-docker in the rust app.
  5. Then I can finally use 3 and 4 to allow HA.

Then I have a few other features I would like to add to the operator like pgbouncer and pglogical replication of the whole cluster from a standard postgres instance.

The helm chart you have done is great for single installs, and it would be great if those values could be changed to snake case, as I would like the operator to be as much of a drop-in replacement for it as possible for people who need to manage multiple installs.

Please let me know if you would like to switch to this once it's done, and/or contribute. If there is anything else you would like it to do, let me know. If you would rather continue using helm, I will do everything I can to make sure these two projects don't diverge too much.

sstubbs commented 4 years ago

It will still keep your health checks to determine that pods are running, but will also have other health checks to ensure the entire cluster is healthy.

LamaAni commented 4 years ago

Hi,

I have looked at your project and I very much like it.

Would you like to join the two projects together? If so, I could add you as an admin on this project, or we could create a new one that stands on its own. That said, I have been getting some traction on the project and don't want to lose it (one more company has picked it up for testing), and at some point I would also like to publish it to the main helm repository.

If that is done, we could combine all the common sources and offer two implementation options that will be merged in the future, and use an operator for deployment. I think that would be useful.

For snake vs camel case, we can use either. Please just submit a PR for the changes. I will change the policy and make camel case the base naming convention.

sstubbs commented 4 years ago

That makes a lot of sense. Let me finish this today; I'll have it creating and deleting clusters. Then I'll do a PR for the variable names in this project, and then we can see how they can be combined.