CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0
3.94k stars 592 forks source link

Is there any way to disable archiving? #3432

Closed mzwettler2 closed 2 weeks ago

mzwettler2 commented 2 years ago

I found PGO making archiving mandatory:

        // Send WAL files to all configured repositories when not in recovery.
        // - https://pgbackrest.org/user-guide.html#quickstart/configure-archiving
        // - https://pgbackrest.org/command.html#command-archive-push
        // - https://www.postgresql.org/docs/current/runtime-config-wal.html
        archive := `pgbackrest --stanza=` + DefaultStanzaName + ` archive-push "%p"`
        outParameters.Mandatory.Add("archive_mode", "on")
        outParameters.Mandatory.Add("archive_command", archive)

This also requires having a repo retention and at least on scheduled backup to get automatic retention management working.

There might be cases where I do not want archiving, eg. loading a large amount of data. There might be even cases where I do not want archiving nor backups, eg. some kind of temporary test systems.

QUESTION: Is there any way to disable archiving?

Environment

Please provide the following details:

laurivosandi commented 2 years ago

I also find it quite odd that I have to enable backups

aidensgithub commented 2 years ago

Any solutions found?

laurivosandi commented 2 years ago

I mangled the CRD definition and set minIndex: 0 in the relevant section :D

mzwettler2 commented 2 years ago

@laurivosandi

What did you do exactly? Hacking the PGO sources?

mzwettler2 commented 1 year ago

@cbandy

Maybe this would be an enhancement request?

laurivosandi commented 1 year ago

JFYI: After hacking the CRD-s looks like WAL keeps growing forever :D

So yeah authors should address this issue themselves

benjaminjb commented 1 year ago

Just dropping a line here to note that we have a ticket in our backlog for disabling backups/archiving (for dev envs only, please!)

(This is a feature that has been requested in issues a few times, see also https://github.com/CrunchyData/postgres-operator/issues/2531 https://github.com/CrunchyData/postgres-operator/issues/3152 I'm adding these links to help consolidate this issue in case I close any of these tickets.)

Pluggi commented 1 year ago

What is the timeline for this feature? Do you have an estimate on when this will be available?

sigwinch28 commented 1 year ago

Any update on this?

bendilley commented 1 year ago

I've found a way to do this temporarily:

kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'off'
pg_ctl restart

It's very useful when my initial datasource is a big pg_dump file.

After turning archive_mode back on, kicking-off a full backup seems like a good idea:

kubectl create job --from=cronjob/hippo-repo1-full backup-post-import
laurivosandi commented 1 year ago

Just move on to https://cloudnative-pg.io/

mzwettler2 commented 1 year ago

@bendilley

I also tried this but it did not work for me. PGO quickly noticed the change and restored the original state with archive_mode='on'. Maybe this might be deferred with the new "pause" flag of PGO 5.2.

Anyway, it is not useful to trick this outside of PGO or Kubernetes.

bendilley commented 1 year ago

@mzwettler2 this was with PGO 5.3.0 but I didn't explicitly pause anything.

I do think it's surprisingly difficult to initialise a cluster from a non-PGO datasource. I've even tried restoring from a pgbackrest backup in S3, but because the original source wasn't PGO-managed I still couldn't get it to work as a dataSource. pg_restore does work, but a large dataset will fill the disk with transaction logs way faster than they can be streamed by pgbackrest, at least to S3.

@laurivosandi thanks for the link - I'm open to alternatives!

bendilley commented 1 year ago

Now that I think about it, I did also merge the below patch of the patroni dynamicConfiguration in an earlier experiment - it didn't appear to have any effect, but it was still in place when I manually turned off archive_mode, which might explain why I got away with it 🤷🏻

spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          archive_mode: 'off'
mzwettler2 commented 1 year ago

@bendilley

At the top of this thread you see that "archive_mode: on" is hardcoded within PGO. It was probably just a coincidence for you.

bendilley commented 1 year ago

Perhaps I should ask this in a different thread, but how are you supposed to initialise a PGO database from a non-PGO datasource??

mzwettler2 commented 1 year ago

@bendilley

Yes, somewhat off topic here. Anyway, I would go with pg_dump/pg_restore. Seems to be the cleanest way for me.

bendilley commented 1 year ago

pg_restore works fine for smaller datasets, but migrating a large database will fill up the disk with WAL faster than it can be backed-up, at least that's my experience.

mzwettler2 commented 1 year ago

@bendilley

Yep. The only "stable" dirty hack I found to prevent for a filesystem full error has been soft-linking a subdirectory of the "/pgbackrest/repo1/archive/db" on the repo pod to "/dev/null". PostgreSQL generates archives and saves them to trash. PGO does not recognize nor undo this filesystem "change".

johannes-engler-mw commented 1 year ago

Any news regarding this issue?

mzwettler2 commented 4 months ago

@benjaminjb

We are hopeing and praying for this feature. When will it come?

bendilley commented 4 months ago

Further to my earlier observation, I've found this combination of steps to be consistently effective:

kubectl patch postgrescluster hippo --type merge --patch-file patch-patroni-archive-mode.yml
kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'off'
pg_ctl restart
exit

where patch-patroni-archive-mode.yml is

spec:
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          archive_mode: 'off'

I've only ever done it while pg_restoreing a dump file. Presumably the setting wouldn't persist if the container was restarted.

After the restore is complete, I just reverse the changes and kick-off a full backup:

kubectl exec hippo-data-ljc7-0 -c database -it -- bash
vi /pgdata/pg14/postgresql.conf # and set archive_mode = 'on'
pg_ctl restart
exit

kubectl patch postgrescluster hippo --type json -p '[{"op": "remove", "path": "/spec/patroni/dynamicConfiguration/postgresql/parameters/archive_mode"}]'
kubectl create job --from=cronjob/hippo-repo1-full backup-post-import
andrewlecuyer commented 4 months ago

Just dropping a quick update on this thread to note that we are currently taking a look at some upcoming plans and changes to CPK to allow allow additional flexibility around disabling archiving (as requested in this issue).

Therefore, I simply wanted to note that this issue is definitely on our radar, and we will be in touch with additional information, details, etc. once available.

dsessler7 commented 2 weeks ago

I'm pleased to announce that support for optional backups has been released with Crunchy Postgres for Kubernetes v5.7.0. Please see the docs for more info. If you get a chance to try it out, please let us know what you think!

mzwettler2 commented 2 weeks ago

@dsessler7

Thanks for the feature. I will try it out when I find enough time.

I quickly skimmed the documentation. I would have preferred a simple spec.backups.status switch with a three-way logic.

on: archives + backups (default) off: no archives + backups delete: no archives + backups; delete all existing backup artifacts