CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0

encrypted db clone from s3 doesn't work #3374

Closed: tirelibirefe closed this issue 1 year ago

tirelibirefe commented 2 years ago

Hello. Environment: PGO 5.1, PostgreSQL 14, EKS 1.2.

I followed the recipe in the documentation under the topic "Clone From Backups Stored in S3 / GCS / Azure Blob Storage".

I had a psql cluster named "ferahevler" and its backups in S3 (screenshot: backups in the S3 bucket).

I deleted "feraehevler" and I would like restore/clone it from S3 to a new psql cluster named "seyrantepe".

here is the config:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: seyrantepe
  namespace: psql-seyrantepe
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1
  postgresVersion: 14
  customReplicationTLSSecret:
    name: seyrantepe-repl-tls
  customTLSSecret:
    name: seyrantepe-tls
  dataSource:
    pgbackrest:
      stanza: db
      configuration:
      - secret:
          name: psql-seyrantepe-s3-creds
      - secret:
          name: seyrantepe-pgbackrest-secrets
      global:
        repo1-path: /repo1/ferahevler
        repo1-cipher-type: aes-256-cbc
      repo:
        name: repo1
        s3:
          bucket: ku-eksdev1-crunchydata-backups
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  users:
    - name: postgres
    - name: seyrantepe
      databases:
        - seyrantepe
...
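(For reference, a sketch of how a pgBackRest S3 credentials secret such as psql-seyrantepe-s3-creds is typically created, following the S3 section of the backups tutorial; the s3.conf file name and the placeholder values are illustrative, not taken from this issue:)

$ cat s3.conf
[global]
repo1-s3-key=<YOUR_AWS_S3_KEY>
repo1-s3-key-secret=<YOUR_AWS_S3_KEY_SECRET>

$ kubectl -n psql-seyrantepe create secret generic psql-seyrantepe-s3-creds --from-file=s3.conf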

When I run the new cluster installation, I get the following error messages in the restore pod:

$ k logs -n psql-seyrantepe seyrantepe-pgbackrest-restore-58g4w -f
Defaulted container "pgbackrest-restore" out of: pgbackrest-restore, nss-wrapper-init (init)
WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg14' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2022-09-07 08:49:38.975 GMT [18] LOG:  starting PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-4), 64-bit
2022-09-07 08:49:38.976 GMT [18] LOG:  listening on IPv6 address "::1", port 5432
2022-09-07 08:49:38.976 GMT [18] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-09-07 08:49:38.981 GMT [18] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-09-07 08:49:38.986 GMT [19] LOG:  database system was interrupted; last known up at 2022-09-07 08:19:21 GMT
2022-09-07 08:49:39.077 GMT [19] LOG:  restored log file "00000002.history" from archive
2022-09-07 08:49:39.142 GMT [19] LOG:  starting archive recovery
2022-09-07 08:49:39.226 GMT [19] LOG:  restored log file "00000002.history" from archive
2022-09-07 08:49:39.442 GMT [19] LOG:  restored log file "000000010000000000000015" from archive
2022-09-07 08:49:39.497 GMT [19] LOG:  redo starts at 0/15000028
2022-09-07 08:49:39.652 GMT [19] LOG:  restored log file "000000010000000000000016" from archive
2022-09-07 08:49:39.704 GMT [19] LOG:  consistent recovery state reached at 0/16000088
2022-09-07 08:49:39.705 GMT [18] LOG:  database system is ready to accept read-only connections
2022-09-07 08:49:39.889 GMT [19] LOG:  restored log file "000000010000000000000017" from archive
2022-09-07 08:49:40.122 GMT [19] LOG:  restored log file "000000010000000000000018" from archive
2022-09-07 08:49:40.346 GMT [19] LOG:  restored log file "000000010000000000000019" from archive
2022-09-07 08:49:40.561 GMT [19] LOG:  restored log file "00000001000000000000001A" from archive
2022-09-07 08:49:40.775 GMT [19] LOG:  restored log file "00000001000000000000001B" from archive
2022-09-07 08:49:41.031 GMT [19] LOG:  restored log file "00000001000000000000001C" from archive
2022-09-07 08:49:41.270 GMT [19] LOG:  restored log file "00000002000000000000001D" from archive
2022-09-07 08:49:41.570 GMT [19] LOG:  restored log file "00000002000000000000001E" from archive
2022-09-07 08:49:41.813 GMT [19] LOG:  redo done at 0/1E142BD8 system usage: CPU: user: 0.00 s, system: 0.05 s, elapsed: 2.31 s
2022-09-07 08:49:41.813 GMT [19] LOG:  last completed transaction was at log time 2022-09-07 08:47:09.187764+00
2022-09-07 08:49:42.042 GMT [19] LOG:  restored log file "00000002000000000000001E" from archive
2022-09-07 08:49:42.187 GMT [19] LOG:  selected new timeline ID: 3
2022-09-07 08:49:42.281 GMT [19] LOG:  archive recovery complete
2022-09-07 08:49:42.356 GMT [19] LOG:  restored log file "00000002.history" from archive
2022-09-07 08:49:42.410 GMT [18] LOG:  database system is ready to accept connections
2022-09-07 08:49:42.414 GMT [54] LOG:  archive command failed with exit code 1
2022-09-07 08:49:42.414 GMT [54] DETAIL:  The failed archive command was: false
2022-09-07 08:49:43.419 GMT [54] LOG:  archive command failed with exit code 1
2022-09-07 08:49:43.419 GMT [54] DETAIL:  The failed archive command was: false
2022-09-07 08:49:43.883 GMT [18] LOG:  received fast shutdown request
2022-09-07 08:49:43.886 GMT [18] LOG:  aborting any active transactions
2022-09-07 08:49:43.889 GMT [18] LOG:  background worker "logical replication launcher" (PID 55) exited with exit code 1
2022-09-07 08:49:43.890 GMT [24] LOG:  shutting down
2022-09-07 08:49:43.982 GMT [54] LOG:  archive command failed with exit code 1
2022-09-07 08:49:43.982 GMT [54] DETAIL:  The failed archive command was: false
2022-09-07 08:49:43.982 GMT [54] WARNING:  archiving write-ahead log file "00000003.history" failed too many times, will try again later
2022-09-07 08:49:43.985 GMT [54] LOG:  archive command failed with exit code 1
2022-09-07 08:49:43.985 GMT [54] DETAIL:  The failed archive command was: false
2022-09-07 08:49:44.989 GMT [54] LOG:  archive command failed with exit code 1
2022-09-07 08:49:44.989 GMT [54] DETAIL:  The failed archive command was: false
2022-09-07 08:49:45.996 GMT [54] LOG:  archive command failed with exit code 1
2022-09-07 08:49:45.996 GMT [54] DETAIL:  The failed archive command was: false
2022-09-07 08:49:45.997 GMT [54] WARNING:  archiving write-ahead log file "00000003.history" failed too many times, will try again later
2022-09-07 08:49:46.003 GMT [18] LOG:  database system is shut down

The restore cannot be completed successfully. Is there a workaround you can advise?

Thanks & Regards

benjaminjb commented 2 years ago

I was able to complete a restore from an encrypted bucket in s3. Two questions:

(1) After the restore pod finished, did you check the new postgres pod to see what was there? For comparison, here are the logs of my recent restore job:

+ pgbackrest restore --stanza=db --pg1-path=/pgdata/pg14 --repo=1 --delta --link-map=pg_wal=/pgdata/pg14_wal
WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg14' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2022-10-12 21:16:14.310 GMT [18] LOG:  starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-10-12 21:16:14.310 GMT [18] LOG:  listening on IPv6 address "::1", port 5432
2022-10-12 21:16:14.310 GMT [18] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-10-12 21:16:14.315 GMT [18] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-10-12 21:16:14.320 GMT [19] LOG:  database system was interrupted; last known up at 2022-10-12 21:06:08 GMT
2022-10-12 21:16:14.833 GMT [19] LOG:  starting archive recovery
2022-10-12 21:16:15.608 GMT [19] LOG:  restored log file "000000010000000000000007" from archive
2022-10-12 21:16:15.652 GMT [19] LOG:  redo starts at 0/7000028
2022-10-12 21:16:16.257 GMT [19] LOG:  restored log file "000000010000000000000008" from archive
2022-10-12 21:16:16.275 GMT [19] LOG:  consistent recovery state reached at 0/8000050
2022-10-12 21:16:16.275 GMT [18] LOG:  database system is ready to accept read-only connections
2022-10-12 21:16:16.723 GMT [19] LOG:  restored log file "000000010000000000000009" from archive
2022-10-12 21:16:17.051 GMT [19] LOG:  redo done at 0/9000148 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 1.39 s
2022-10-12 21:16:17.051 GMT [19] LOG:  last completed transaction was at log time 2022-10-12 21:06:17.63812+00
2022-10-12 21:16:17.509 GMT [19] LOG:  restored log file "000000010000000000000009" from archive
2022-10-12 21:16:17.859 GMT [19] LOG:  selected new timeline ID: 2
2022-10-12 21:16:17.897 GMT [19] LOG:  archive recovery complete
2022-10-12 21:16:18.203 GMT [18] LOG:  database system is ready to accept connections
2022-10-12 21:16:18.206 GMT [41] LOG:  archive command failed with exit code 1
2022-10-12 21:16:18.206 GMT [41] DETAIL:  The failed archive command was: false
2022-10-12 21:16:19.212 GMT [41] LOG:  archive command failed with exit code 1
2022-10-12 21:16:19.212 GMT [41] DETAIL:  The failed archive command was: false
2022-10-12 21:16:19.350 GMT [18] LOG:  received fast shutdown request
2022-10-12 21:16:19.353 GMT [18] LOG:  aborting any active transactions
2022-10-12 21:16:19.355 GMT [18] LOG:  background worker "logical replication launcher" (PID 42) exited with exit code 1
2022-10-12 21:16:19.358 GMT [22] LOG:  shutting down
2022-10-12 21:16:19.402 GMT [41] LOG:  archive command failed with exit code 1
2022-10-12 21:16:19.402 GMT [41] DETAIL:  The failed archive command was: false
2022-10-12 21:16:19.403 GMT [41] WARNING:  archiving write-ahead log file "00000002.history" failed too many times, will try again later
2022-10-12 21:16:19.405 GMT [41] LOG:  archive command failed with exit code 1
2022-10-12 21:16:19.405 GMT [41] DETAIL:  The failed archive command was: false
2022-10-12 21:16:20.412 GMT [41] LOG:  archive command failed with exit code 1
2022-10-12 21:16:20.412 GMT [41] DETAIL:  The failed archive command was: false
2022-10-12 21:16:21.382 GMT [41] LOG:  archive command failed with exit code 1
2022-10-12 21:16:21.382 GMT [41] DETAIL:  The failed archive command was: false
2022-10-12 21:16:21.382 GMT [41] WARNING:  archiving write-ahead log file "00000002.history" failed too many times, will try again later
2022-10-12 21:16:21.387 GMT [18] LOG:  database system is shut down

Looks pretty similar, and when I go to check the postgres pod, the restore has completed successfully. So: did the postgres pod come up successfully with the data that you expected?

(2) Why does your configuration have two secrets, and what do they look like? (I mean, minus the actual secret data -- just what fields/volumes are getting mounted to the restore pod?)
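(A low-risk way to answer that, as a sketch: kubectl describe prints only the key names and byte sizes of a secret's data, never the values:)

$ kubectl -n psql-seyrantepe describe secret psql-seyrantepe-s3-creds seyrantepe-pgbackrest-secrets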

benjaminjb commented 2 years ago

Also, just to be clear, are you using the pgbackrest encryption as discussed here: https://access.crunchydata.com/documentation/postgres-operator/5.1.1/tutorial/backups/#encryption
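(For readers following along: the encryption setup in that link amounts to storing a pgBackRest config fragment containing a cipher passphrase in a Secret, referencing that Secret under spec.backups.pgbackrest.configuration, and setting repo1-cipher-type under the global options, as the configs in this thread already do. A sketch, with an illustrative file name and placeholder passphrase:)

apiVersion: v1
kind: Secret
metadata:
  name: seyrantepe-pgbackrest-secrets
  namespace: psql-seyrantepe
stringData:
  encryption.conf: |-
    [global]
    repo1-cipher-pass=<YOUR_PASSPHRASE>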

tirelibirefe commented 2 years ago

Hello @benjaminjb, this is a very important topic and I'll give more information about it; please don't close it for a while.

Thanks & Regards

tirelibirefe commented 1 year ago

> Also, just to be clear, are you using the pgbackrest encryption as discussed here: https://access.crunchydata.com/documentation/postgres-operator/5.1.1/tutorial/backups/#encryption

Hello @benjaminjb, please excuse the delay. Since I had cleaned up the environment, I had to reinstall everything to reproduce the issue; the database names may have changed, but I use the same manifests.

All my backups are encrypted, and all of them are written to S3.

...and my answer is yes: I used the recipe there.

$ k get secrets -n psql-ferahevler
NAME                                TYPE                                  DATA   AGE
default-token-n82m6                 kubernetes.io/service-account-token   3      44m
ferahevler-instance-token-64hbg     kubernetes.io/service-account-token   3      40m
ferahevler-monitoring               Opaque                                2      40m
ferahevler-pgbackrest-secrets       Opaque                                1      40m
ferahevler-pgbackrest-token-xhth8   kubernetes.io/service-account-token   3      40m
ferahevler-pgbouncer                Opaque                                6      40m
ferahevler-pgha1-bgqk-certs         Opaque                                4      40m
ferahevler-pgha1-xgxz-certs         Opaque                                4      40m
ferahevler-pguser-ferahevler        Opaque                                12     40m
ferahevler-pguser-postgres          Opaque                                7      40m
ferahevler-repl-tls                 kubernetes.io/tls                     3      40m
ferahevler-tls                      kubernetes.io/tls                     3      40m
pgo-root-cacert                     Opaque                                2      40m
psql-ferahevler-s3-creds            Opaque                                1      40m

and here is the full config:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: ferahevler
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.5-1
  postgresVersion: 14
  customReplicationTLSSecret:
    name: ferahevler-repl-tls
  customTLSSecret:
    name: ferahevler-tls
  users:
    - name: postgres
    - name: ferahevler
      databases:
        - ferahevler
  instances:
    - name: pgha1
      replicas: 2
      dataVolumeClaimSpec:
        storageClassName: ebs-sc-enc
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: ferahevler
                  postgres-operator.crunchydata.com/instance-set: pgha1
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.2.0-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      configuration:
      - secret:
          name: psql-ferahevler-s3-creds
      - secret:
          name: ferahevler-pgbackrest-secrets
      manual:
        repoName: repo1
        options:
         - --type=full
      global:
        repo1-path: /repo1/ferahevler
        repo1-retention-full: "30"
        repo1-retention-full-type: time
        repo1-cipher-type: aes-256-cbc
      repos:
      - name: repo1
        schedules:
          full: "0 */5 * * *"
          # incremental: "0 */6 * * *"
        s3:
          bucket: mybucketnamehere
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  proxy:
    pgBouncer:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.17-1
      replicas: 2
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: ferahevler
                  postgres-operator.crunchydata.com/role: pgbouncer

tirelibirefe commented 1 year ago

Scenario-1: psql-restore-ferahevler-enc-v1

I deleted the psql cluster (ferahevler) above and tried to restore it to a new cluster with a different name. The new cluster name is seyrantepe.

Here are my backups:

(screenshot: list of backups in the S3 bucket)

All my backups are encrypted, all my backups are written to S3.

I'll try to restore from a specific backup (20221024-105247F).

here is the config:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: seyrantepe
  namespace: psql-seyrantepe
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.5-1
  postgresVersion: 14
  customReplicationTLSSecret:
    name: seyrantepe-repl-tls
  customTLSSecret:
    name: seyrantepe-tls
  dataSource:
    pgbackrest:
      stanza: db
      configuration:
      - secret:
          name: psql-seyrantepe-s3-creds
      - secret:
          name: seyrantepe-pgbackrest-secrets
      global:
        repo1-path: /repo1/ferahevler
        repo1-cipher-type: aes-256-cbc
      repo:
        name: repo1
        s3:
          bucket: ku-eksdev1-crunchydata-backups
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  users:
    - name: postgres
    - name: seyrantepe
      databases:
        - seyrantepe
  instances:
    - name: pgha1
      replicas: 2
      dataVolumeClaimSpec:
        storageClassName: ebs-sc-enc
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: seyrantepe
                  postgres-operator.crunchydata.com/instance-set: pgha1
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.2.0-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      configuration:
      - secret:
          name: psql-seyrantepe-s3-creds
      - secret:
          name: seyrantepe-pgbackrest-secrets
      manual:
        repoName: repo1
        options:
         - --type=full
      global:
        repo1-path: /repo1/ferahevler
        repo1-retention-full: "30"
        repo1-retention-full-type: time
        repo1-cipher-type: aes-256-cbc
      repos:
      - name: repo1
        schedules:
          full: "0 */5 * * *"
          # incremental: "0 */6 * * *"
        s3:
          bucket: ku-eksdev1-crunchydata-backups
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  proxy:
    pgBouncer:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.17-1
      replicas: 2
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: seyrantepe
                  postgres-operator.crunchydata.com/role: pgbouncer

...and logs:

$ k logs -n psql-seyrantepe seyrantepe-pgbackrest-restore-zq7c4 -f
Defaulted container "pgbackrest-restore" out of: pgbackrest-restore, nss-wrapper-init (init)
+ pgbackrest restore --stanza=db --pg1-path=/pgdata/pg14 --repo=1 --delta --link-map=pg_wal=/pgdata/pg14_wal
WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg14' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2022-10-24 17:33:13.540 GMT [19] LOG:  starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-10-24 17:33:13.541 GMT [19] LOG:  listening on IPv6 address "::1", port 5432
2022-10-24 17:33:13.541 GMT [19] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-10-24 17:33:13.548 GMT [19] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-10-24 17:33:13.558 GMT [20] LOG:  database system was interrupted; last known up at 2022-10-24 15:00:10 GMT
2022-10-24 17:33:13.717 GMT [20] LOG:  starting archive recovery
2022-10-24 17:33:13.918 GMT [20] LOG:  restored log file "000000010000000000000045" from archive
2022-10-24 17:33:13.987 GMT [20] LOG:  redo starts at 0/45000028
2022-10-24 17:33:14.153 GMT [20] LOG:  restored log file "000000010000000000000046" from archive
2022-10-24 17:33:14.254 GMT [20] LOG:  consistent recovery state reached at 0/46000088
2022-10-24 17:33:14.255 GMT [19] LOG:  database system is ready to accept read-only connections
2022-10-24 17:33:14.466 GMT [20] LOG:  restored log file "000000010000000000000047" from archive
2022-10-24 17:33:14.722 GMT [20] LOG:  restored log file "000000010000000000000048" from archive
2022-10-24 17:33:14.990 GMT [20] LOG:  restored log file "000000010000000000000049" from archive
2022-10-24 17:33:15.252 GMT [20] LOG:  restored log file "00000001000000000000004A" from archive
2022-10-24 17:33:15.765 GMT [20] LOG:  restored log file "00000001000000000000004B" from archive
2022-10-24 17:33:20.419 GMT [20] LOG:  restored log file "00000001000000000000004C" from archive
2022-10-24 17:33:20.928 GMT [20] LOG:  restored log file "00000001000000000000004D" from archive
2022-10-24 17:33:21.155 GMT [20] LOG:  restored log file "00000001000000000000004E" from archive
2022-10-24 17:33:21.475 GMT [20] LOG:  restored log file "00000001000000000000004F" from archive
2022-10-24 17:33:21.899 GMT [20] LOG:  restored log file "000000010000000000000050" from archive
2022-10-24 17:33:22.183 GMT [20] LOG:  restored log file "000000010000000000000051" from archive
2022-10-24 17:33:22.381 GMT [20] LOG:  restored log file "000000010000000000000052" from archive
2022-10-24 17:33:22.644 GMT [20] LOG:  restored log file "000000010000000000000053" from archive
2022-10-24 17:33:22.904 GMT [20] LOG:  restored log file "000000010000000000000054" from archive
2022-10-24 17:33:23.190 GMT [20] LOG:  restored log file "000000010000000000000055" from archive
2022-10-24 17:33:23.492 GMT [20] LOG:  restored log file "000000010000000000000056" from archive
2022-10-24 17:33:23.827 GMT [20] LOG:  restored log file "000000010000000000000057" from archive
2022-10-24 17:33:25.716 GMT [20] LOG:  restored log file "000000010000000000000058" from archive
2022-10-24 17:33:29.021 GMT [20] LOG:  restored log file "000000010000000000000059" from archive
2022-10-24 17:33:30.943 GMT [20] LOG:  restored log file "00000001000000000000005A" from archive
2022-10-24 17:33:31.244 GMT [20] LOG:  restored log file "00000001000000000000005B" from archive
2022-10-24 17:33:31.611 GMT [20] LOG:  restored log file "00000001000000000000005C" from archive
2022-10-24 17:33:32.605 GMT [20] LOG:  restored log file "00000001000000000000005D" from archive
2022-10-24 17:33:32.818 GMT [20] LOG:  restored log file "00000001000000000000005E" from archive
2022-10-24 17:33:33.299 GMT [20] LOG:  restored log file "00000001000000000000005F" from archive
2022-10-24 17:33:33.679 GMT [20] LOG:  restored log file "000000010000000000000060" from archive
2022-10-24 17:33:34.011 GMT [20] LOG:  restored log file "000000010000000000000061" from archive
2022-10-24 17:33:34.376 GMT [20] LOG:  restored log file "000000010000000000000062" from archive
2022-10-24 17:33:34.645 GMT [20] LOG:  restored log file "000000010000000000000063" from archive
2022-10-24 17:33:34.898 GMT [20] LOG:  restored log file "000000010000000000000064" from archive
2022-10-24 17:33:35.165 GMT [20] LOG:  restored log file "000000010000000000000065" from archive
2022-10-24 17:33:35.449 GMT [20] LOG:  restored log file "000000010000000000000066" from archive
2022-10-24 17:33:35.718 GMT [20] LOG:  restored log file "000000010000000000000067" from archive
2022-10-24 17:33:35.897 GMT [20] LOG:  redo done at 0/67000110 system usage: CPU: user: 0.00 s, system: 0.11 s, elapsed: 21.91 s
2022-10-24 17:33:35.897 GMT [20] LOG:  last completed transaction was at log time 2022-10-24 17:23:30.336398+00
2022-10-24 17:33:36.389 GMT [20] LOG:  restored log file "000000010000000000000067" from archive
2022-10-24 17:33:38.933 GMT [20] LOG:  selected new timeline ID: 2
2022-10-24 17:33:41.553 GMT [20] LOG:  archive recovery complete
2022-10-24 17:33:41.895 GMT [19] LOG:  database system is ready to accept connections
2022-10-24 17:33:41.899 GMT [174] LOG:  archive command failed with exit code 1
2022-10-24 17:33:41.899 GMT [174] DETAIL:  The failed archive command was: false
2022-10-24 17:33:42.904 GMT [174] LOG:  archive command failed with exit code 1
2022-10-24 17:33:42.904 GMT [174] DETAIL:  The failed archive command was: false
2022-10-24 17:33:43.593 GMT [19] LOG:  received fast shutdown request
2022-10-24 17:33:43.596 GMT [19] LOG:  aborting any active transactions
2022-10-24 17:33:43.608 GMT [19] LOG:  background worker "logical replication launcher" (PID 175) exited with exit code 1
2022-10-24 17:33:43.610 GMT [24] LOG:  shutting down
2022-10-24 17:33:43.908 GMT [174] LOG:  archive command failed with exit code 1
2022-10-24 17:33:43.908 GMT [174] DETAIL:  The failed archive command was: false
2022-10-24 17:33:43.908 GMT [174] WARNING:  archiving write-ahead log file "00000002.history" failed too many times, will try again later
2022-10-24 17:33:46.139 GMT [174] LOG:  archive command failed with exit code 1
2022-10-24 17:33:46.139 GMT [174] DETAIL:  The failed archive command was: false
2022-10-24 17:33:46.211 GMT [174] LOG:  archive command failed with exit code 1
2022-10-24 17:33:46.211 GMT [174] DETAIL:  The failed archive command was: false
2022-10-24 17:33:47.216 GMT [174] LOG:  archive command failed with exit code 1
2022-10-24 17:33:47.216 GMT [174] DETAIL:  The failed archive command was: false
2022-10-24 17:33:47.216 GMT [174] WARNING:  archiving write-ahead log file "00000002.history" failed too many times, will try again later
2022-10-24 17:33:47.219 GMT [174] LOG:  archive command failed with exit code 1
2022-10-24 17:33:47.219 GMT [174] DETAIL:  The failed archive command was: false
2022-10-24 17:33:48.223 GMT [174] LOG:  archive command failed with exit code 1
2022-10-24 17:33:48.223 GMT [174] DETAIL:  The failed archive command was: false
2022-10-24 17:33:49.226 GMT [174] LOG:  archive command failed with exit code 1
2022-10-24 17:33:49.226 GMT [174] DETAIL:  The failed archive command was: false
2022-10-24 17:33:49.226 GMT [174] WARNING:  archiving write-ahead log file "00000002.history" failed too many times, will try again later
2022-10-24 17:33:49.230 GMT [19] LOG:  database system is shut down

I checked the db: nothing was restored.
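(Worth noting: the dataSource in this config passes no pgBackRest options, so the restore command shown in the log, pgbackrest restore --stanza=db --pg1-path=/pgdata/pg14 --repo=1 --delta ..., defaults to the latest backup plus all archived WAL rather than the 20221024-105247F set named above. Targeting a specific backup set would look like this sketch, reusing the field layout already present in the config:)

  dataSource:
    pgbackrest:
      stanza: db
      options:
      - --set=20221024-105247F
      - --type=immediate
      # plus the configuration, global, and repo fields shown above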

to be continued...

tirelibirefe commented 1 year ago

Scenario-2: Restore from the last backup by using the same name:

here is the config:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: ferahevler
  namespace: psql-ferahevler
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.5-1
  postgresVersion: 14
  customReplicationTLSSecret:
    name: ferahevler-repl-tls
  customTLSSecret:
    name: ferahevler-tls
  dataSource:
    pgbackrest:
      stanza: db
      configuration:
      - secret:
          name: psql-ferahevler-s3-creds
      - secret:
          name: ferahevler-pgbackrest-secrets
      global:
        repo1-path: /repo1/ferahevler
        repo1-cipher-type: aes-256-cbc
      repo:
        name: repo1
        s3:
          bucket: kmybucketnamehere1
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  instances:
    - name: pgha1
      replicas: 2
      dataVolumeClaimSpec:
        storageClassName: ebs-sc-enc
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: ferahevler
                  postgres-operator.crunchydata.com/instance-set: pgha1
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.2.0-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      configuration:
      - secret:
          name: psql-ferahevler-s3-creds
      - secret:
          name: ferahevler-pgbackrest-secrets
      manual:
        repoName: repo1
        options:
         - --type=full
      global:
        repo1-path: /repo1/ferahevler
        repo1-retention-full: "14"
        repo1-retention-full-type: time
        repo1-cipher-type: aes-256-cbc
      repos:
      - name: repo1
        schedules:
          full: "0 */5 * * *"
          # incremental: "0 */6 * * *"
        s3:
          bucket: ku-eksdev1-crunchydata-backups
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  proxy:
    pgBouncer:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.17-1
      replicas: 2
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: ferahevler
                  postgres-operator.crunchydata.com/role: pgbouncer

...and logs:

$ k logs -n psql-ferahevler ferahevler-pgbackrest-restore-xmhrb
Defaulted container "pgbackrest-restore" out of: pgbackrest-restore, nss-wrapper-init (init)
Error from server (BadRequest): container "pgbackrest-restore" in pod "ferahevler-pgbackrest-restore-xmhrb" is waiting to start: PodInitializing
mkarakas@vm-mkarakas-01:~$ k logs -n psql-ferahevler ferahevler-pgbackrest-restore-xmhrb -f
Defaulted container "pgbackrest-restore" out of: pgbackrest-restore, nss-wrapper-init (init)
+ pgbackrest restore --stanza=db --pg1-path=/pgdata/pg14 --repo=1 --delta --link-map=pg_wal=/pgdata/pg14_wal
WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg14' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2022-10-24 17:51:35.964 GMT [19] LOG:  starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-10-24 17:51:35.966 GMT [19] LOG:  listening on IPv6 address "::1", port 5432
2022-10-24 17:51:35.966 GMT [19] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-10-24 17:51:35.976 GMT [19] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-10-24 17:51:35.983 GMT [20] LOG:  database system was interrupted; last known up at 2022-10-24 17:34:34 GMT
2022-10-24 17:51:36.171 GMT [20] LOG:  starting archive recovery
2022-10-24 17:51:36.285 GMT [20] LOG:  restored log file "00000002.history" from archive
2022-10-24 17:51:36.613 GMT [20] LOG:  restored log file "00000002000000000000006A" from archive
2022-10-24 17:51:36.672 GMT [20] LOG:  redo starts at 0/6A000060
2022-10-24 17:51:36.883 GMT [20] LOG:  restored log file "00000002000000000000006B" from archive
2022-10-24 17:51:37.098 GMT [20] LOG:  restored log file "00000002000000000000006C" from archive
2022-10-24 17:51:37.165 GMT [20] LOG:  consistent recovery state reached at 0/6C000088
2022-10-24 17:51:37.166 GMT [19] LOG:  database system is ready to accept read-only connections
2022-10-24 17:51:37.429 GMT [20] LOG:  restored log file "00000002000000000000006D" from archive
2022-10-24 17:51:37.649 GMT [20] LOG:  restored log file "00000002000000000000006E" from archive
2022-10-24 17:51:37.976 GMT [20] LOG:  restored log file "00000002000000000000006F" from archive
2022-10-24 17:51:38.086 GMT [20] LOG:  redo done at 0/6F000110 system usage: CPU: user: 0.00 s, system: 0.01 s, elapsed: 1.41 s
2022-10-24 17:51:38.086 GMT [20] LOG:  last completed transaction was at log time 2022-10-24 17:35:43.740853+00
2022-10-24 17:51:38.231 GMT [20] LOG:  restored log file "00000002000000000000006F" from archive
2022-10-24 17:51:38.326 GMT [20] LOG:  selected new timeline ID: 3
2022-10-24 17:51:38.409 GMT [20] LOG:  archive recovery complete
2022-10-24 17:51:38.673 GMT [20] LOG:  restored log file "00000002.history" from archive
2022-10-24 17:51:38.731 GMT [19] LOG:  database system is ready to accept connections
2022-10-24 17:51:38.735 GMT [46] LOG:  archive command failed with exit code 1
2022-10-24 17:51:38.735 GMT [46] DETAIL:  The failed archive command was: false
2022-10-24 17:51:39.739 GMT [46] LOG:  archive command failed with exit code 1
2022-10-24 17:51:39.739 GMT [46] DETAIL:  The failed archive command was: false
2022-10-24 17:51:40.451 GMT [19] LOG:  received fast shutdown request
2022-10-24 17:51:40.455 GMT [19] LOG:  aborting any active transactions
2022-10-24 17:51:40.457 GMT [19] LOG:  background worker "logical replication launcher" (PID 47) exited with exit code 1
2022-10-24 17:51:40.458 GMT [24] LOG:  shutting down
2022-10-24 17:51:40.533 GMT [46] LOG:  archive command failed with exit code 1
2022-10-24 17:51:40.533 GMT [46] DETAIL:  The failed archive command was: false
2022-10-24 17:51:40.533 GMT [46] WARNING:  archiving write-ahead log file "00000003.history" failed too many times, will try again later
2022-10-24 17:51:40.538 GMT [46] LOG:  archive command failed with exit code 1
2022-10-24 17:51:40.538 GMT [46] DETAIL:  The failed archive command was: false
2022-10-24 17:51:41.544 GMT [46] LOG:  archive command failed with exit code 1
2022-10-24 17:51:41.544 GMT [46] DETAIL:  The failed archive command was: false
2022-10-24 17:51:42.549 GMT [46] LOG:  archive command failed with exit code 1
2022-10-24 17:51:42.549 GMT [46] DETAIL:  The failed archive command was: false
2022-10-24 17:51:42.549 GMT [46] WARNING:  archiving write-ahead log file "00000003.history" failed too many times, will try again later
2022-10-24 17:51:42.552 GMT [19] LOG:  database system is shut down

I checked the db and saw that all the data from the last backup was restored. (?!)

tirelibirefe commented 1 year ago

Scenario-3: different server name, specific backup:

config (psql-restore-pitr kustomize):

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: gultepe
  namespace: psql-gultepe
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.5-1
  postgresVersion: 14
  customReplicationTLSSecret:
    name: gultepe-repl-tls
  customTLSSecret:
    name: gultepe-tls
  instances:
    - name: pgha1
      replicas: 2
      dataVolumeClaimSpec:
        storageClassName: ebs-sc-enc
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: gultepe
                  postgres-operator.crunchydata.com/instance-set: pgha1
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.2.0-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      configuration:
      - secret:
          name: psql-gultepe-s3-creds
      manual:
        repoName: repo1
        options:
         - --type=full
      restore:
        enabled: true
        repoName: repo1
        options:
        - --set=20221024-105409F
        - --type=immediate
      global:
        repo1-path: /repo1/gultepe
        repo1-retention-full: "14"
        repo1-retention-full-type: time
      repos:
      - name: repo1
        schedules:
          full: "0 */5 * * *"
          # incremental: "0 */6 * * *"
        s3:
          bucket: mybucketnamehere
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  proxy:
    pgBouncer:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.17-1
      replicas: 2
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: gultepe
                  postgres-operator.crunchydata.com/role: pgbouncer

There is no restore.
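(One thing to check in this scenario: an in-place restore defined under spec.backups.pgbackrest.restore only runs once the cluster is annotated to trigger it, per the PGO disaster recovery tutorial. A sketch of the trigger, using the namespace and cluster name from the config above:)

$ kubectl -n psql-gultepe annotate postgrescluster gultepe --overwrite \
    postgres-operator.crunchydata.com/pgbackrest-restore="$(date)"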

tirelibirefe commented 1 year ago

Scenario-4: psql-restore-master-enc-v2

Trying to restore from a specific backup (same cluster name):

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: ferahevler
  namespace: psql-ferahevler
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.5-1
  postgresVersion: 14
  customReplicationTLSSecret:
    name: ferahevler-repl-tls
  customTLSSecret:
    name: ferahevler-tls
  dataSource:
    postgresCluster:
      clusterName: ferahevler
      repoName: repo1
      options:
      - --set=20221024-105409F
      - --type=immediate
  instances:
    - name: pgha1
      replicas: 2
      dataVolumeClaimSpec:
        storageClassName: ebs-sc-enc
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: ferahevler
                  postgres-operator.crunchydata.com/instance-set: pgha1
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.2.0-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      configuration:
      - secret:
          name: psql-ferahevler-s3-creds
      - secret:
          name: ferahevler-pgbackrest-secrets
      manual:
        repoName: repo1
        options:
         - --type=full
      global:
        repo1-path: /repo1/ferahevler
        repo1-retention-full: "14"
        repo1-retention-full-type: time
        repo1-cipher-type: aes-256-cbc
      repos:
      - name: repo1
        schedules:
          full: "0 */5 * * *"
          # incremental: "0 */6 * * *"
        s3:
          bucket: ku-eksdev1-crunchydata-backups
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  proxy:
    pgBouncer:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.17-1
      replicas: 2
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: ferahevler
                  postgres-operator.crunchydata.com/role: pgbouncer

logs:

$ k logs -n psql-ferahevler ferahevler-pgbackrest-restore-r56m2  -f
Defaulted container "pgbackrest-restore" out of: pgbackrest-restore, nss-wrapper-init (init)
+ pgbackrest restore --set=20221024-105409F --type=immediate --stanza=db --pg1-path=/pgdata/pg14 --repo=1 --delta --link-map=pg_wal=/pgdata/pg14_wal
WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg14' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2022-10-24 18:24:29.989 GMT [18] LOG:  starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-10-24 18:24:29.990 GMT [18] LOG:  listening on IPv6 address "::1", port 5432
2022-10-24 18:24:29.990 GMT [18] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-10-24 18:24:29.997 GMT [18] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-10-24 18:24:30.004 GMT [19] LOG:  database system was interrupted; last known up at 2022-10-24 10:54:13 GMT
2022-10-24 18:24:30.018 GMT [19] LOG:  starting point-in-time recovery to earliest consistent point
2022-10-24 18:24:30.296 GMT [19] LOG:  restored log file "00000001000000000000000E" from archive
2022-10-24 18:24:30.364 GMT [19] LOG:  redo starts at 0/E000028
2022-10-24 18:24:30.368 GMT [19] LOG:  consistent recovery state reached at 0/E000138
2022-10-24 18:24:30.368 GMT [18] LOG:  database system is ready to accept read-only connections
2022-10-24 18:24:30.368 GMT [19] LOG:  recovery stopping after reaching consistency
2022-10-24 18:24:30.368 GMT [19] LOG:  pausing at the end of recovery
2022-10-24 18:24:30.368 GMT [19] HINT:  Execute pg_wal_replay_resume() to promote.
2022-10-24 18:24:30.488 GMT [19] LOG:  redo done at 0/E000138 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.12 s
2022-10-24 18:24:30.646 GMT [19] LOG:  restored log file "00000002.history" from archive
2022-10-24 18:24:30.819 GMT [19] LOG:  restored log file "00000003.history" from archive
2022-10-24 18:24:30.995 GMT [19] LOG:  restored log file "00000004.history" from archive
2022-10-24 18:24:31.138 GMT [19] LOG:  selected new timeline ID: 5
2022-10-24 18:24:31.217 GMT [19] LOG:  archive recovery complete
2022-10-24 18:24:31.437 GMT [18] LOG:  database system is ready to accept connections
2022-10-24 18:24:31.451 GMT [35] LOG:  archive command failed with exit code 1
2022-10-24 18:24:31.451 GMT [35] DETAIL:  The failed archive command was: false
2022-10-24 18:24:32.457 GMT [35] LOG:  archive command failed with exit code 1
2022-10-24 18:24:32.457 GMT [35] DETAIL:  The failed archive command was: false
2022-10-24 18:24:32.541 GMT [18] LOG:  received fast shutdown request
2022-10-24 18:24:32.544 GMT [18] LOG:  aborting any active transactions
2022-10-24 18:24:32.549 GMT [18] LOG:  background worker "logical replication launcher" (PID 36) exited with exit code 1
2022-10-24 18:24:32.551 GMT [21] LOG:  shutting down
2022-10-24 18:24:32.656 GMT [35] LOG:  archive command failed with exit code 1
2022-10-24 18:24:32.656 GMT [35] DETAIL:  The failed archive command was: false
2022-10-24 18:24:32.656 GMT [35] WARNING:  archiving write-ahead log file "00000005.history" failed too many times, will try again later
2022-10-24 18:24:32.660 GMT [35] LOG:  archive command failed with exit code 1
2022-10-24 18:24:32.660 GMT [35] DETAIL:  The failed archive command was: false
2022-10-24 18:24:33.667 GMT [35] LOG:  archive command failed with exit code 1
2022-10-24 18:24:33.667 GMT [35] DETAIL:  The failed archive command was: false
2022-10-24 18:24:34.672 GMT [35] LOG:  archive command failed with exit code 1
2022-10-24 18:24:34.672 GMT [35] DETAIL:  The failed archive command was: false
2022-10-24 18:24:34.672 GMT [35] WARNING:  archiving write-ahead log file "00000005.history" failed too many times, will try again later
2022-10-24 18:24:34.676 GMT [18] LOG:  database system is shut down

I checked the db and the restore was successful. (?!)

tirelibirefe commented 1 year ago

Scenario-5: This is the same scenario as Scenario-4, but with a different cluster name.

config:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: seyrantepe
  namespace: psql-seyrantepe
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.5-1
  postgresVersion: 14
  customReplicationTLSSecret:
    name: seyrantepe-repl-tls
  customTLSSecret:
    name: seyrantepe-tls
  dataSource:
    postgresCluster:
      clusterName: ferahevler
      repoName: repo1
      options:
      - --set=20221024-105409F
      - --type=immediate
  instances:
    - name: pgha1
      replicas: 2
      dataVolumeClaimSpec:
        storageClassName: ebs-sc-enc
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: seyrantepe
                  postgres-operator.crunchydata.com/instance-set: pgha1
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.2.0-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      configuration:
      - secret:
          name: psql-seyrantepe-s3-creds
      - secret:
          name: seyrantepe-pgbackrest-secrets
      manual:
        repoName: repo1
        options:
         - --type=full
      global:
        repo1-path: /repo1/seyrantepe
        repo1-retention-full: "14"
        repo1-retention-full-type: time
        repo1-cipher-type: aes-256-cbc
      repos:
      - name: repo1
        schedules:
          full: "0 */5 * * *"
          # incremental: "0 */6 * * *"
        s3:
          bucket: mybucketnamehere
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  proxy:
    pgBouncer:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.17-1
      replicas: 2
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: seyrantepe
                  postgres-operator.crunchydata.com/role: pgbouncer

logs:

$ k get events -n psql-seyrantepe
LAST SEEN   TYPE      REASON              OBJECT                                            MESSAGE
39s         Normal    cert-manager.io     certificaterequest/seyrantepe-primary-ht8br       Certificate request has been approved by cert-manager.io
39s         Normal    CertificateIssued   certificaterequest/seyrantepe-primary-ht8br       Certificate fetched from issuer successfully
39s         Normal    Issuing             certificate/seyrantepe-primary                    Issuing certificate as Secret does not exist
39s         Normal    Generated           certificate/seyrantepe-primary                    Stored new private key in temporary Secret resource "seyrantepe-primary-rp6dz"
39s         Normal    Requested           certificate/seyrantepe-primary                    Created new CertificateRequest resource "seyrantepe-primary-ht8br"
39s         Normal    Issuing             certificate/seyrantepe-primary                    The certificate has been successfully issued
39s         Normal    cert-manager.io     certificaterequest/seyrantepe-replication-6fhjk   Certificate request has been approved by cert-manager.io
39s         Normal    CertificateIssued   certificaterequest/seyrantepe-replication-6fhjk   Certificate fetched from issuer successfully
39s         Normal    Issuing             certificate/seyrantepe-replication                Issuing certificate as Secret does not exist
39s         Normal    Generated           certificate/seyrantepe-replication                Stored new private key in temporary Secret resource "seyrantepe-replication-gljqn"
39s         Normal    Requested           certificate/seyrantepe-replication                Created new CertificateRequest resource "seyrantepe-replication-6fhjk"
39s         Normal    Issuing             certificate/seyrantepe-replication                The certificate has been successfully issued
38s         Warning   InvalidDataSource   postgrescluster/seyrantepe                        PostgresCluster "ferahevler" does not exist
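(The InvalidDataSource event is the key line here: dataSource.postgresCluster clones from a cluster that still exists in the Kubernetes environment, and ferahevler has already been deleted. Cloning a deleted cluster from its repo is the dataSource.pgbackrest case from Scenario-1; a sketch combining it with the backup-set options used in this scenario:)

  dataSource:
    pgbackrest:
      stanza: db
      options:
      - --set=20221024-105409F
      - --type=immediate
      # plus the configuration, global, and repo fields shown in Scenario-1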

tirelibirefe commented 1 year ago

I am totally lost :(

tirelibirefe commented 1 year ago

Scenario-6: This is the same scenario as Scenario-1, but this time I try to restore using the same cluster name:

config:

apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: ferahevler
  namespace: psql-ferahevler
spec:
  image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.5-1
  postgresVersion: 14
  customReplicationTLSSecret:
    name: ferahevler-repl-tls
  customTLSSecret:
    name: ferahevler-tls
  dataSource:
    pgbackrest:
      stanza: db
      configuration:
      - secret:
          name: psql-ferahevler-s3-creds
      - secret:
          name: ferahevler-pgbackrest-secrets
      global:
        repo1-path: /repo1/ferahevler
        repo1-cipher-type: aes-256-cbc
      repo:
        name: repo1
        s3:
          bucket: ku-eksdev1-crunchydata-backups
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  users:
    - name: postgres
    - name: ferahevler
      databases:
        - ferahevler
  instances:
    - name: pgha1
      replicas: 2
      dataVolumeClaimSpec:
        storageClassName: ebs-sc-enc
        accessModes:
        - "ReadWriteOnce"
        resources:
          requests:
            storage: 1Gi
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: ferahevler
                  postgres-operator.crunchydata.com/instance-set: pgha1
  monitoring:
    pgmonitor:
      exporter:
        image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres-exporter:ubi8-5.2.0-0
  backups:
    pgbackrest:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.40-1
      configuration:
      - secret:
          name: psql-ferahevler-s3-creds
      - secret:
          name: ferahevler-pgbackrest-secrets
      manual:
        repoName: repo1
        options:
         - --type=full
      global:
        repo1-path: /repo1/ferahevler
        repo1-retention-full: "30"
        repo1-retention-full-type: time
        repo1-cipher-type: aes-256-cbc
      repos:
      - name: repo1
        schedules:
          full: "0 */5 * * *"
          # incremental: "0 */6 * * *"
        s3:
          bucket: mybucketnamehere
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1
  proxy:
    pgBouncer:
      image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbouncer:ubi8-1.17-1
      replicas: 2
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  postgres-operator.crunchydata.com/cluster: ferahevler
                  postgres-operator.crunchydata.com/role: pgbouncer

logs:

$ k logs -n psql-ferahevler ferahevler-pgbackrest-restore-bgt4r -f
Defaulted container "pgbackrest-restore" out of: pgbackrest-restore, nss-wrapper-init (init)
+ pgbackrest restore --stanza=db --pg1-path=/pgdata/pg14 --repo=1 --delta --link-map=pg_wal=/pgdata/pg14_wal
WARN: unable to open log file '/pgdata/pgbackrest/log/db-restore.log': No such file or directory
      NOTE: process will continue without log file.
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg14' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2022-10-24 19:12:31.474 GMT [19] LOG:  starting PostgreSQL 14.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-10), 64-bit
2022-10-24 19:12:31.480 GMT [19] LOG:  listening on IPv6 address "::1", port 5432
2022-10-24 19:12:31.480 GMT [19] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2022-10-24 19:12:31.486 GMT [19] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-10-24 19:12:31.517 GMT [20] LOG:  database system was interrupted; last known up at 2022-10-24 19:07:45 GMT
2022-10-24 19:12:31.705 GMT [20] LOG:  starting archive recovery
2022-10-24 19:12:31.797 GMT [20] LOG:  restored log file "00000006.history" from archive
2022-10-24 19:12:32.010 GMT [20] LOG:  restored log file "000000060000000000000017" from archive
2022-10-24 19:12:32.059 GMT [20] LOG:  redo starts at 0/17000028
2022-10-24 19:12:32.209 GMT [20] LOG:  restored log file "000000060000000000000018" from archive
2022-10-24 19:12:32.409 GMT [20] LOG:  restored log file "000000060000000000000019" from archive
2022-10-24 19:12:32.461 GMT [20] LOG:  consistent recovery state reached at 0/19000050
2022-10-24 19:12:32.462 GMT [19] LOG:  database system is ready to accept read-only connections
2022-10-24 19:12:32.544 GMT [20] LOG:  redo done at 0/19000050 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.48 s
2022-10-24 19:12:32.544 GMT [20] LOG:  last completed transaction was at log time 2022-10-24 19:07:30.807409+00
2022-10-24 19:12:32.741 GMT [20] LOG:  restored log file "000000060000000000000019" from archive
2022-10-24 19:12:32.836 GMT [20] LOG:  selected new timeline ID: 7
2022-10-24 19:12:32.890 GMT [20] LOG:  archive recovery complete
2022-10-24 19:12:33.001 GMT [20] LOG:  restored log file "00000006.history" from archive
2022-10-24 19:12:33.062 GMT [19] LOG:  database system is ready to accept connections
2022-10-24 19:12:33.069 GMT [39] LOG:  archive command failed with exit code 1
2022-10-24 19:12:33.069 GMT [39] DETAIL:  The failed archive command was: false
2022-10-24 19:12:34.074 GMT [39] LOG:  archive command failed with exit code 1
2022-10-24 19:12:34.074 GMT [39] DETAIL:  The failed archive command was: false
2022-10-24 19:12:34.603 GMT [19] LOG:  received fast shutdown request
2022-10-24 19:12:34.605 GMT [19] LOG:  aborting any active transactions
2022-10-24 19:12:34.608 GMT [19] LOG:  background worker "logical replication launcher" (PID 40) exited with exit code 1
2022-10-24 19:12:34.609 GMT [24] LOG:  shutting down
2022-10-24 19:12:34.663 GMT [39] LOG:  archive command failed with exit code 1
2022-10-24 19:12:34.663 GMT [39] DETAIL:  The failed archive command was: false
2022-10-24 19:12:34.663 GMT [39] WARNING:  archiving write-ahead log file "00000007.history" failed too many times, will try again later
2022-10-24 19:12:34.667 GMT [39] LOG:  archive command failed with exit code 1
2022-10-24 19:12:34.667 GMT [39] DETAIL:  The failed archive command was: false
2022-10-24 19:12:35.672 GMT [39] LOG:  archive command failed with exit code 1
2022-10-24 19:12:35.672 GMT [39] DETAIL:  The failed archive command was: false
2022-10-24 19:12:36.691 GMT [39] LOG:  archive command failed with exit code 1
2022-10-24 19:12:36.691 GMT [39] DETAIL:  The failed archive command was: false
2022-10-24 19:12:36.692 GMT [39] WARNING:  archiving write-ahead log file "00000007.history" failed too many times, will try again later
2022-10-24 19:12:36.710 GMT [19] LOG:  database system is shut down

I checked the db and the restore was successful. The data is there.

So while it cannot be restored under a different name, it can be restored if I use the same name.

tony-landreth commented 1 year ago

Hello @tirelibirefe!

It looks like in some scenarios, you're attempting to clone from backup data and in others, you're trying to clone from an existing cluster. Different things can go wrong, depending on the scenario.

I'd like to focus on Scenario 1 to understand what's going wrong. I've noticed that the section specifying your data source (spec.dataSource.pgbackrest) is configured in the same manner as the section for backing up the new cluster (spec.backups.pgbackrest.repos).

Under your data source:

      global:
        repo1-path: /repo1/ferahevler
        repo1-cipher-type: aes-256-cbc
      repo:
        name: repo1
        s3:
          bucket: ku-eksdev1-crunchydata-backups
          endpoint: s3.eu-central-1.amazonaws.com:443
          region: eu-central-1

Under the new cluster's backups section:

      global:
        repo1-path: /repo1/ferahevler
        repo1-cipher-type: aes-256-cbc
      ...     
      repos:
          ...
          s3:
            bucket: ku-eksdev1-crunchydata-backups
            endpoint: s3.eu-central-1.amazonaws.com:443
            region: eu-central-1

It would be better to back up the new cluster, seyrantepe, to a different path (like /repo1/seyrantepe) to make sure the data from ferahevler isn't accidentally overwritten.
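A sketch of that split; only the two repo1-path values differ, and everything else stays as in Scenario-1:

  dataSource:
    pgbackrest:
      global:
        repo1-path: /repo1/ferahevler    # read the old cluster's backups
        repo1-cipher-type: aes-256-cbc
  backups:
    pgbackrest:
      global:
        repo1-path: /repo1/seyrantepe    # write the new cluster's backups to a separate path
        repo1-cipher-type: aes-256-cbc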

As you've probably noticed, the WARNING and failed archive messages do not indicate a failure to restore. Your logs indicate that archives are being restored and that the job is succeeding. When you've experienced a failure, what commands have you used to see that your data hasn't been restored? And what is the output from those commands?
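(For that last question, a sketch of the kind of check that helps; the pod and table names are placeholders, the label selector matches the one used in the configs above, and "database" is the operator's default name for the Postgres container:)

$ kubectl -n psql-seyrantepe get pods -l postgres-operator.crunchydata.com/cluster=seyrantepe
$ kubectl -n psql-seyrantepe exec -it <instance-pod> -c database -- psql -c '\l'
$ kubectl -n psql-seyrantepe exec -it <instance-pod> -c database -- psql -d seyrantepe -c 'SELECT count(*) FROM <your_table>;'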