CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0

Standby Cluster after In-Place PITR Sync Failure #3018

Closed djtaylor closed 2 years ago

djtaylor commented 2 years ago

Overview

This may simply be a documentation gap for this particular scenario, as I can't find anywhere how to handle it. We have two clusters in different datacenters, with data replicated to the standby by way of S3. After performing an in-place PITR on the primary (which works fine), I want to make sure its state is reflected in the standby cluster. I deleted and recreated the standby cluster to force it to sync with the current state of the primary after the in-place PITR, but the standby cluster fails to properly bootstrap/initialize.

Environment

Steps to Reproduce

1.) Have an environment with 2 clusters, one primary and one standby. Use S3 to replicate from primary to standby.
2.) Perform an in-place PITR recovery on the primary as described in: https://access.crunchydata.com/documentation/postgres-operator/5.0.4/tutorial/disaster-recovery/
3.) After the primary is healthy again, delete the standby and recreate it from a blank state.
4.) Observe the standby pods failing to initialize.
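For reference, the in-place PITR in step 2 is driven through the `restore` section of the PostgresCluster spec plus a trigger annotation, per the linked tutorial. A hedged sketch (the target timestamp is a placeholder; verify field names against your operator version):

```yaml
spec:
  backups:
    pgbackrest:
      restore:
        enabled: true
        repoName: repo1
        options:
        - --type=time
        - --target="2022-02-04 18:00:00+00"  # placeholder recovery target
```

The restore is then triggered by updating the `postgres-operator.crunchydata.com/pgbackrest-restore` annotation on the cluster, e.g. setting it to the current timestamp.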

EXPECTED

  1. I expect the standby cluster to initialize properly and end up with the same state as the primary cluster after the in-place PITR

ACTUAL

  1. The standby cluster fails to boot, and the database containers never become healthy

Logs

Extract from one of the standby database containers that is failing to initialize:

2022-02-04 19:57:14,202 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-02-04 19:57:14,207 INFO: Lock owner: None; I am harbor-db-harbor-db-7thp-0
2022-02-04 19:57:14,230 INFO: trying to bootstrap (without leader)
WARN: --delta or --force specified but unable to find 'PG_VERSION' or 'backup.manifest' in '/pgdata/pg13' to confirm that this is a valid $PGDATA directory.  --delta and --force have been disabled and if any files exist in the destination directories the restore will be aborted.
2022-02-04 19:57:24,707 INFO: Lock owner: None; I am harbor-db-harbor-db-7thp-0
2022-02-04 19:57:24,707 INFO: not healthy enough for leader race
2022-02-04 19:57:24,718 INFO: bootstrap (without leader) in progress
2022-02-04 19:57:26,644 INFO: replica has been created using pgbackrest
2022-02-04 19:57:26,645 INFO: bootstrapped (without leader)
2022-02-04 19:57:26.892 UTC [121] LOG:  pgaudit extension initialized
2022-02-04 19:57:26.907 UTC [121] LOG:  redirecting log output to logging collector process
2022-02-04 19:57:26.907 UTC [121] HINT:  Future log output will appear in directory "log".
2022-02-04 19:57:26,907 INFO: postmaster pid=121
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - rejecting connections
/tmp/postgres:5432 - no response
2022-02-04 19:57:27,959 WARNING: Postgresql is not running.
2022-02-04 19:57:27,959 INFO: Lock owner: None; I am harbor-db-harbor-db-7thp-0
2022-02-04 19:57:27,964 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7057319451731464255
  Database cluster state: in production
  pg_control last modified: Fri Feb  4 19:30:09 2022
  Latest checkpoint location: 0/CA000060
  Latest checkpoint's REDO location: 0/CA000028
  Latest checkpoint's REDO WAL file: 0000001800000000000000CA
  Latest checkpoint's TimeLineID: 24
  Latest checkpoint's PrevTimeLineID: 24
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:897
  Latest checkpoint's NextOID: 57399
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 478
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 897
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Fri Feb  4 19:30:09 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: d318b263511266a6d310d7aad8d801e535aabc0f28fab3c069549b09a0b8161b

2022-02-04 19:57:27,977 INFO: doing crash recovery in a single user mode
2022-02-04 19:57:28,135 WARNING: Postgresql is not running.
2022-02-04 19:57:28,135 INFO: Lock owner: None; I am harbor-db-harbor-db-7thp-0
2022-02-04 19:57:28,143 INFO: pg_controldata:
  pg_control version number: 1300
  Catalog version number: 202007201
  Database system identifier: 7057319451731464255
  Database cluster state: shut down
  pg_control last modified: Fri Feb  4 19:57:28 2022
  Latest checkpoint location: 0/CC000028
  Latest checkpoint's REDO location: 0/CC000028
  Latest checkpoint's REDO WAL file: 0000001800000000000000CC
  Latest checkpoint's TimeLineID: 24
  Latest checkpoint's PrevTimeLineID: 24
  Latest checkpoint's full_page_writes: on
  Latest checkpoint's NextXID: 0:897
  Latest checkpoint's NextOID: 57399
  Latest checkpoint's NextMultiXactId: 1
  Latest checkpoint's NextMultiOffset: 0
  Latest checkpoint's oldestXID: 478
  Latest checkpoint's oldestXID's DB: 1
  Latest checkpoint's oldestActiveXID: 0
  Latest checkpoint's oldestMultiXid: 1
  Latest checkpoint's oldestMulti's DB: 1
  Latest checkpoint's oldestCommitTsXid: 0
  Latest checkpoint's newestCommitTsXid: 0
  Time of latest checkpoint: Fri Feb  4 19:57:28 2022
  Fake LSN counter for unlogged rels: 0/3E8
  Minimum recovery ending location: 0/0
  Min recovery ending loc's timeline: 0
  Backup start location: 0/0
  Backup end location: 0/0
  End-of-backup record required: no
  wal_level setting: logical
  wal_log_hints setting: on
  max_connections setting: 100
  max_worker_processes setting: 8
  max_wal_senders setting: 10
  max_prepared_xacts setting: 0
  max_locks_per_xact setting: 64
  track_commit_timestamp setting: off
  Maximum data alignment: 8
  Database block size: 8192
  Blocks per segment of large relation: 131072
  WAL block size: 8192
  Bytes per WAL segment: 16777216
  Maximum length of identifiers: 64
  Maximum columns in an index: 32
  Maximum size of a TOAST chunk: 1996
  Size of a large-object chunk: 2048
  Date/time type storage: 64-bit integers
  Float8 argument passing: by value
  Data page checksum version: 1
  Mock authentication nonce: d318b263511266a6d310d7aad8d801e535aabc0f28fab3c069549b09a0b8161b

2022-02-04 19:57:28,157 INFO: Lock owner: None; I am harbor-db-harbor-db-7thp-0
2022-02-04 19:57:28,158 INFO: trying to follow a remote master because standby cluster is unhealthy
2022-02-04 19:57:28.386 UTC [146] LOG:  pgaudit extension initialized
2022-02-04 19:57:28,387 INFO: postmaster pid=146
/tmp/postgres:5432 - no response
2022-02-04 19:57:28.401 UTC [146] LOG:  redirecting log output to logging collector process
2022-02-04 19:57:28.401 UTC [146] HINT:  Future log output will appear in directory "log".

Additional Information

Standby cluster YAML:

Spec:
  Backups:
    Pgbackrest:
      Global:
        repo1-path:                 {repo_path}
        repo1-retention-full:       7
        repo1-retention-full-type:  time
        repo1-s3-key:               {s3_access_key}
        repo1-s3-key-secret:        {s3_secret_key}
        repo1-s3-uri-style:         path
      Jobs:
        Resources:
          Limits:
            Cpu:     250m
            Memory:  128Mi
          Requests:
            Cpu:     250m
            Memory:  128Mi
      Manual:
        Options:
          --type=full
        Repo Name:  repo1
      Repos:
        Name:  repo1
        s3:
          Bucket:    {s3_bucket_name}
          Endpoint:  {s3_endpoint}
          Region:    {s3_region}
        Schedules:
          Full:         0 1 * * *
          Incremental:  */30 * * * *
      Restore:
        Enabled:    false
        Repo Name:  repo1
        Resources:
          Limits:
            Cpu:     500m
            Memory:  256Mi
          Requests:
            Cpu:     500m
            Memory:  256Mi
      Sidecars:
        Pgbackrest:
          Resources:
            Limits:
              Cpu:     250m
              Memory:  128Mi
            Requests:
              Cpu:     250m
              Memory:  128Mi
  Instances:
    Data Volume Claim Spec:
      Access Modes:
        ReadWriteOnce
      Resources:
        Requests:
          Storage:         10Gi
      Storage Class Name:  {sc_name}
    Name:                  harbor-db
    Replicas:              2
    Resources:
      Limits:
        Cpu:     1000m
        Memory:  1Gi
      Requests:
        Cpu:     500m
        Memory:  1Gi
    Sidecars:
      Replica Cert Copy:
        Resources:
          Limits:
            Cpu:     125m
            Memory:  64Mi
          Requests:
            Cpu:     125m
            Memory:  64Mi
  Openshift:         false
  Patroni:
    Dynamic Configuration:
      Postgresql:
        pg_hba:
          host all all 0.0.0.0/0 md5
    Leader Lease Duration Seconds:  30
    Port:                           8008
    Sync Period Seconds:            10
  Port:                             5432
  Postgres Version:                 13
  Proxy:
    Pg Bouncer:
      Port:      5432
      Replicas:  2
      Resources:
        Limits:
          Cpu:     500m
          Memory:  256Mi
        Requests:
          Cpu:     500m
          Memory:  256Mi
      Service:
        Type:  NodePort
      Sidecars:
        Pgbouncer Config:
          Resources:
            Limits:
              Cpu:     100m
              Memory:  64Mi
            Requests:
              Cpu:     100m
              Memory:  64Mi
  Standby:
    Enabled:    true
    Repo Name:  repo1
  Users:
    Databases:
      notary_server
      notary_signer
      registry
    Name:  harbor

It would be valuable to know (and document) what steps need to be taken to ensure a standby cluster, replicated from the primary via S3, is properly synced with the primary after an in-place PITR.

djtaylor commented 2 years ago

I figured this out a few minutes after I posted this. When you create the standby cluster, it needs to have restore mode enabled for it to properly sync with the primary:

Spec:
  Backups:
    Pgbackrest:
      Restore:
        Enabled: true
        Repo Name: repo1

When this restore option is set, the standby cluster initializes properly and becomes healthy. Afterwards, you can patch the cluster to disable restore mode.
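For completeness, once the standby has bootstrapped, the restore flag can be flipped back off with a patch along these lines (cluster name `harbor-db` is taken from the logs above; adjust name and namespace for your environment):

```shell
kubectl patch postgrescluster harbor-db --type merge \
  -p '{"spec":{"backups":{"pgbackrest":{"restore":{"enabled":false}}}}}'
```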