CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0

Unable to upgrade postgres-gis-ha cluster from 4.6.2 to 4.7.0 #2493

Closed mattboyd11 closed 10 months ago

mattboyd11 commented 3 years ago

Overview

I'm trying to upgrade a crunchy-postgres-gis-ha cluster (named pgdb-beta) from 13.2-3.0-4.6.2 to 13.3-3.1-4.7.0 via the pgo upgrade command. Any guidance or advice would be much appreciated!

Steps to Reproduce

Create cluster pgdb-beta

pgo create cluster -n pgdb pgdb-beta \
--ccp-image="crunchy-postgres-gis-ha" \
--ccp-image-tag="centos8-13.2-3.0-4.6.2" \
--cpu="3.5" \
--debug \
--label="db=pgo" \
--memory="12Gi" \
--metrics \
--pgbackrest-pvc-size="52Gi" \
--pgbouncer \
--pgbouncer-replicas=2 \
--pvc-size="37Gi" \
--replica-count=1 \
--sync-replication

Upgrade pgdb-beta

$ pgo upgrade --ccp-image-tag=centos8-13.3-3.1-4.7.0 -n pgdb pgdb-beta

EXPECTED

Cluster pgdb-beta redeployed with a primary and replica pod that have 13.3-3.1-4.7.0 image tag.

ACTUAL

The cluster scaled down, but the bootstrap job never completes. It continuously logs 'waiting for leader to bootstrap'.

Logs

Bootstrap logs:

nss_wrapper: user exists
nss_wrapper: group exists
nss_wrapper: environment configured
Wed Jun  9 21:35:04 UTC 2021 INFO: postgres-ha pre-bootstrap starting...
Wed Jun  9 21:35:04 UTC 2021 INFO: pgBackRest auto-config disabled
Wed Jun  9 21:35:04 UTC 2021 INFO: PGHA_PGBACKREST_LOCAL_S3_STORAGE, PGHA_PGBACKREST_LOCAL_GCS_STORAGE and PGHA_PGBACKREST_INITIALIZE will be ignored if provided
Wed Jun  9 21:35:04 UTC 2021 INFO: Defaults have been set for the following postgres-ha auto-configuration env vars: PGHA_DEFAULT_CONFIG, PGHA_BASE_BOOTSTRAP_CONFIG, PGHA_BASE_PG_CONFIG
Wed Jun  9 21:35:04 UTC 2021 INFO: Defaults have been set for the following postgres-ha env vars: PGHA_PATRONI_PORT
Wed Jun  9 21:35:04 UTC 2021 INFO: Defaults have been set for the following Patroni env vars: PATRONI_NAME, PATRONI_RESTAPI_LISTEN, PATRONI_RESTAPI_CONNECT_ADDRESS, PATRONI_POSTGRESQL_LISTEN, PATRONI_POSTGRESQL_CONNECT_ADDRESS
Wed Jun  9 21:35:04 UTC 2021 INFO: Setting postgres-ha configuration for database user credentials
Wed Jun  9 21:35:04 UTC 2021 INFO: Setting 'pguser' credentials using file system
Wed Jun  9 21:35:04 UTC 2021 INFO: Setting 'superuser' credentials using file system
Wed Jun  9 21:35:04 UTC 2021 INFO: Setting 'replicator' credentials using file system
Wed Jun  9 21:35:04 UTC 2021 INFO: Applying base bootstrap config to postgres-ha configuration
Wed Jun  9 21:35:04 UTC 2021 INFO: Applying base postgres config to postgres-ha configuration
Wed Jun  9 21:35:04 UTC 2021 INFO: Applying pgbackrest config to postgres-ha configuration
Wed Jun  9 21:35:04 UTC 2021 INFO: Applying synchronous replication settings to postgres-ha configuration
Wed Jun  9 21:35:04 UTC 2021 INFO: Applying standard (non-TLS) remote connection configuration to pg_hba.conf
Wed Jun  9 21:35:04 UTC 2021 INFO: Disabling archiving for bootstrap method pgbackrest_init
Wed Jun  9 21:35:04 UTC 2021 INFO: Custom postgres-ha configuration file not detected
Wed Jun  9 21:35:04 UTC 2021 INFO: Finished building postgres-ha configuration file '/tmp/postgres-ha-bootstrap.yaml'
Wed Jun  9 21:35:04 UTC 2021 INFO: Detected cluster initialization using an existing PGDATA directory
Wed Jun  9 21:35:04 UTC 2021 INFO: postgres-ha pre-bootstrap complete!  The following configuration will be utilized to initialize 
******************************
postgres-ha (PGHA) env vars:
******************************
PGHA_BASE_PG_CONFIG=true
PGHA_PATRONI_PORT=8009
PGHA_PG_PORT=5432
PGHA_PGBACKREST_LOCAL_S3_STORAGE=false
PGHA_SYNC_REPLICATION=true
PGHA_USER=postgres
PGHA_INIT=true
PGHA_DEFAULT_CONFIG=true
PGHA_REPLICA_REINIT_ON_START_FAIL=true
PGHA_PGBACKREST=true
PGHA_BASE_BOOTSTRAP_CONFIG=true
PGHA_STANDBY=false
PGHA_BOOTSTRAP_METHOD=pgbackrest_init
PGHA_TLS_ONLY=false
PGHA_TLS_ENABLED=false
PGHA_DATABASE=pgdb-beta
******************************
Patroni env vars:
******************************
PATRONI_POSTGRESQL_CONNECT_ADDRESS=10.68.6.15:5432
PATRONI_POSTGRESQL_LISTEN=0.0.0.0:5432
PATRONI_NAME=pgdb-beta-bootstrap-98psr
PATRONI_SCOPE=pgdb-beta
PATRONI_RESTAPI_LISTEN=0.0.0.0:8009
PATRONI_POSTGRESQL_DATA_DIR=/pgdata/pgdb-beta
PATRONI_RESTAPI_CONNECT_ADDRESS=10.68.6.15:8009
PATRONI_LOG_LEVEL=INFO
PATRONI_KUBERNETES_LABELS={vendor: "crunchydata"}
PATRONI_KUBERNETES_SCOPE_LABEL=crunchy-pgha-scope
PATRONI_KUBERNETES_NAMESPACE=pgdb
******************************
Patroni bootstrap method: pgbackrest_init
******************************
Patroni configuration file:
******************************
bootstrap:
  method: pgbackrest_init
  pgbackrest_init:
    command: '/opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh
      primary'
    keep_existing_recovery_conf: true
  existing_init:
    command: '/opt/crunchy/bin/postgres-ha/bootstrap/create-from-existing.sh'
    keep_existing_recovery_conf: true
  dcs:
    postgresql:
      parameters:
        jit: off
        unix_socket_directories: /tmp
        wal_level: logical
        archive_mode: on
        archive_command: 'source /opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-set-env.sh
          && pgbackrest archive-push "%p"'
        archive_timeout: 60
        log_directory: pg_log
        shared_buffers: 128MB
        temp_buffers: 8MB
        log_min_duration_statement: 60000
        log_statement: none
        work_mem: 4MB
        max_wal_senders: 6
        shared_preload_libraries: pgaudit.so,pg_stat_statements.so,pgnodemx.so
        synchronous_commit: "on"
        synchronous_standby_names: "*"
      use_slots: false
      recovery_conf:
        restore_command: 'source /opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-set-env.sh
          && pgbackrest archive-get %f "%p"'
      use_pg_rewind: true
    synchronous_mode: true
  post_bootstrap: /opt/crunchy/bin/postgres-ha/bootstrap/post-bootstrap.sh
  initdb:
  - encoding: UTF8
  - data-checksums
postgresql:
  use_unix_socket: true
  pgpass: /tmp/.pgpass
  create_replica_methods:
  - pgbackrest
  - basebackup
  pgbackrest:
    command: '/opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh
      replica'
    keep_data: true
    no_params: true
  pgbackrest_standby:
    command: '/opt/crunchy/bin/postgres-ha/pgbackrest/pgbackrest-create-replica.sh
      standby'
    keep_data: true
    no_params: true
    no_master: 1
  remove_data_directory_on_rewind_failure: true
  callbacks:
    on_role_change: /opt/crunchy/bin/postgres-ha/callbacks/pgha-on-role-change.sh
  parameters:
    archive_command: 'false'
  pg_hba:
  - local all postgres peer
  - host replication primaryuser 0.0.0.0/0 md5
  - host all primaryuser 0.0.0.0/0 reject
  - host all all 0.0.0.0/0 md5
Wed Jun  9 21:35:04 UTC 2021 INFO: Applying SSHD..
Wed Jun  9 21:35:04 UTC 2021 INFO: nss_wrapper: ssh configured
Wed Jun  9 21:35:04 UTC 2021 INFO: Checking for SSH Host Keys in /sshd..
Wed Jun  9 21:35:04 UTC 2021 INFO: Checking for authorized_keys in /sshd
Wed Jun  9 21:35:04 UTC 2021 INFO: Checking for sshd_config in /sshd
Wed Jun  9 21:35:04 UTC 2021 INFO: Starting SSHD..
WARNING: 'UsePAM no' is not supported in Fedora and may cause several problems.
Wed Jun  9 21:35:04 UTC 2021 INFO: Starting background process to monitor Patroni initization and restart the database if needed
Wed Jun  9 21:35:04 UTC 2021 INFO: Initializing cluster bootstrap with command: '/usr/local/bin/patroni /tmp/postgres-ha-bootstrap.yaml'
Wed Jun  9 21:35:04 UTC 2021 INFO: Patroni will not run as PID 1. Creating signal handler
2021-06-09 21:35:04,575 INFO: No PostgreSQL configuration items changed, nothing to reload.
2021-06-09 21:35:04,579 INFO: Lock owner: None; I am pgdb-beta-bootstrap-98psr
2021-06-09 21:35:04,601 INFO: waiting for leader to bootstrap
2021-06-09 21:35:15,079 INFO: Lock owner: None; I am pgdb-beta-bootstrap-98psr
2021-06-09 21:35:15,080 INFO: waiting for leader to bootstrap
2021-06-09 21:35:25,079 INFO: Lock owner: None; I am pgdb-beta-bootstrap-98psr
2021-06-09 21:35:25,079 INFO: waiting for leader to bootstrap
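
The repeating pair of "Lock owner: None" and "waiting for leader to bootstrap" lines can be checked for mechanically. The following is a minimal sketch, not part of PGO or Patroni: `bootstrap_stuck` is a hypothetical helper name, and the default threshold of 3 is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of PGO): reads Patroni log lines on stdin and
# reports whether the bootstrap looks stuck, based on how many times the
# "waiting for leader to bootstrap" message appears.
bootstrap_stuck() {
    threshold="${1:-3}"   # assumed threshold; tune to taste
    # grep -c prints the number of matching lines; "|| true" covers the
    # zero-match case, where grep exits non-zero but still prints 0.
    count=$(grep -c 'waiting for leader to bootstrap' || true)
    if [ "$count" -ge "$threshold" ]; then
        echo "stuck: $count waiting messages"
        return 0
    fi
    echo "ok: $count waiting messages"
    return 1
}

# Example usage, assuming kubectl access and that the bootstrap pod is named
# after PATRONI_NAME in the logs above:
#   kubectl logs -n pgdb pgdb-beta-bootstrap-98psr | bootstrap_stuck 3
```

The function only counts messages. It does not explain why no leader lock is being acquired.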

Operator logs:

2021/06/09 21:34:15 INF  132 (localhost:4150) connecting to nsqd
time="2021-06-09T21:34:15Z" level=info msg="deleting Pgreplica object in namespace pgdb" func="internal/operator/cluster.DeleteReplica()" file="internal/operator/cluster/clusterlogic.go:559" version=4.7.0
time="2021-06-09T21:34:15Z" level=info msg="deleting with Name=pgdb-beta-hqhu in namespace pgdb" func="internal/operator/cluster.DeleteReplica()" file="internal/operator/cluster/clusterlogic.go:560" version=4.7.0
2021/06/09 21:34:15 INF  133 (localhost:4150) connecting to nsqd
time="2021-06-09T21:34:15Z" level=info msg="deleting Pgreplica object in namespace pgdb" func="internal/operator/cluster.DeleteReplica()" file="internal/operator/cluster/clusterlogic.go:559" version=4.7.0
time="2021-06-09T21:34:15Z" level=info msg="deleting with Name=pgdb-beta-hqhu in namespace pgdb" func="internal/operator/cluster.DeleteReplica()" file="internal/operator/cluster/clusterlogic.go:560" version=4.7.0
2021/06/09 21:34:15 INF  134 (localhost:4150) connecting to nsqd
time="2021-06-09T21:34:45Z" level=error msg="ConfigMap Controller: cannot find pgcluster for configMap pgdb-beta-pgha-config (namespace pgdb),ignoring" func="internal/controller/configmap.(*Controller).handleConfigMapSync()" file="internal/controller/configmap/synchandler.go:55" version=4.7.0
2021/06/09 21:34:46 INF  135 (localhost:4150) connecting to nsqd
time="2021-06-09T21:34:46Z" level=info msg="found existing pgha ConfigMap for cluster pgdb-beta, setting init flag to 'true'" func="internal/operator/cluster.AddClusterBootstrap()" file="internal/operator/cluster/cluster.go:322" version=4.7.0
time="2021-06-09T21:34:46Z" level=info msg="creating Pgcluster pgdb-beta in namespace pgdb" func="internal/operator/cluster.getClusterDeploymentFields()" file="internal/operator/cluster/clusterlogic.go:266" version=4.7.0
time="2021-06-09T21:34:46Z" level=info msg="exporter secret pgdb-beta-exporter-secret already present, will reuse" func="internal/operator/cluster.CreateExporterSecret()" file="internal/operator/cluster/exporter.go:145" version=4.7.0

Additional Information

Cluster before upgrade:

$ pgo show cluster -n pgdb pgdb-beta

cluster : pgdb-beta (crunchy-postgres-gis-ha:centos8-13.2-3.0-4.6.2)
        pod : pgdb-beta-bf75f8c87-dt5s4 (Running) on gke-c-jx6kj-db-primary-e19b23fe-nfn3 (2/2) (primary)
                pvc: pgdb-beta (37Gi)
        pod : pgdb-beta-hqhu-64cc6df49d-jsm22 (Running) on gke-c-jx6kj-db-primary-e19b23fe-3d77 (2/2) (replica)
                pvc: pgdb-beta-hqhu (37Gi)
        resources : CPU: 3500m Memory: 12Gi
        deployment : pgdb-beta
        deployment : pgdb-beta-backrest-shared-repo
        deployment : pgdb-beta-hqhu
        deployment : pgdb-beta-pgbouncer
        service : pgdb-beta - ClusterIP (10.67.132.252) - Ports (9187/TCP, 2022/TCP, 5432/TCP)
        service : pgdb-beta-pgbouncer - ClusterIP (10.67.143.142) - Ports (5432/TCP)
        service : pgdb-beta-replica - ClusterIP (10.67.142.231) - Ports (9187/TCP, 2022/TCP, 5432/TCP)
        pgreplica : pgdb-beta-hqhu
        labels : pgouser=admin workflowid=85d7e1d9-78fb-4531-90db-b0d8d726ad30 db=pgo name=pgdb-beta pg-cluster=pgdb-beta pgo-version=4.7.0

Cluster after attempting upgrade:

$ pgo show cluster -n pgdb pgdb-beta

cluster : pgdb-beta (crunchy-postgres-gis-ha:centos8-13.3-3.1-4.7.0)
        resources : CPU: 3500m Memory: 12Gi
        service : pgdb-beta - ClusterIP (10.67.132.252) - Ports (9187/TCP, 2022/TCP, 5432/TCP)
        service : pgdb-beta-pgbouncer - ClusterIP (10.67.143.142) - Ports (5432/TCP)
        service : pgdb-beta-replica - ClusterIP (10.67.142.231) - Ports (9187/TCP, 2022/TCP, 5432/TCP)
        labels : deployment-name=pgdb-beta name=pgdb-beta pg-cluster=pgdb-beta pgo-version=4.7.0 pgouser=admin workflowid=85d7e1d9-78fb-4531-90db-b0d8d726ad30 db=pgo
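
The difference between the two outputs above (two pods before, none after) can be checked with a small filter. This is a sketch only: `count_pods` is a hypothetical helper, and it assumes the `pod :` line format shown in the `pgo show cluster` output above.

```shell
#!/bin/sh
# Hypothetical helper: counts the "pod :" entries in `pgo show cluster`
# output read from stdin. Assumes the line format shown in this issue.
count_pods() {
    # "|| true" covers the zero-match case, where grep exits non-zero
    # but still prints 0.
    grep -c '^[[:space:]]*pod :' || true
}

# Example usage:
#   pgo show cluster -n pgdb pgdb-beta | count_pods
```

Fed the "before" output it prints 2, and fed the "after" output it prints 0.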

jkatz commented 3 years ago

The logs above do not show any issues. However, if the issue is installing the updated PostGIS, you need to follow the PostGIS upgrade procedure.

When doing PostGIS updates, these typically follow a "stepping stone" process:

  1. Update your PostgreSQL version
  2. Update your PostGIS version

So first go to 13.2-3.0-4.6.2 => 13.3-3.0-4.7.0. Then go to 13.3-3.0-4.7.0 => 13.3-3.1-4.7.0.

You will need to update the extension as described in the PostGIS docs.
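
The stepping-stone order above can be sketched as a small tag calculation. This is an illustration only: `tag_field` and `intermediate_tag` are hypothetical helpers, and the script assumes image tags have the form `<os>-<postgres>-<postgis>-<pgo>` with no hyphen inside the OS prefix, as in the tags used in this thread.

```shell
#!/bin/sh
# Hypothetical helpers, assuming tags look like <os>-<postgres>-<postgis>-<pgo>.

# Print the 1-based, hyphen-separated field of a tag.
tag_field() {
    echo "$1" | cut -d- -f"$2"
}

# Build the step-1 tag: take OS, PostgreSQL and PGO versions from the target,
# but keep the PostGIS version of the current tag.
intermediate_tag() {
    current="$1"
    target="$2"
    echo "$(tag_field "$target" 1)-$(tag_field "$target" 2)-$(tag_field "$current" 3)-$(tag_field "$target" 4)"
}

current="centos8-13.2-3.0-4.6.2"
target="centos8-13.3-3.1-4.7.0"

echo "step 1 (PostgreSQL, PostGIS unchanged): $(intermediate_tag "$current" "$target")"
echo "step 2 (PostGIS): $target"
# After step 2, the extension itself still has to be updated inside each
# database, for example with the standard SQL statement:
#   ALTER EXTENSION postgis UPDATE;
# The PostGIS docs describe the full procedure for a given version.
```

For the tags in this issue, step 1 comes out as `centos8-13.3-3.0-4.7.0`, matching the intermediate version named above.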

mattboyd11 commented 3 years ago

Thanks for the quick response!

I've recreated the original 13.2-3.0-4.6.2 cluster and run pgo upgrade --ccp-image-tag=centos8-13.3-3.0-4.7.0 -n pgdb pgdb-beta, but I still hit a similar issue: the bootstrap job continuously logs "waiting for leader to bootstrap" and no other pgdb-beta pods have been redeployed.

Have I misunderstood something?

benjaminjb commented 10 months ago

Thank you for bringing this issue to our attention.

The PGO v4 line you are using is no longer available through the Crunchy Developer Program (for additional details about available versions of Crunchy Postgres for Kubernetes, please see the Supported Platforms page).

If you still require assistance with v4, please contact info@crunchydata.com to see if we can help further.