CrunchyData / postgres-operator

Production PostgreSQL for Kubernetes, from high availability Postgres clusters to full-scale database-as-a-service.
https://access.crunchydata.com/documentation/postgres-operator/v5/
Apache License 2.0

Replicas not able to connect to leader #3175

Closed bbroniewski closed 1 year ago

bbroniewski commented 2 years ago

Overview

I started upgrading PGO from version 5.05 to 5.1. The upgrade is still in progress; at least, the upgrade pod exists. The replica pods are not able to communicate with the leader pod, and Patroni throws this exception:

2022-04-28 13:44:26,426 ERROR: Exception when working with leader
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/rewind.py", line 60, in check_leader_is_not_in_recovery
    with get_connection_cursor(connect_timeout=3, options='-c statement_timeout=2000', **conn_kwargs) as cur:
  File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/connection.py", line 44, in get_connection_cursor
    conn = psycopg.connect(**kwargs)
  File "/usr/lib64/python3.6/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "hippo-instance1-dvpn-0.hippo-pods" (10.244.1.195), port 5432 failed: FATAL: certificate authentication failed for user "_crunchyrepl"

I checked the certificates themselves and they look good; they are generated by cert-manager using a local CA. I tried different common names and DNS names, in case the issue lies there.
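For context, PostgreSQL's `cert` authentication method compares the common name (CN) of the client certificate with the requested database user name, so a certificate presented for the `_crunchyrepl` user from the error above would need that CN unless a user-name map says otherwise. Below is a minimal sketch of that comparison, assuming the subject line has already been printed with something like `openssl x509 -noout -subject`. The helper names `subject_common_name` and `cn_matches_user` are hypothetical and are not part of PGO or Patroni, and the sample subjects are illustrative rather than taken from this cluster.

```python
import re
from typing import Optional


def subject_common_name(subject: str) -> Optional[str]:
    """Return the CN from an X.509 subject string, or None if there is none.

    Handles both 'subject=CN = name, O = org' and 'subject= /O=org/CN=name',
    the two layouts commonly printed by openssl.
    """
    match = re.search(r"\bCN\s*=\s*([^,/\n]+)", subject)
    return match.group(1).strip() if match else None


def cn_matches_user(subject: str, db_user: str) -> bool:
    """True when the certificate CN equals the database user exactly."""
    return subject_common_name(subject) == db_user


if __name__ == "__main__":
    # Illustrative subjects only; substitute the output for your own certificate.
    print(cn_matches_user("subject=CN = _crunchyrepl", "_crunchyrepl"))
    print(cn_matches_user("subject=CN = hippo-replication", "_crunchyrepl"))
```

If the comparison fails for the replication certificate, the CN is one plausible place to look, although it is not necessarily the cause in this case.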

Environment

Please provide the following details:

Steps to Reproduce

REPRO

Provide steps to get to the error condition:

  1. Start the upgrade from 5.05 to 5.1.
  2. Change the images in the postgrescluster CRD to the ones from version 5.1.
  3. Observe that the replicas cannot communicate with the leader.

EXPECTED

  1. Replicas are able to communicate with the leader and the cluster is healthy.

ACTUAL

  1. Replicas cannot communicate with the leader; the cluster is unhealthy and only the leader works.

Logs

database container error from log:

2022-04-28 13:44:26,426 ERROR: Exception when working with leader
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/rewind.py", line 60, in check_leader_is_not_in_recovery
    with get_connection_cursor(connect_timeout=3, options='-c statement_timeout=2000', **conn_kwargs) as cur:
  File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/patroni/postgresql/connection.py", line 44, in get_connection_cursor
    conn = psycopg.connect(**kwargs)
  File "/usr/lib64/python3.6/site-packages/psycopg2/__init__.py", line 127, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
psycopg2.OperationalError: connection to server at "hippo-instance1-dvpn-0.hippo-pods" (10.244.1.195), port 5432 failed: FATAL: certificate authentication failed for user "_crunchyrepl"

Pods from cluster namespace:

NAME                                            READY   STATUS      RESTARTS   AGE
hippo-instance1-7sr7-0                          3/4     Running     0          35m
hippo-instance1-bx96-0                          3/4     Running     0          35m
hippo-instance1-dvpn-0                          4/4     Running     0          2d
hippo-pgbackrest-repo1-full-27517150--1-4xrx9   0/1     Completed   0          34h
hippo-pgbackrest-repo1-full-27518590--1-5fmkz   0/1     Completed   0          10h
hippo-pgbackrest-repo2-full-27517140--1-5rhzh   0/1     Error       0          34h
hippo-pgbackrest-repo2-full-27517140--1-k9phj   0/1     Error       0          34h
hippo-pgbackrest-repo2-full-27517140--1-pkdjg   0/1     Completed   0          34h
hippo-pgbackrest-repo2-full-27518580--1-2hv25   0/1     Error       0          10h
hippo-pgbackrest-repo2-full-27518580--1-c6qff   0/1     Completed   0          10h
hippo-pgbackrest-repo2-full-27518580--1-jmswl   0/1     Error       0          10h
hippo-pgbackrest-repo2-full-27518580--1-z7vgf   0/1     Error       0          10h
hippo-pgbackrest-repo2-incr-27519060--1-fhrm5   0/1     Completed   0          174m
hippo-pgbackrest-repo2-incr-27519120--1-t9jn5   0/1     Completed   0          114m
hippo-pgbackrest-repo2-incr-27519180--1-blsnv   0/1     Completed   0          54m
hippo-repo-host-0                               2/2     Running     0          2d1h
pgo-747d898c67-c2hcr                            1/1     Running     0          2d1h
pgo-upgrade-68b4797d7f-k8ppq                    1/1     Running     0          2d1h
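In the listing above, the two replica pods report 3/4 ready containers while the leader reports 4/4. Here is a small sketch for picking such pods out of `kubectl get pods` text output, assuming the default five-column layout shown here. The function name `pods_not_fully_ready` is made up for illustration.

```python
from typing import List


def pods_not_fully_ready(listing: str) -> List[str]:
    """Return names of Running pods whose READY column is not n/n."""
    names = []
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status == "Running" and ready_count != total:
            names.append(name)
    return names


if __name__ == "__main__":
    # A shortened copy of the listing from this issue.
    sample = """\
NAME                     READY   STATUS    RESTARTS   AGE
hippo-instance1-7sr7-0   3/4     Running   0          35m
hippo-instance1-bx96-0   3/4     Running   0          35m
hippo-instance1-dvpn-0   4/4     Running   0          2d
"""
    print(pods_not_fully_ready(sample))
```

Completed or failed backup jobs are skipped on purpose, since a `0/1` READY value is expected for pods that have already exited.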

Describe output for the leader pod:

Name:           hippo-instance1-dvpn-0
Namespace:      pgo
Priority:       0
Node:
Start Time:     Tue, 26 Apr 2022 13:06:31 +0000
Labels:         controller-revision-hash=hippo-instance1-dvpn-644db85f8b
                postgres-operator.crunchydata.com/cluster=hippo
                postgres-operator.crunchydata.com/data=postgres
                postgres-operator.crunchydata.com/instance=hippo-instance1-dvpn
                postgres-operator.crunchydata.com/instance-set=instance1
                postgres-operator.crunchydata.com/patroni=hippo-ha
                postgres-operator.crunchydata.com/role=master
                statefulset.kubernetes.io/pod-name=hippo-instance1-dvpn-0
Annotations:    status: {"conn_url":"postgres://hippo-instance1-dvpn-0.hippo-pods:5432/postgres","api_url":"https://hippo-instance1-dvpn-0.hippo-pods:8008/patroni...
Status:         Running
IP:             10.244.1.195
IPs:
  IP:           10.244.1.195
Controlled By:  StatefulSet/hippo-instance1-dvpn

Init Containers:

postgres-startup:
Container ID:   docker://ab82feb173cb11403b11b2fd6b8286bdcb6295c4c207c8cc1da9ba6d49077278
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1
Image ID:       docker-pullable://registry.developers.crunchydata.com/crunchydata/crunchy-postgres@sha256:9d71b968a08e6b189051d4e8d64e2ea2c118ac60e4b7b301478bb0b4942de7af
Port:
Host Port:
Command:
  bash
  -ceu

  declare -r expected_major_version="$1" pgwal_directory="$2" pgbrLog_directory="$3"
  results() { printf '::postgres-operator: %s::%s\n' "$@"; }
  safelink() (
    local desired="$1" name="$2" current
    current=$(realpath "${name}")
    if [ "${current}" = "${desired}" ]; then return; fi
    set -x; mv --no-target-directory "${current}" "${desired}"
    ln --no-dereference --force --symbolic "${desired}" "${name}"
  )
  echo Initializing ...
  results 'uid' "$(id -u)" 'gid' "$(id -G)"
  results 'postgres path' "$(command -v postgres)"
  results 'postgres version' "${postgres_version:=$(postgres --version)}"
  [[ "${postgres_version}" == *") ${expected_major_version}."* ]]
  results 'config directory' "${PGDATA:?}"
  postgres_data_directory=$([ -d "${PGDATA}" ] && postgres -C data_directory || echo "${PGDATA}")
  results 'data directory' "${postgres_data_directory}"
  [ "${postgres_data_directory}" = "${PGDATA}" ]
  bootstrap_dir="${postgres_data_directory}_bootstrap"
  [ -d "${bootstrap_dir}" ] && results 'bootstrap directory' "${bootstrap_dir}"
  [ -d "${bootstrap_dir}" ] && postgres_data_directory="${bootstrap_dir}"
  install --directory --mode=0700 "${postgres_data_directory}"
  results 'pgBackRest log directory' "${pgbrLog_directory}"
  install --directory --mode=0775 "${pgbrLog_directory}"
  install -D --mode=0600 -t "/tmp/replication" "/pgconf/tls/replication"/{tls.crt,tls.key,ca.crt}
  [ -f "${postgres_data_directory}/PG_VERSION" ] || exit 0
  results 'data version' "${postgres_data_version:=$(< "${postgres_data_directory}/PG_VERSION")}"
  [ "${postgres_data_version}" = "${expected_major_version}" ]
  safelink "${pgwal_directory}" "${postgres_data_directory}/pg_wal"
  results 'wal directory' "$(realpath "${postgres_data_directory}/pg_wal")"
  rm -f "${postgres_data_directory}/recovery.signal"
  startup
  14
  /pgdata/pg14_wal
  /pgdata/pgbackrest/log
State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Tue, 26 Apr 2022 13:07:39 +0000
  Finished:     Tue, 26 Apr 2022 13:07:39 +0000
Ready:          True
Restart Count:  0
Environment:
  PGDATA:         /pgdata/pg14
  PGHOST:         /tmp/postgres
  PGPORT:         5432
  KRB5_CONFIG:    /etc/postgres/krb5.conf
  KRB5RCACHEDIR:  /tmp
Mounts:
  /pgconf/tls from cert-volume (ro)
  /pgdata from postgres-data (rw)
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j5vd9 (ro)

nss-wrapper-init:
Container ID:   docker://b5b12c6980d5b8a7d77f980e1a8484f398e2342829075d30eb716b60e3abf780
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1
Image ID:       docker-pullable://registry.developers.crunchydata.com/crunchydata/crunchy-postgres@sha256:9d71b968a08e6b189051d4e8d64e2ea2c118ac60e4b7b301478bb0b4942de7af
Port:
Host Port:
Command:
  bash
  -c
  export NSS_WRAPPER_SUBDIR=postgres CRUNCHY_NSS_USERNAME=postgres CRUNCHY_NSS_USER_DESC="postgres"

  # Define nss_wrapper directory and passwd & group files that will be utilized by nss_wrapper.  The
  # nss_wrapper_env.sh script (which also sets these vars) isn't sourced here since the nss_wrapper
  # has not yet been setup, and we therefore don't yet want the nss_wrapper vars in the environment.
  mkdir -p /tmp/nss_wrapper
  chmod g+rwx /tmp/nss_wrapper

  NSS_WRAPPER_DIR="/tmp/nss_wrapper/${NSS_WRAPPER_SUBDIR}"
  NSS_WRAPPER_PASSWD="${NSS_WRAPPER_DIR}/passwd"
  NSS_WRAPPER_GROUP="${NSS_WRAPPER_DIR}/group"

  # create the nss_wrapper directory
  mkdir -p "${NSS_WRAPPER_DIR}"

  # grab the current user ID and group ID
  USER_ID=$(id -u)
  export USER_ID
  GROUP_ID=$(id -g)
  export GROUP_ID

  # get copies of the passwd and group files
  [[ -f "${NSS_WRAPPER_PASSWD}" ]] || cp "/etc/passwd" "${NSS_WRAPPER_PASSWD}"
  [[ -f "${NSS_WRAPPER_GROUP}" ]] || cp "/etc/group" "${NSS_WRAPPER_GROUP}"

  # if the username is missing from the passwd file, then add it
  if [[ ! $(cat "${NSS_WRAPPER_PASSWD}") =~ ${CRUNCHY_NSS_USERNAME}:x:${USER_ID} ]]; then
      echo "nss_wrapper: adding user"
      passwd_tmp="${NSS_WRAPPER_DIR}/passwd_tmp"
      cp "${NSS_WRAPPER_PASSWD}" "${passwd_tmp}"
      sed -i "/${CRUNCHY_NSS_USERNAME}:x:/d" "${passwd_tmp}"
      # needed for OCP 4.x because crio updates /etc/passwd with an entry for USER_ID
      sed -i "/${USER_ID}:x:/d" "${passwd_tmp}"
      printf '${CRUNCHY_NSS_USERNAME}:x:${USER_ID}:${GROUP_ID}:${CRUNCHY_NSS_USER_DESC}:${HOME}:/bin/bash\n' >> "${passwd_tmp}"
      envsubst < "${passwd_tmp}" > "${NSS_WRAPPER_PASSWD}"
      rm "${passwd_tmp}"
  else
      echo "nss_wrapper: user exists"
  fi

  # if the username (which will be the same as the group name) is missing from group file, then add it
  if [[ ! $(cat "${NSS_WRAPPER_GROUP}") =~ ${CRUNCHY_NSS_USERNAME}:x:${USER_ID} ]]; then
      echo "nss_wrapper: adding group"
      group_tmp="${NSS_WRAPPER_DIR}/group_tmp"
      cp "${NSS_WRAPPER_GROUP}" "${group_tmp}"
      sed -i "/${CRUNCHY_NSS_USERNAME}:x:/d" "${group_tmp}"
      printf '${CRUNCHY_NSS_USERNAME}:x:${USER_ID}:${CRUNCHY_NSS_USERNAME}\n' >> "${group_tmp}"
      envsubst < "${group_tmp}" > "${NSS_WRAPPER_GROUP}"
      rm "${group_tmp}"
  else
      echo "nss_wrapper: group exists"
  fi

  # export the nss_wrapper env vars
  # define nss_wrapper directory and passwd & group files that will be utilized by nss_wrapper
  NSS_WRAPPER_DIR="/tmp/nss_wrapper/${NSS_WRAPPER_SUBDIR}"
  NSS_WRAPPER_PASSWD="${NSS_WRAPPER_DIR}/passwd"
  NSS_WRAPPER_GROUP="${NSS_WRAPPER_DIR}/group"

  export LD_PRELOAD=/usr/lib64/libnss_wrapper.so
  export NSS_WRAPPER_PASSWD="${NSS_WRAPPER_PASSWD}"
  export NSS_WRAPPER_GROUP="${NSS_WRAPPER_GROUP}"

  echo "nss_wrapper: environment configured"

State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Tue, 26 Apr 2022 13:07:39 +0000
  Finished:     Tue, 26 Apr 2022 13:07:39 +0000
Ready:          True
Restart Count:  0
Environment:    <none>
Mounts:
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j5vd9 (ro)

Containers:

database:
Container ID:   docker://0a93bf1dd4ba135aba8161ac8d5621df741ce8ea279f1b022c6971e2d31f4cce
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1
Image ID:       docker-pullable://registry.developers.crunchydata.com/crunchydata/crunchy-postgres@sha256:9d71b968a08e6b189051d4e8d64e2ea2c118ac60e4b7b301478bb0b4942de7af
Port:           5432/TCP
Host Port:      0/TCP
Command:
  patroni
  /etc/patroni
State:          Running
  Started:      Tue, 26 Apr 2022 13:07:40 +0000
Ready:          True
Restart Count:  0
Liveness:       http-get https://:8008/liveness delay=3s timeout=5s period=10s #success=1 #failure=3
Readiness:      http-get https://:8008/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
  PGDATA:                     /pgdata/pg14
  PGHOST:                     /tmp/postgres
  PGPORT:                     5432
  KRB5_CONFIG:                /etc/postgres/krb5.conf
  KRB5RCACHEDIR:              /tmp
  PATRONI_NAME:               hippo-instance1-dvpn-0 (v1:metadata.name)
  PATRONI_KUBERNETES_POD_IP:  (v1:status.podIP)
  PATRONI_KUBERNETES_PORTS:   - name: postgres
                                port: 5432
                                protocol: TCP

  PATRONI_POSTGRESQL_CONNECT_ADDRESS:  $(PATRONI_NAME).hippo-pods:5432
  PATRONI_POSTGRESQL_LISTEN:           *:5432
  PATRONI_POSTGRESQL_CONFIG_DIR:       /pgdata/pg14
  PATRONI_POSTGRESQL_DATA_DIR:         /pgdata/pg14
  PATRONI_RESTAPI_CONNECT_ADDRESS:     $(PATRONI_NAME).hippo-pods:8008
  PATRONI_RESTAPI_LISTEN:              *:8008
  PATRONICTL_CONFIG_FILE:              /etc/patroni
  LD_PRELOAD:                          /usr/lib64/libnss_wrapper.so
  NSS_WRAPPER_PASSWD:                  /tmp/nss_wrapper/postgres/passwd
  NSS_WRAPPER_GROUP:                   /tmp/nss_wrapper/postgres/group
Mounts:
  /dev/shm from dshm (rw)
  /etc/database-containerinfo from database-containerinfo (ro)
  /etc/patroni from patroni-config (ro)
  /etc/pgbackrest/conf.d from pgbackrest-config (ro)
  /pgconf/tls from cert-volume (ro)
  /pgdata from postgres-data (rw)
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j5vd9 (ro)

replication-cert-copy:
Container ID:   docker://5c62bbd1f3f5d4deee388d79db3d56c91aab772c3e237ba3a9b278152e84d81d
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1
Image ID:       docker-pullable://registry.developers.crunchydata.com/crunchydata/crunchy-postgres@sha256:9d71b968a08e6b189051d4e8d64e2ea2c118ac60e4b7b301478bb0b4942de7af
Port:
Host Port:
Command:
  bash
  -ceu

  monitor() {
  declare -r directory="/pgconf/tls"
  exec {fd}<> <(:)
  while read -r -t 5 -u "${fd}" || true; do
    if [ "${directory}" -nt "/proc/self/fd/${fd}" ] &&
      install -D --mode=0600 -t "/tmp/replication" "${directory}"/{replication/tls.crt,replication/tls.key,replication/ca.crt} &&
      pkill -HUP --exact --parent=1 postgres
    then
      exec {fd}>&- && exec {fd}<> <(:)
      stat --format='Loaded certificates dated %y' "${directory}"
    fi
  done
  }; export -f monitor; exec -a "$0" bash -ceu monitor
  replication-cert-copy
State:          Running
  Started:      Tue, 26 Apr 2022 13:07:41 +0000
Ready:          True
Restart Count:  0
Environment:    <none>
Mounts:
  /pgconf/tls from cert-volume (ro)
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j5vd9 (ro)

pgbackrest:
Container ID:   docker://5fb1f2e2497488cfd3985767112e7e4bce814be43075851a79338ac7edd7d85d
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-0
Image ID:       docker-pullable://registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest@sha256:93a597704eca04d4529f9db8c593e11ca74ca4d38a223d22a8837357fef3ab3d
Port:
Host Port:
Command:
  pgbackrest
  server
State:          Running
  Started:      Tue, 26 Apr 2022 13:07:42 +0000
Ready:          True
Restart Count:  0
Liveness:       exec [pgbackrest server-ping] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
  LD_PRELOAD:          /usr/lib64/libnss_wrapper.so
  NSS_WRAPPER_PASSWD:  /tmp/nss_wrapper/postgres/passwd
  NSS_WRAPPER_GROUP:   /tmp/nss_wrapper/postgres/group
Mounts:
  /etc/pgbackrest/conf.d from pgbackrest-config (ro)
  /etc/pgbackrest/server from pgbackrest-server (ro)
  /pgdata from postgres-data (rw)
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j5vd9 (ro)

pgbackrest-config:
Container ID:   docker://0177ae3f10e2015fb4ac36579767dbae7108b8cc0fa1713b20b311d943777791
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-0
Image ID:       docker-pullable://registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest@sha256:93a597704eca04d4529f9db8c593e11ca74ca4d38a223d22a8837357fef3ab3d
Port:
Host Port:
Command:
  bash
  -ceu

  monitor() {
  exec {fd}<> <(:)
  until read -r -t 5 -u "${fd}"; do
    if
      [ "${filename}" -nt "/proc/self/fd/${fd}" ] &&
      pkill -HUP --exact --parent=0 pgbackrest
    then
      exec {fd}>&- && exec {fd}<> <(:)
      stat --dereference --format='Loaded configuration dated %y' "${filename}"
    elif
      { [ "${directory}" -nt "/proc/self/fd/${fd}" ] ||
        [ "${authority}" -nt "/proc/self/fd/${fd}" ]
      } &&
      pkill -HUP --exact --parent=0 pgbackrest
    then
      exec {fd}>&- && exec {fd}<> <(:)
      stat --format='Loaded certificates dated %y' "${directory}"
    fi
  done
  }; export directory="$1" authority="$2" filename="$3"; export -f monitor; exec -a "$0" bash -ceu monitor
  pgbackrest-config
  /etc/pgbackrest/server
  /etc/pgbackrest/conf.d/~postgres-operator/tls-ca.crt
  /etc/pgbackrest/conf.d/~postgres-operator_server.conf
State:          Running
  Started:      Tue, 26 Apr 2022 13:07:42 +0000
Ready:          True
Restart Count:  0
Environment:    <none>
Mounts:
  /etc/pgbackrest/conf.d from pgbackrest-config (ro)
  /etc/pgbackrest/server from pgbackrest-server (ro)
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-j5vd9 (ro)

Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  cert-volume:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          hippo-tls
    SecretOptionalName:
    SecretName:          hippo-replication-tls
    SecretOptionalName:
  postgres-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  hippo-instance1-dvpn-pgdata
    ReadOnly:   false
  database-containerinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      limits.cpu -> cpu_limit
      requests.cpu -> cpu_request
      limits.memory -> mem_limit
      requests.memory -> mem_request
      metadata.labels -> labels
      metadata.annotations -> annotations
  pgbackrest-server:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          hippo-instance1-dvpn-certs
    SecretOptionalName:
  pgbackrest-config:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          pgo-s3-creds
    SecretOptionalName:
    ConfigMapName:       hippo-pgbackrest-config
    ConfigMapOptional:
    SecretName:          hippo-pgbackrest
    SecretOptionalName:  0xc000695d70
  patroni-config:
    Type:                Projected (a volume that contains injected data from multiple sources)
    ConfigMapName:       hippo-config
    ConfigMapOptional:
    ConfigMapName:       hippo-instance1-dvpn-config
    ConfigMapOptional:
    SecretName:          hippo-instance1-dvpn-certs
    SecretOptionalName:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  16Mi
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:
  kube-api-access-j5vd9:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:

Describe output for one of the failing replicas:

Name:           hippo-instance1-7sr7-0
Namespace:      pgo
Priority:       0
Node:           combartekbroniewski-slot4/192.168.1.43
Start Time:     Thu, 28 Apr 2022 13:19:02 +0000
Labels:         controller-revision-hash=hippo-instance1-7sr7-868ccc8d54
                postgres-operator.crunchydata.com/cluster=hippo
                postgres-operator.crunchydata.com/data=postgres
                postgres-operator.crunchydata.com/instance=hippo-instance1-7sr7
                postgres-operator.crunchydata.com/instance-set=instance1
                postgres-operator.crunchydata.com/patroni=hippo-ha
                statefulset.kubernetes.io/pod-name=hippo-instance1-7sr7-0
Annotations:    status: {"conn_url":"postgres://hippo-instance1-7sr7-0.hippo-pods:5432/postgres","api_url":"https://hippo-instance1-7sr7-0.hippo-pods:8008/patroni...
Status:         Running
IP:             10.244.2.67
IPs:
  IP:           10.244.2.67
Controlled By:  StatefulSet/hippo-instance1-7sr7

Init Containers:

postgres-startup:
Container ID:   containerd://e9e47f98b12ff9e5aefcf96224fb84895a76feb000843d73daf31df6e2b9a0ac
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1
Image ID:       registry.developers.crunchydata.com/crunchydata/crunchy-postgres@sha256:9d71b968a08e6b189051d4e8d64e2ea2c118ac60e4b7b301478bb0b4942de7af
Port:
Host Port:
Command:
  bash
  -ceu

  declare -r expected_major_version="$1" pgwal_directory="$2" pgbrLog_directory="$3"
  results() { printf '::postgres-operator: %s::%s\n' "$@"; }
  safelink() (
    local desired="$1" name="$2" current
    current=$(realpath "${name}")
    if [ "${current}" = "${desired}" ]; then return; fi
    set -x; mv --no-target-directory "${current}" "${desired}"
    ln --no-dereference --force --symbolic "${desired}" "${name}"
  )
  echo Initializing ...
  results 'uid' "$(id -u)" 'gid' "$(id -G)"
  results 'postgres path' "$(command -v postgres)"
  results 'postgres version' "${postgres_version:=$(postgres --version)}"
  [[ "${postgres_version}" == *") ${expected_major_version}."* ]]
  results 'config directory' "${PGDATA:?}"
  postgres_data_directory=$([ -d "${PGDATA}" ] && postgres -C data_directory || echo "${PGDATA}")
  results 'data directory' "${postgres_data_directory}"
  [ "${postgres_data_directory}" = "${PGDATA}" ]
  bootstrap_dir="${postgres_data_directory}_bootstrap"
  [ -d "${bootstrap_dir}" ] && results 'bootstrap directory' "${bootstrap_dir}"
  [ -d "${bootstrap_dir}" ] && postgres_data_directory="${bootstrap_dir}"
  install --directory --mode=0700 "${postgres_data_directory}"
  results 'pgBackRest log directory' "${pgbrLog_directory}"
  install --directory --mode=0775 "${pgbrLog_directory}"
  install -D --mode=0600 -t "/tmp/replication" "/pgconf/tls/replication"/{tls.crt,tls.key,ca.crt}
  [ -f "${postgres_data_directory}/PG_VERSION" ] || exit 0
  results 'data version' "${postgres_data_version:=$(< "${postgres_data_directory}/PG_VERSION")}"
  [ "${postgres_data_version}" = "${expected_major_version}" ]
  safelink "${pgwal_directory}" "${postgres_data_directory}/pg_wal"
  results 'wal directory' "$(realpath "${postgres_data_directory}/pg_wal")"
  rm -f "${postgres_data_directory}/recovery.signal"
  startup
  14
  /pgdata/pg14_wal
  /pgdata/pgbackrest/log
State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Thu, 28 Apr 2022 13:19:13 +0000
  Finished:     Thu, 28 Apr 2022 13:19:13 +0000
Ready:          True
Restart Count:  0
Environment:
  PGDATA:         /pgdata/pg14
  PGHOST:         /tmp/postgres
  PGPORT:         5432
  KRB5_CONFIG:    /etc/postgres/krb5.conf
  KRB5RCACHEDIR:  /tmp
Mounts:
  /pgconf/tls from cert-volume (ro)
  /pgdata from postgres-data (rw)
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mk5w5 (ro)

nss-wrapper-init:
Container ID:   containerd://d0ba0e36c7c2ec49bd853d1e91fb40febce4815f17836324986e51b528691768
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1
Image ID:       registry.developers.crunchydata.com/crunchydata/crunchy-postgres@sha256:9d71b968a08e6b189051d4e8d64e2ea2c118ac60e4b7b301478bb0b4942de7af
Port:
Host Port:
Command:
  bash
  -c
  export NSS_WRAPPER_SUBDIR=postgres CRUNCHY_NSS_USERNAME=postgres CRUNCHY_NSS_USER_DESC="postgres"

  # Define nss_wrapper directory and passwd & group files that will be utilized by nss_wrapper.  The
  # nss_wrapper_env.sh script (which also sets these vars) isn't sourced here since the nss_wrapper
  # has not yet been setup, and we therefore don't yet want the nss_wrapper vars in the environment.
  mkdir -p /tmp/nss_wrapper
  chmod g+rwx /tmp/nss_wrapper

  NSS_WRAPPER_DIR="/tmp/nss_wrapper/${NSS_WRAPPER_SUBDIR}"
  NSS_WRAPPER_PASSWD="${NSS_WRAPPER_DIR}/passwd"
  NSS_WRAPPER_GROUP="${NSS_WRAPPER_DIR}/group"

  # create the nss_wrapper directory
  mkdir -p "${NSS_WRAPPER_DIR}"

  # grab the current user ID and group ID
  USER_ID=$(id -u)
  export USER_ID
  GROUP_ID=$(id -g)
  export GROUP_ID

  # get copies of the passwd and group files
  [[ -f "${NSS_WRAPPER_PASSWD}" ]] || cp "/etc/passwd" "${NSS_WRAPPER_PASSWD}"
  [[ -f "${NSS_WRAPPER_GROUP}" ]] || cp "/etc/group" "${NSS_WRAPPER_GROUP}"

  # if the username is missing from the passwd file, then add it
  if [[ ! $(cat "${NSS_WRAPPER_PASSWD}") =~ ${CRUNCHY_NSS_USERNAME}:x:${USER_ID} ]]; then
      echo "nss_wrapper: adding user"
      passwd_tmp="${NSS_WRAPPER_DIR}/passwd_tmp"
      cp "${NSS_WRAPPER_PASSWD}" "${passwd_tmp}"
      sed -i "/${CRUNCHY_NSS_USERNAME}:x:/d" "${passwd_tmp}"
      # needed for OCP 4.x because crio updates /etc/passwd with an entry for USER_ID
      sed -i "/${USER_ID}:x:/d" "${passwd_tmp}"
      printf '${CRUNCHY_NSS_USERNAME}:x:${USER_ID}:${GROUP_ID}:${CRUNCHY_NSS_USER_DESC}:${HOME}:/bin/bash\n' >> "${passwd_tmp}"
      envsubst < "${passwd_tmp}" > "${NSS_WRAPPER_PASSWD}"
      rm "${passwd_tmp}"
  else
      echo "nss_wrapper: user exists"
  fi

  # if the username (which will be the same as the group name) is missing from group file, then add it
  if [[ ! $(cat "${NSS_WRAPPER_GROUP}") =~ ${CRUNCHY_NSS_USERNAME}:x:${USER_ID} ]]; then
      echo "nss_wrapper: adding group"
      group_tmp="${NSS_WRAPPER_DIR}/group_tmp"
      cp "${NSS_WRAPPER_GROUP}" "${group_tmp}"
      sed -i "/${CRUNCHY_NSS_USERNAME}:x:/d" "${group_tmp}"
      printf '${CRUNCHY_NSS_USERNAME}:x:${USER_ID}:${CRUNCHY_NSS_USERNAME}\n' >> "${group_tmp}"
      envsubst < "${group_tmp}" > "${NSS_WRAPPER_GROUP}"
      rm "${group_tmp}"
  else
      echo "nss_wrapper: group exists"
  fi

  # export the nss_wrapper env vars
  # define nss_wrapper directory and passwd & group files that will be utilized by nss_wrapper
  NSS_WRAPPER_DIR="/tmp/nss_wrapper/${NSS_WRAPPER_SUBDIR}"
  NSS_WRAPPER_PASSWD="${NSS_WRAPPER_DIR}/passwd"
  NSS_WRAPPER_GROUP="${NSS_WRAPPER_DIR}/group"

  export LD_PRELOAD=/usr/lib64/libnss_wrapper.so
  export NSS_WRAPPER_PASSWD="${NSS_WRAPPER_PASSWD}"
  export NSS_WRAPPER_GROUP="${NSS_WRAPPER_GROUP}"

  echo "nss_wrapper: environment configured"

State:          Terminated
  Reason:       Completed
  Exit Code:    0
  Started:      Thu, 28 Apr 2022 13:19:14 +0000
  Finished:     Thu, 28 Apr 2022 13:19:14 +0000
Ready:          True
Restart Count:  0
Environment:    <none>
Mounts:
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mk5w5 (ro)

Containers:

database:
Container ID:   containerd://59773b768a733beff4f48bde8e48cbed9fc76389d655362c5a0a11d2c1f054cc
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1
Image ID:       registry.developers.crunchydata.com/crunchydata/crunchy-postgres@sha256:9d71b968a08e6b189051d4e8d64e2ea2c118ac60e4b7b301478bb0b4942de7af
Port:           5432/TCP
Host Port:      0/TCP
Command:
  patroni
  /etc/patroni
State:          Running
  Started:      Thu, 28 Apr 2022 13:19:15 +0000
Ready:          False
Restart Count:  0
Liveness:       http-get https://:8008/liveness delay=3s timeout=5s period=10s #success=1 #failure=3
Readiness:      http-get https://:8008/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
  PGDATA:                     /pgdata/pg14
  PGHOST:                     /tmp/postgres
  PGPORT:                     5432
  KRB5_CONFIG:                /etc/postgres/krb5.conf
  KRB5RCACHEDIR:              /tmp
  PATRONI_NAME:               hippo-instance1-7sr7-0 (v1:metadata.name)
  PATRONI_KUBERNETES_POD_IP:  (v1:status.podIP)
  PATRONI_KUBERNETES_PORTS:   - name: postgres
                                port: 5432
                                protocol: TCP

  PATRONI_POSTGRESQL_CONNECT_ADDRESS:  $(PATRONI_NAME).hippo-pods:5432
  PATRONI_POSTGRESQL_LISTEN:           *:5432
  PATRONI_POSTGRESQL_CONFIG_DIR:       /pgdata/pg14
  PATRONI_POSTGRESQL_DATA_DIR:         /pgdata/pg14
  PATRONI_RESTAPI_CONNECT_ADDRESS:     $(PATRONI_NAME).hippo-pods:8008
  PATRONI_RESTAPI_LISTEN:              *:8008
  PATRONICTL_CONFIG_FILE:              /etc/patroni
  LD_PRELOAD:                          /usr/lib64/libnss_wrapper.so
  NSS_WRAPPER_PASSWD:                  /tmp/nss_wrapper/postgres/passwd
  NSS_WRAPPER_GROUP:                   /tmp/nss_wrapper/postgres/group
Mounts:
  /dev/shm from dshm (rw)
  /etc/database-containerinfo from database-containerinfo (ro)
  /etc/patroni from patroni-config (ro)
  /etc/pgbackrest/conf.d from pgbackrest-config (ro)
  /pgconf/tls from cert-volume (ro)
  /pgdata from postgres-data (rw)
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mk5w5 (ro)

replication-cert-copy:
Container ID:   containerd://5c11252359b2c73c79ac78eaf9e308293881043c780127d2ccc6c753edfb955c
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1
Image ID:       registry.developers.crunchydata.com/crunchydata/crunchy-postgres@sha256:9d71b968a08e6b189051d4e8d64e2ea2c118ac60e4b7b301478bb0b4942de7af
Port:
Host Port:
Command:
  bash
  -ceu

  monitor() {
  declare -r directory="/pgconf/tls"
  exec {fd}<> <(:)
  while read -r -t 5 -u "${fd}" || true; do
    if [ "${directory}" -nt "/proc/self/fd/${fd}" ] &&
      install -D --mode=0600 -t "/tmp/replication" "${directory}"/{replication/tls.crt,replication/tls.key,replication/ca.crt} &&
      pkill -HUP --exact --parent=1 postgres
    then
      exec {fd}>&- && exec {fd}<> <(:)
      stat --format='Loaded certificates dated %y' "${directory}"
    fi
  done
  }; export -f monitor; exec -a "$0" bash -ceu monitor
  replication-cert-copy
State:          Running
  Started:      Thu, 28 Apr 2022 13:19:16 +0000
Ready:          True
Restart Count:  0
Environment:    <none>
Mounts:
  /pgconf/tls from cert-volume (ro)
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mk5w5 (ro)

pgbackrest:
Container ID:   containerd://7ca294c4c775fd9a2f4382a64bd6e8ae64cca981dceb28a30ee7b5e800bc97c9
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-0
Image ID:       registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest@sha256:93a597704eca04d4529f9db8c593e11ca74ca4d38a223d22a8837357fef3ab3d
Port:
Host Port:
Command:
  pgbackrest
  server
State:          Running
  Started:      Thu, 28 Apr 2022 13:19:16 +0000
Ready:          True
Restart Count:  0
Liveness:       exec [pgbackrest server-ping] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
  LD_PRELOAD:          /usr/lib64/libnss_wrapper.so
  NSS_WRAPPER_PASSWD:  /tmp/nss_wrapper/postgres/passwd
  NSS_WRAPPER_GROUP:   /tmp/nss_wrapper/postgres/group
Mounts:
  /etc/pgbackrest/conf.d from pgbackrest-config (ro)
  /etc/pgbackrest/server from pgbackrest-server (ro)
  /pgdata from postgres-data (rw)
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mk5w5 (ro)

pgbackrest-config:
Container ID:   containerd://d8f34d1ea17c9b02e557853e3ec08015d1151a224a3dd262596f523d4a604159
Image:          registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-0
Image ID:       registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest@sha256:93a597704eca04d4529f9db8c593e11ca74ca4d38a223d22a8837357fef3ab3d
Port:
Host Port:
Command:
  bash
  -ceu

  monitor() {
  exec {fd}<> <(:)
  until read -r -t 5 -u "${fd}"; do
    if
      [ "${filename}" -nt "/proc/self/fd/${fd}" ] &&
      pkill -HUP --exact --parent=0 pgbackrest
    then
      exec {fd}>&- && exec {fd}<> <(:)
      stat --dereference --format='Loaded configuration dated %y' "${filename}"
    elif
      { [ "${directory}" -nt "/proc/self/fd/${fd}" ] ||
        [ "${authority}" -nt "/proc/self/fd/${fd}" ]
      } &&
      pkill -HUP --exact --parent=0 pgbackrest
    then
      exec {fd}>&- && exec {fd}<> <(:)
      stat --format='Loaded certificates dated %y' "${directory}"
    fi
  done
  }; export directory="$1" authority="$2" filename="$3"; export -f monitor; exec -a "$0" bash -ceu monitor
  pgbackrest-config
  /etc/pgbackrest/server
  /etc/pgbackrest/conf.d/~postgres-operator/tls-ca.crt
  /etc/pgbackrest/conf.d/~postgres-operator_server.conf
State:          Running
  Started:      Thu, 28 Apr 2022 13:19:16 +0000
Ready:          True
Restart Count:  0
Environment:    <none>
Mounts:
  /etc/pgbackrest/conf.d from pgbackrest-config (ro)
  /etc/pgbackrest/server from pgbackrest-server (ro)
  /tmp from tmp (rw)
  /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mk5w5 (ro)

Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  cert-volume:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          hippo-tls
    SecretOptionalName:  0xc00063add9
    SecretName:          hippo-replication-tls
    SecretOptionalName:
  postgres-data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  hippo-instance1-7sr7-pgdata
    ReadOnly:   false
  database-containerinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      limits.cpu -> cpu_limit
      requests.cpu -> cpu_request
      limits.memory -> mem_limit
      requests.memory -> mem_request
      metadata.labels -> labels
      metadata.annotations -> annotations
  pgbackrest-server:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          hippo-instance1-7sr7-certs
    SecretOptionalName:
  pgbackrest-config:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          pgo-s3-creds
    SecretOptionalName:
    ConfigMapName:       hippo-pgbackrest-config
    ConfigMapOptional:
    SecretName:          hippo-pgbackrest
    SecretOptionalName:  0xc00063aff0
  patroni-config:
    Type:                Projected (a volume that contains injected data from multiple sources)
    ConfigMapName:       hippo-config
    ConfigMapOptional:
    ConfigMapName:       hippo-instance1-7sr7-config
    ConfigMapOptional:
    SecretName:          hippo-instance1-7sr7-certs
    SecretOptionalName:
  tmp:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  16Mi
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:
  kube-api-access-mk5w5:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  Normal   Scheduled  39m                    default-scheduler  Successfully assigned pgo/hippo-instance1-7sr7-0 to combartekbroniewski-slot4
  Normal   Pulled     38m                    kubelet            Container image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1" already present on machine
  Normal   Created    38m                    kubelet            Created container postgres-startup
  Normal   Started    38m                    kubelet            Started container postgres-startup
  Normal   Started    38m                    kubelet            Started container nss-wrapper-init
  Normal   Created    38m                    kubelet            Created container nss-wrapper-init
  Normal   Pulled     38m                    kubelet            Container image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1" already present on machine
  Normal   Created    38m                    kubelet            Created container replication-cert-copy
  Normal   Pulled     38m                    kubelet            Container image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1" already present on machine
  Normal   Created    38m                    kubelet            Created container database
  Normal   Started    38m                    kubelet            Started container database
  Normal   Pulled     38m                    kubelet            Container image "registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1" already present on machine
  Normal   Started    38m                    kubelet            Started container replication-cert-copy
  Normal   Pulled     38m                    kubelet            Container image "registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-0" already present on machine
  Normal   Created    38m                    kubelet            Created container pgbackrest
  Normal   Started    38m                    kubelet            Started container pgbackrest
  Normal   Pulled     38m                    kubelet            Container image "registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-0" already present on machine
  Normal   Created    38m                    kubelet            Created container pgbackrest-config
  Normal   Started    38m                    kubelet            Started container pgbackrest-config
  Warning  Unhealthy  3m46s (x241 over 38m)  kubelet            Readiness probe failed: HTTP probe failed with statuscode: 503

Additional Information

Please provide any additional information that may be helpful.

bbroniewski commented 2 years ago

Cluster CRD:

apiVersion: v1
items:
- apiVersion: postgres-operator.crunchydata.com/v1beta1
  kind: PostgresCluster
  metadata:
    annotations:
      kubectl.kubernetes.io/last-applied-configuration: |
        {"apiVersion":"postgres-operator.crunchydata.com/v1beta1","kind":"PostgresCluster","metadata":{"annotations":{},"name":"hippo","namespace":"pgo"},"spec":{"backups":{"pgbackrest":{"image":"registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:centos8-2.36-1","repos":[{"name":"repo1","volume":{"volumeClaimSpec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}}}}}]}},"customTLSSecret":{"name":"hippo-tls"},"image":"registry.developers.crunchydata.com/crunchydata/crunchy-postgres:centos8-14.2-0","instances":[{"dataVolumeClaimSpec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"1Gi"}}},"name":"instance1","replicas":2}],"postgresVersion":14}}
    creationTimestamp: "2022-03-22T01:26:11Z"
    finalizers:
    - postgres-operator.crunchydata.com/finalizer
    generation: 44
    name: hippo
    namespace: pgo
    resourceVersion: "103210973"
    uid: 4ff648ca-9f81-4f64-aeca-a30fd586f2b0
  spec:
    backups:
      pgbackrest:
        configuration:
        - secret:
            name: pgo-s3-creds
        global:
          repo1-retention-full: "7"
          repo1-retention-full-type: time
          repo2-host-cert-file: /run/secrets/kubernetes.io/serviceaccount/ca.crt
          repo2-path: /pgo/hippo/repo2
          repo2-retention-full: "14"
          repo2-retention-full-type: time
          repo2-s3-uri-style: path
          repo2-storage-verify-tls: "y"
        image: registry.developers.crunchydata.com/crunchydata/crunchy-pgbackrest:ubi8-2.38-0
        manual:
          options:
          - --type=full
          repoName: repo2
        repos:
        - name: repo1
          schedules:
            full: 10 3 * * *
          volume:
            volumeClaimSpec:
              accessModes:
              - ReadWriteOnce
              resources:
                requests:
                  storage: 1Gi
        - name: repo2
          s3:
            bucket: translison-db-backup
            endpoint: <wiped out>
            region: pl-rack-1
          schedules:
            full: 0 3 * * *
            incremental: 0 */1 * * *
    customReplicationTLSSecret:
      name: hippo-replication-tls
      optional: false
    customTLSSecret:
      name: hippo-tls
      optional: false
    image: registry.developers.crunchydata.com/crunchydata/crunchy-postgres:ubi8-14.2-1
    instances:
    - affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                postgres-operator.crunchydata.com/cluster: hippo
                postgres-operator.crunchydata.com/instance-set: instance1
            topologyKey: kubernetes.io/hostname
      dataVolumeClaimSpec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
      minAvailable: 2
      name: instance1
      replicas: 3
    port: 5432
    postgresVersion: 14
    users:
    - name: postgres
  status:
    conditions:
    - lastTransitionTime: "2022-04-04T14:42:31Z"
      message: pgBackRest replica create repo is ready for backups
      observedGeneration: 44
      reason: StanzaCreated
      status: "True"
      type: PGBackRestReplicaRepoReady
    - lastTransitionTime: "2022-04-04T14:44:29Z"
      message: pgBackRest replica creation is now possible
      observedGeneration: 44
      reason: RepoBackupComplete
      status: "True"
      type: PGBackRestReplicaCreate
    - lastTransitionTime: "2022-04-26T12:21:39Z"
      message: pgBackRest dedicated repository host is ready
      observedGeneration: 44
      reason: RepoHostReady
      status: "True"
      type: PGBackRestRepoHostReady
    - lastTransitionTime: "2022-04-05T12:39:25Z"
      message: Manual backup completed successfully
      observedGeneration: 40
      reason: ManualBackupComplete
      status: "True"
      type: PGBackRestManualBackupSuccessful
    databaseRevision: 559678bf8f
    instances:
    - name: instance1
      readyReplicas: 1
      replicas: 3
      updatedReplicas: 2
    monitoring:
      exporterConfiguration: 559c4c97d6
    observedGeneration: 44
    patroni:
      systemIdentifier: "7077734709681266763"
    pgbackrest:
      manualBackup:
        completionTime: "2022-04-05T12:39:24Z"
        finished: true
        id: "3"
        startTime: "2022-04-05T12:36:00Z"
        succeeded: 1
      repoHost:
        apiVersion: apps/v1
        kind: StatefulSet
        ready: true
      repos:
      - bound: true
        name: repo1
        replicaCreateBackupComplete: true
        stanzaCreated: true
        volume: pvc-4c44805c-b618-408c-9a78-ba77695206dd
      - name: repo2
        repoOptionsHash: 7549974f45
        stanzaCreated: true
      scheduledBackups:
      - completionTime: "2022-04-27T03:11:25Z"
        cronJobName: hippo-pgbackrest-repo1-full
        repo: repo1
        startTime: "2022-04-27T03:10:00Z"
        succeeded: 1
        type: full
      - completionTime: "2022-04-28T03:11:00Z"
        cronJobName: hippo-pgbackrest-repo1-full
        repo: repo1
        startTime: "2022-04-28T03:10:00Z"
        succeeded: 1
        type: full
      - completionTime: "2022-04-27T03:05:12Z"
        cronJobName: hippo-pgbackrest-repo2-full
        failed: 2
        repo: repo2
        startTime: "2022-04-27T03:00:00Z"
        succeeded: 1
        type: full
      - completionTime: "2022-04-28T03:05:11Z"
        cronJobName: hippo-pgbackrest-repo2-full
        failed: 3
        repo: repo2
        startTime: "2022-04-28T03:00:00Z"
        succeeded: 1
        type: full
      - completionTime: "2022-04-28T13:00:16Z"
        cronJobName: hippo-pgbackrest-repo2-incr
        repo: repo2
        startTime: "2022-04-28T13:00:00Z"
        succeeded: 1
        type: incr
      - completionTime: "2022-04-28T14:00:17Z"
        cronJobName: hippo-pgbackrest-repo2-incr
        repo: repo2
        startTime: "2022-04-28T14:00:00Z"
        succeeded: 1
        type: incr
      - completionTime: "2022-04-28T15:00:19Z"
        cronJobName: hippo-pgbackrest-repo2-incr
        repo: repo2
        startTime: "2022-04-28T15:00:00Z"
        succeeded: 1
        type: incr
    proxy:
      pgBouncer:
        postgresRevision: 5c9966f6bc
    usersRevision: 786cb8ff8c
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

Certificates:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: hippo-replication-tls
  namespace: pgo
spec:
  secretName: hippo-replication-tls
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  commonName: hippo-primary
  dnsNames:
  - hippo-primary.pgo.svc
  - hippo-ha.pgo.svc
  - hippo-replicas.pgo.svc
  issuerRef:
    name: local-ca-issuer
    kind: ClusterIssuer
    group: cert-manager.io
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: hippo-tls
  namespace: pgo
spec:
  secretName: hippo-tls
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  commonName: hippo-primary
  dnsNames:
  - hippo-primary.pgo.svc
  - hippo-ha.pgo.svc
  issuerRef:
    name: local-ca-issuer
    kind: ClusterIssuer
    group: cert-manager.io
bbroniewski commented 2 years ago

I think this is not documented, but the commonName of the replication certificate has to be "_crunchyrepl".
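For reference, here is a minimal sketch of the fix described above: the `hippo-replication-tls` Certificate posted earlier in this thread, with only `commonName` changed. PostgreSQL certificate authentication compares the certificate's CN with the requested database user name, which is why the leader rejected the replicas with `certificate authentication failed for user "_crunchyrepl"`. All other fields are copied unchanged from the original manifest. Whether the `dnsNames` entries are still needed on this certificate is an assumption that was not tested here.

```yaml
# Sketch only: the replication Certificate from this thread with commonName corrected.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: hippo-replication-tls
  namespace: pgo
spec:
  secretName: hippo-replication-tls
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  # Must equal the replication user named in the Patroni error message.
  commonName: _crunchyrepl
  # Kept as originally posted; possibly unnecessary for this certificate (assumption).
  dnsNames:
  - hippo-primary.pgo.svc
  - hippo-ha.pgo.svc
  - hippo-replicas.pgo.svc
  issuerRef:
    name: local-ca-issuer
    kind: ClusterIssuer
    group: cert-manager.io
```

The `hippo-tls` server certificate is left as it was; only the certificate referenced by `customReplicationTLSSecret` changes.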

tjmoore4 commented 2 years ago

@bbroniewski Thank you for following up. Our team plans to look into improving the documentation on manual certificate creation. Just to be clear: with the updated common name, is everything now working as expected?

bbroniewski commented 2 years ago

@tjmoore4 yes, but the upgrade never finished, so I eventually killed it. The certificates work fine after the correction mentioned above.

andrewlecuyer commented 2 years ago

@bbroniewski glad to hear all is now working with your certificates.

As @tjmoore4 mentioned, we plan to update our documentation to better define any requirements for custom generated certs.

Thank you for your feedback and thanks for using PGO!

benjaminjb commented 1 year ago

Thanks for your help noting the gap in the docs, @bbroniewski -- we've merged in a change to fix that, so I'm going to close this ticket now. If you run into any other bumps, please let us know!

conbrad commented 11 months ago

> Thanks for your help noting the gap in the docs, @bbroniewski -- we've merged in a change to fix that, so I'm going to close this ticket now. If you run into any other bumps, please let us know!

Where in the docs? I'm having this same issue.