mausch opened this issue 11 months ago (Open)
Hello @mausch!
I have not been able to reproduce the behavior you are seeing... Are you waiting for the initial backup to complete before scaling down? How are you scaling the cluster down to 0 instances? What version of PGO are you using (the image tag you listed is for a postgres image)? Can you send your postgrescluster spec? Can you reproduce the behavior again and send the resulting PGO logs?
Are you waiting for the initial backup to complete before scaling down?
No idea how to check this, or why it would matter tbh 🙂
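One way to check this, sketched under the assumption that the cluster is named postgres in the postgres namespace (as in the spec pasted further down in this thread), is to inspect the status that PGO maintains on the PostgresCluster and the backup Jobs it creates:
# The PGBackRestReplicaCreate condition (visible in the status pasted below)
# reflects whether the initial backup has completed:
kubectl get postgrescluster postgres -n postgres \
  -o jsonpath='{.status.conditions[?(@.type=="PGBackRestReplicaCreate")]}'
# The backup itself runs as a Job labelled with the cluster name:
kubectl get jobs -n postgres -l postgres-operator.crunchydata.com/cluster=postgres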
How are you scaling the cluster down to 0 instances?
Kubernetes dashboard or kubectl scale, doesn't matter.
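For concreteness, scaling from the command line looks roughly like the sketch below; the StatefulSet name is taken from the pod name that appears in the logs later in the thread, and its random suffix will differ per cluster.
# Scale the instance StatefulSet to 0 (the "q9tz" suffix is illustrative):
kubectl scale statefulset postgres-instance1-q9tz --replicas=0 -n postgres
# PGO's reconcile loop then scales it back up to match the PostgresCluster spec.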
What version of PGO are you using (the image tag you listed is for a postgres image)? Can you reproduce the behavior again and send the resulting PGO logs?
Already included in my initial message.
Can you send your postgrescluster spec?
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  creationTimestamp: '2024-01-30T09:27:28Z'
  finalizers:
    - postgres-operator.crunchydata.com/finalizer
  generation: 1
  name: postgres
  namespace: postgres
  resourceVersion: '376673'
  uid: f375cd59-775f-40c3-a995-10851ecb7bcf
spec:
  backups:
    pgbackrest:
      global:
        archive-async: 'y'
        archive-push-queue-max: 1GiB
        repo1-retention-full: '31'
        repo1-retention-full-type: time
        spool-path: /pgdata/backups
      repos:
        - name: repo1
          schedules:
            full: 0 1 * * *
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 20Gi
  image: >-
    registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-16.0-3.4-0
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 300Gi
        storageClassName: gp3-fast
      name: instance1
      replicas: 1
      resources:
        requests:
          cpu: '0.3'
          memory: 1Gi
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          maintenance_work_mem: 256MB
          max_connections: '1000'
          shared_buffers: 1GB
    leaderLeaseDurationSeconds: 30
    port: 8008
    syncPeriodSeconds: 10
  port: 5432
  postGISVersion: 3.4.0
  postgresVersion: 16
  users:
    - name: elevate
      options: SUPERUSER
    - name: postgres
status:
  conditions:
    - lastTransitionTime: '2024-01-30T13:55:47Z'
      message: pgBackRest dedicated repository host is ready
      observedGeneration: 1
      reason: RepoHostReady
      status: 'True'
      type: PGBackRestRepoHostReady
    - lastTransitionTime: '2024-01-30T09:27:34Z'
      message: pgBackRest replica create repo is not ready for backups
      observedGeneration: 1
      reason: StanzaNotCreated
      status: 'False'
      type: PGBackRestReplicaRepoReady
    - lastTransitionTime: '2024-01-30T09:27:34Z'
      message: pgBackRest replica creation is not currently possible
      observedGeneration: 1
      reason: RepoBackupNotComplete
      status: 'False'
      type: PGBackRestReplicaCreate
  databaseRevision: cb44c7ccf
  instances:
    - name: instance1
      readyReplicas: 1
      replicas: 1
      updatedReplicas: 1
  monitoring:
    exporterConfiguration: 559c4c97d6
  observedGeneration: 1
  patroni:
    systemIdentifier: '7329820703120699482'
  pgbackrest:
    repoHost:
      apiVersion: apps/v1
      kind: StatefulSet
      ready: true
    repos:
      - bound: true
        name: repo1
        stanzaCreated: false
        volume: pvc-45bfd009-8b7f-442e-af31-45eaa61d416d
  proxy:
    pgBouncer:
      postgresRevision: 5c9966f6bc
  usersRevision: 84d5fddb9c
@mausch just to clarify - when you say you are scaling down to 0, what exactly are you scaling down to 0? The StatefulSet for the PG instance? What is the purpose for doing so? As you said, PGO will scale it back to 1 when you do so.
This appears to be forcing PG into recovery (which is clear from the logs you provided), and that could be the reason you're seeing a slower startup time. What behavior do you see when you do not scale down to 0, effectively allowing for a normal startup? Startup time should be very fast for a brand new cluster (e.g. there is no need to wait for the initial backup or anything like that before connecting; you can connect as soon as the initial primary/leader is ready), assuming you allow a clean bootstrap and start.
I'll also note that we are leveraging the HA system for readiness: https://patroni.readthedocs.io/en/latest/rest_api.html
GET /readiness: returns HTTP status code 200 when the Patroni node is running as the leader or when PostgreSQL is up and running. The endpoint could be used for readinessProbe when it is not possible to use Kubernetes endpoints for leader elections (OpenShift).
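Since the readiness probe is pointed at that Patroni endpoint, one rough way to see what the kubelet sees is to hit the endpoint from inside the instance pod. This is only a sketch: it assumes the pod and container names used elsewhere in this thread, that curl is present in the image, and that Patroni's REST API is served over TLS on port 8008 (the patroni.port in the spec above).
# Check the Patroni readiness endpoint the probe relies on; 200 means Patroni
# considers the node ready (leader, or PostgreSQL up and running):
kubectl exec -n postgres postgres-instance1-q9tz-0 -c database -- \
  curl -sk -o /dev/null -w '%{http_code}\n' https://localhost:8008/readiness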
Apologies for the very late reply!
when you say you are scaling down to 0, what exactly are you scaling down to 0? The StatefulSet for the PG instance?
Yes
What is the purpose for doing so?
Simulating a crash, or any other operation that replaces the pod, e.g. bumping up memory limits.
Startup time should be very fast for a brand new cluster
This takes several minutes though. I just had a case of this where I bumped up memory limits and it took 4 minutes to start up. More importantly, the reported readiness is incorrect during this time.
Since this is still an issue, please reopen.
@mausch per my comment above, it sounds like you are simply dealing with the typical Postgres recovery process. I'll also note that this is the exact reason for CPK's HA functionality.
In other words, if you want your database to remain available for writes immediately following the failure of the current primary, you should add another replica. Focusing on the startup time here seems misguided, since you're simply relying on Postgres to go through its typical recovery process (and again, this is exactly why additional replicas can be added to create a highly available cluster).
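For reference, with the spec shared earlier in this thread (a single instance set named instance1 with replicas: 1), adding a replica could look roughly like the sketch below; adjust the names and the list index for your cluster.
# Raise instance set "instance1" (index 0 in spec.instances) from 1 to 2 replicas
# so a hot standby exists for Patroni to promote when the primary pod goes away:
kubectl patch postgrescluster postgres -n postgres --type=json \
  -p='[{"op": "replace", "path": "/spec/instances/0/replicas", "value": 2}]'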
And as for your note around readiness, this is again likely tied to the recovery process. More specifically, during this time (i.e., during recovery) Postgres will likely be accepting connections as a hot standby. Therefore, if you attempt to connect to the DB during this window, you'll likely find that you are able to do so, but only via a read-only connection (since the database will still be in recovery). So again, this also sounds normal.
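One quick way to confirm that, sketched with the pod/container names from the logs in this thread and the /tmp/postgres socket directory they show (and assuming local peer auth works for the in-container postgres user), is to ask Postgres directly whether it is still in recovery:
# Prints "t" while the instance is still in recovery (read-only), "f" once promoted:
kubectl exec -n postgres postgres-instance1-q9tz-0 -c database -- \
  psql -h /tmp/postgres -p 5432 -U postgres -Atc 'SELECT pg_is_in_recovery();'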
Additionally, going through this thread I did see that the one thing we haven't taken a look at is the Postgres logs. More specifically, the Postgres logs at the specific time at which you're seeing Patroni log the "rejecting connections" messages. This should provide insight into exactly what is going on at that time and tie together the behavior you are seeing.
Thanks for the quick reply!
If I understand correctly, this means (please correct me if I'm wrong):
Additionally, going through this thread I did see that the one thing we haven't taken a look at is the Postgres logs. More specifically, the Postgres logs at the specific time at which you're seeing Patroni log the "rejecting connections" messages. This should provide insight into exactly what is going on at that time and tie together the behavior you are seeing.
Here are the logs after a pod restart:
2024-07-16T08:25:59.117Z | 2024-07-16 08:25:59,116 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-07-16T08:25:59.151Z | 2024-07-16 08:25:59,146 WARNING: Postgresql is not running.
2024-07-16T08:25:59.206Z | 2024-07-16 08:25:59,146 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:25:59.206Z | 2024-07-16 08:25:59,202 INFO: pg_controldata:
2024-07-16T08:25:59.206Z | pg_control version number: 1300
2024-07-16T08:25:59.206Z | Catalog version number: 202307071
2024-07-16T08:25:59.206Z | Database system identifier: 7390347664152133712
2024-07-16T08:25:59.206Z | Database cluster state: shut down
2024-07-16T08:25:59.206Z | pg_control last modified: Tue Jul 16 08:25:20 2024
2024-07-16T08:25:59.206Z | Latest checkpoint location: E8/3E000028
2024-07-16T08:25:59.206Z | Latest checkpoint's REDO location: E8/3E000028
2024-07-16T08:25:59.206Z | Latest checkpoint's REDO WAL file: 00000007000000E80000003E
2024-07-16T08:25:59.206Z | Latest checkpoint's TimeLineID: 7
2024-07-16T08:25:59.206Z | Latest checkpoint's PrevTimeLineID: 7
2024-07-16T08:25:59.206Z | Latest checkpoint's full_page_writes: on
2024-07-16T08:25:59.206Z | Latest checkpoint's NextXID: 0:435335
2024-07-16T08:25:59.206Z | Latest checkpoint's NextOID: 27955417
2024-07-16T08:25:59.206Z | Latest checkpoint's NextMultiXactId: 1
2024-07-16T08:25:59.206Z | Latest checkpoint's NextMultiOffset: 0
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestXID: 722
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestXID's DB: 1
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestActiveXID: 0
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestMultiXid: 1
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestMulti's DB: 1
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestCommitTsXid: 0
2024-07-16T08:25:59.206Z | Latest checkpoint's newestCommitTsXid: 0
2024-07-16T08:25:59.206Z | Time of latest checkpoint: Tue Jul 16 08:25:20 2024
2024-07-16T08:25:59.206Z | Fake LSN counter for unlogged rels: 0/3E8
2024-07-16T08:25:59.206Z | Minimum recovery ending location: 0/0
2024-07-16T08:25:59.206Z | Min recovery ending loc's timeline: 0
2024-07-16T08:25:59.206Z | Backup start location: 0/0
2024-07-16T08:25:59.206Z | Backup end location: 0/0
2024-07-16T08:25:59.206Z | End-of-backup record required: no
2024-07-16T08:25:59.206Z | wal_level setting: logical
2024-07-16T08:25:59.206Z | wal_log_hints setting: on
2024-07-16T08:25:59.206Z | max_connections setting: 1000
2024-07-16T08:25:59.206Z | max_worker_processes setting: 8
2024-07-16T08:25:59.206Z | max_wal_senders setting: 10
2024-07-16T08:25:59.206Z | max_prepared_xacts setting: 0
2024-07-16T08:25:59.206Z | max_locks_per_xact setting: 64
2024-07-16T08:25:59.206Z | track_commit_timestamp setting: off
2024-07-16T08:25:59.206Z | Maximum data alignment: 8
2024-07-16T08:25:59.206Z | Database block size: 8192
2024-07-16T08:25:59.206Z | Blocks per segment of large relation: 131072
2024-07-16T08:25:59.206Z | WAL block size: 8192
2024-07-16T08:25:59.206Z | Bytes per WAL segment: 16777216
2024-07-16T08:25:59.206Z | Maximum length of identifiers: 64
2024-07-16T08:25:59.206Z | Maximum columns in an index: 32
2024-07-16T08:25:59.206Z | Maximum size of a TOAST chunk: 1996
2024-07-16T08:25:59.206Z | Size of a large-object chunk: 2048
2024-07-16T08:25:59.206Z | Date/time type storage: 64-bit integers
2024-07-16T08:25:59.206Z | Float8 argument passing: by value
2024-07-16T08:25:59.206Z | Data page checksum version: 1
2024-07-16T08:25:59.206Z | Mock authentication nonce: e0e6339eb25f46a19d712a22b6161ed9e5b884668754a21b602d2de896bbf0f1
2024-07-16T08:25:59.206Z |
2024-07-16T08:25:59.389Z | 2024-07-16 08:25:59,204 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:25:59.389Z | 2024-07-16 08:25:59,388 INFO: starting as readonly because i had the session lock
2024-07-16T08:26:00.838Z | 2024-07-16 08:26:00.838 UTC [140] LOG: pgaudit extension initialized
2024-07-16T08:26:00.859Z | 2024-07-16 08:26:00,850 INFO: postmaster pid=140
2024-07-16T08:26:00.884Z | /tmp/postgres:5432 - no response
2024-07-16T08:26:00.950Z | 2024-07-16 08:26:00.950 UTC [140] LOG: redirecting log output to logging collector process
2024-07-16T08:26:00.950Z | 2024-07-16 08:26:00.950 UTC [140] HINT: Future log output will appear in directory "log".
2024-07-16T08:26:01.900Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:01.916Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:02.931Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:03.958Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:04.970Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:05.983Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:06.997Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:08.009Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:09.024Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:09.298Z | 2024-07-16 08:26:09,137 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:26:09.298Z | 2024-07-16 08:26:09,297 INFO: updated leader lock during restarting after failure
2024-07-16T08:26:10.036Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:11.049Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:12.078Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:13.090Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:14.112Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:15.124Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:16.139Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:17.175Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:18.200Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:19.161Z | 2024-07-16 08:26:19,137 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:26:19.161Z | 2024-07-16 08:26:19,161 INFO: updated leader lock during restarting after failure
2024-07-16T08:26:19.214Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:20.230Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:21.246Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:22.266Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:23.276Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:24.290Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:25.302Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:26.332Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:27.346Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:28.357Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:29.212Z | 2024-07-16 08:26:29,137 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:26:29.212Z | 2024-07-16 08:26:29,212 INFO: updated leader lock during restarting after failure
...
2024-07-16T08:29:01.678Z | /tmp/postgres:5432 - accepting connections
2024-07-16T08:29:01.729Z | 2024-07-16 08:29:01,728 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:29:01.729Z | 2024-07-16 08:29:01,728 INFO: establishing a new patroni connection to the postgres cluster
2024-07-16T08:29:01.989Z | 2024-07-16 08:29:01,989 INFO: promoted self to leader because I had the session lock
2024-07-16T08:29:01.996Z | server promoting
2024-07-16T08:29:11.834Z | 2024-07-16 08:29:11,724 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:29:11.834Z | 2024-07-16 08:29:11,834 INFO: updated leader lock during promote
2024-07-16T08:29:21.793Z | 2024-07-16 08:29:21,792 INFO: no action. I am (postgres-instance1-q9tz-0), the leader with the lock
Here are the logs after a pod restart
@mausch those are actually the Patroni logs, which are different from the Postgres logs.
Can you provide the Postgres logs found under /pgdata/pg16/log?
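For anyone else looking for those files, a rough way to pull them from the instance pod (same pod and container names as above; the exact file names depend on the logging configuration):
# List the server logs, newest first, then tail the most recent one:
kubectl exec -n postgres postgres-instance1-q9tz-0 -c database -- ls -lt /pgdata/pg16/log
kubectl exec -n postgres postgres-instance1-q9tz-0 -c database -- \
  sh -c 'tail -n 200 "$(ls -t /pgdata/pg16/log/* | head -1)"'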
Oh, sorry about the confusion.
2024-07-16 14:13:12.520 UTC [140] LOG: received fast shutdown request
2024-07-16 14:13:12.526 UTC [140] LOG: aborting any active transactions
2024-07-16 14:13:12.528 UTC [1794] FATAL: terminating connection due to administrator command
2024-07-16 14:13:12.536 UTC [140] LOG: background worker "logical replication launcher" (PID 3435) exited with exit code 1
2024-07-16 14:13:12.539 UTC [143] LOG: shutting down
2024-07-16 14:13:12.544 UTC [143] LOG: checkpoint starting: shutdown immediate
2024-07-16 14:13:12.567 UTC [143] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.027 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16383 kB, estimate=16383 kB; lsn=E8/3F000028, redo lsn=E8/3F000028
2024-07-16 14:13:56.790 UTC [138] LOG: starting PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-21), 64-bit
2024-07-16 14:13:56.791 UTC [138] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-07-16 14:13:56.791 UTC [138] LOG: listening on IPv6 address "::", port 5432
2024-07-16 14:13:56.799 UTC [138] LOG: listening on Unix socket "/tmp/postgres/.s.PGSQL.5432"
2024-07-16 14:13:56.873 UTC [144] LOG: database system was shut down at 2024-07-16 14:13:12 UTC
2024-07-16 14:13:57.737 UTC [158] FATAL: the database system is starting up
...
2024-07-16 14:14:56.905 UTC [144] LOG: entering standby mode
2024-07-16 14:14:57.675 UTC [697] FATAL: the database system is starting up
...
2024-07-16 14:18:59.000 UTC [144] LOG: received promote request
2024-07-16 14:18:59.000 UTC [144] LOG: redo is not required
2024-07-16 14:19:59.155 UTC [144] LOG: selected new timeline ID: 9
2024-07-16 14:20:59.240 UTC [144] LOG: archive recovery complete
2024-07-16 14:20:59.271 UTC [142] LOG: checkpoint starting: force
2024-07-16 14:20:59.276 UTC [138] LOG: database system is ready to accept connections
2024-07-16 14:20:59.294 UTC [142] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.008 s, sync=0.002 s, total=0.024 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=E8/3F000108, redo lsn=E8/3F0000D0
Overview
Create a trivial PostgresCluster with a single instance. Scale it down to 0. Scale it back up to 1 (actually, I think the operator automatically brings it back to 1). Logs show that the instance takes 5 minutes to actually be ready. During those 5 minutes the pod reports as ready and live, so applications try to connect to it. Because it's not actually ready, all connections fail.
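A rough sketch of observing that mismatch (the host placeholder is illustrative and assumes a port-forward or similar access; the FATAL message is the one that appears in the Postgres logs earlier in this thread):
# Watch the pod report Ready while connections still fail:
kubectl get pods -n postgres -w
# In another terminal, keep trying to connect; during the window described above
# this fails with "FATAL: the database system is starting up":
psql "host=<primary-service-or-forwarded-host> port=5432 user=postgres dbname=postgres" -c 'SELECT 1;'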
So the issues here are:
- The pod reports as ready and live while Postgres is still rejecting connections, so every application connection fails during that window.
- The instance takes several minutes (roughly 5 in this case) to actually become ready after the pod is replaced.
Logs:
Environment
Please provide the following details: