mausch opened this issue 11 months ago (Open)
Hello @mausch!
I have not been able to reproduce the behavior you are seeing... Are you waiting for the initial backup to complete before scaling down? How are you scaling the cluster down to 0 instances? What version of PGO are you using (the image tag you listed is for a postgres image)? Can you send your postgrescluster spec? Can you reproduce the behavior again and send the resulting PGO logs?
Are you waiting for the initial backup to complete before scaling down?
No idea how to check this, or why it would matter tbh 🙂
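One way to check this, sketched under the assumption that the cluster is named postgres in the postgres namespace (as in the spec pasted further down in this thread), is to inspect the status that PGO maintains on the PostgresCluster and the backup Jobs it creates:
# The PGBackRestReplicaCreate condition (visible in the status pasted below)
# reflects whether the initial backup has completed:
kubectl get postgrescluster postgres -n postgres \
  -o jsonpath='{.status.conditions[?(@.type=="PGBackRestReplicaCreate")]}'
# The backup itself runs as a Job labelled with the cluster name:
kubectl get jobs -n postgres -l postgres-operator.crunchydata.com/cluster=postgres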
How are you scaling the cluster down to 0 instances?
Kubernetes dashboard or kubectl scale, doesn't matter.
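For concreteness, scaling from the command line looks roughly like the sketch below; the StatefulSet name is taken from the pod name that appears in the logs later in the thread, and its random suffix will differ per cluster.
# Scale the instance StatefulSet to 0 (the "q9tz" suffix is illustrative):
kubectl scale statefulset postgres-instance1-q9tz --replicas=0 -n postgres
# PGO's reconcile loop then scales it back up to match the PostgresCluster spec.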
What version of PGO are you using (the image tag you listed is for a postgres image)? Can you reproduce the behavior again and send the resulting PGO logs?
Already included in my initial message.
Can you send your postgrescluster spec?
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  creationTimestamp: '2024-01-30T09:27:28Z'
  finalizers:
    - postgres-operator.crunchydata.com/finalizer
  generation: 1
  name: postgres
  namespace: postgres
  resourceVersion: '376673'
  uid: f375cd59-775f-40c3-a995-10851ecb7bcf
spec:
  backups:
    pgbackrest:
      global:
        archive-async: 'y'
        archive-push-queue-max: 1GiB
        repo1-retention-full: '31'
        repo1-retention-full-type: time
        spool-path: /pgdata/backups
      repos:
        - name: repo1
          schedules:
            full: 0 1 * * *
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 20Gi
  image: >-
    registry.developers.crunchydata.com/crunchydata/crunchy-postgres-gis:ubi8-16.0-3.4-0
  instances:
    - dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 300Gi
        storageClassName: gp3-fast
      name: instance1
      replicas: 1
      resources:
        requests:
          cpu: '0.3'
          memory: 1Gi
  patroni:
    dynamicConfiguration:
      postgresql:
        parameters:
          maintenance_work_mem: 256MB
          max_connections: '1000'
          shared_buffers: 1GB
    leaderLeaseDurationSeconds: 30
    port: 8008
    syncPeriodSeconds: 10
  port: 5432
  postGISVersion: 3.4.0
  postgresVersion: 16
  users:
    - name: elevate
      options: SUPERUSER
    - name: postgres
status:
  conditions:
    - lastTransitionTime: '2024-01-30T13:55:47Z'
      message: pgBackRest dedicated repository host is ready
      observedGeneration: 1
      reason: RepoHostReady
      status: 'True'
      type: PGBackRestRepoHostReady
    - lastTransitionTime: '2024-01-30T09:27:34Z'
      message: pgBackRest replica create repo is not ready for backups
      observedGeneration: 1
      reason: StanzaNotCreated
      status: 'False'
      type: PGBackRestReplicaRepoReady
    - lastTransitionTime: '2024-01-30T09:27:34Z'
      message: pgBackRest replica creation is not currently possible
      observedGeneration: 1
      reason: RepoBackupNotComplete
      status: 'False'
      type: PGBackRestReplicaCreate
  databaseRevision: cb44c7ccf
  instances:
    - name: instance1
      readyReplicas: 1
      replicas: 1
      updatedReplicas: 1
  monitoring:
    exporterConfiguration: 559c4c97d6
  observedGeneration: 1
  patroni:
    systemIdentifier: '7329820703120699482'
  pgbackrest:
    repoHost:
      apiVersion: apps/v1
      kind: StatefulSet
      ready: true
    repos:
      - bound: true
        name: repo1
        stanzaCreated: false
        volume: pvc-45bfd009-8b7f-442e-af31-45eaa61d416d
  proxy:
    pgBouncer:
      postgresRevision: 5c9966f6bc
  usersRevision: 84d5fddb9c
@mausch just to clarify - when you say you are scaling down to 0, what exactly are you scaling down to 0? The StatefulSet for the PG instance? What is the purpose for doing so? As you said, PGO will scale it back to 1 when you do so.
This appears to be forcing PG into recovery (which is clear from the logs you provided), and that could be the reason you're seeing a slower startup time. What behavior do you see when you do not scale down to 0, effectively allowing for a normal startup? Startup time should be very fast for a brand new cluster (e.g. there is no need to wait for the initial backup or anything like that before connecting; you can connect as soon as the initial primary/leader is ready), assuming you allow a clean bootstrap and start.
I'll also note that we are leveraging the HA system for readiness: https://patroni.readthedocs.io/en/latest/rest_api.html
GET /readiness: returns HTTP status code 200 when the Patroni node is running as the leader or when PostgreSQL is up and running. The endpoint could be used for readinessProbe when it is not possible to use Kubernetes endpoints for leader elections (OpenShift).
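Since the readiness probe is pointed at that Patroni endpoint, one rough way to see what the kubelet sees is to hit the endpoint from inside the instance pod. This is only a sketch: it assumes the pod and container names used elsewhere in this thread, that curl is present in the image, and that Patroni's REST API is served over TLS on port 8008 (the patroni.port in the spec above).
# Check the Patroni readiness endpoint the probe relies on; 200 means Patroni
# considers the node ready (leader, or PostgreSQL up and running):
kubectl exec -n postgres postgres-instance1-q9tz-0 -c database -- \
  curl -sk -o /dev/null -w '%{http_code}\n' https://localhost:8008/readiness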
Apologies for the very late reply!
when you say you are scaling down to 0, what exactly are you scaling down to 0? The StatefulSet for the PG instance?
Yes
What is the purpose for doing so?
Simulating a crash, or any other operation that replaces the pod, e.g. bumping up memory limits.
Startup time should be very fast for a brand new cluster
This takes several minutes though. I just had a case of this where I bumped up memory limits and it took 4 minutes to start up. More importantly, the reported readiness is incorrect during this time.
Since this is still an issue, please reopen.
@mausch per my comment above, it sounds like you are simply dealing with the typical Postgres recovery process. I'll also note that this is the exact reason for CPK's HA functionality.
In other words, if you want your database to remain available for writes immediately following the failure of the current primary, you should add another replica. Focusing on the startup time here seems misguided, since you're simply relying on Postgres to go through its typical recovery process (and again, this is exactly why additional replicas can be added to create a highly available cluster).
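For reference, with the spec shared earlier in this thread (a single instance set named instance1 with replicas: 1), adding a replica could look roughly like the sketch below; adjust the names and the list index for your cluster.
# Raise instance set "instance1" (index 0 in spec.instances) from 1 to 2 replicas
# so a hot standby exists for Patroni to promote when the primary pod goes away:
kubectl patch postgrescluster postgres -n postgres --type=json \
  -p='[{"op": "replace", "path": "/spec/instances/0/replicas", "value": 2}]'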
And as for your note around readiness, this is again likely tied to the recovery process. More specifically, during this time (i.e., during recovery) Postgres will likely be accepting connections as a hot standby. Therefore, if you attempt to connect to the DB during this window, you'll likely find that you are able to do so, but only via a read-only connection (since the database will still be in recovery). So again, this also sounds normal.
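One quick way to confirm that, sketched with the pod/container names from the logs in this thread and the /tmp/postgres socket directory they show (and assuming local peer auth works for the in-container postgres user), is to ask Postgres directly whether it is still in recovery:
# Prints "t" while the instance is still in recovery (read-only), "f" once promoted:
kubectl exec -n postgres postgres-instance1-q9tz-0 -c database -- \
  psql -h /tmp/postgres -p 5432 -U postgres -Atc 'SELECT pg_is_in_recovery();'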
Additionally, going through this thread I did see that the one thing we haven't taken a look at is the Postgres logs. More specifically, the Postgres logs at the specific time at which you're seeing Patroni log the "rejecting connections" messages. This should provide insight into exactly what is going on at that time and tie together the behavior you are seeing.
Thanks for the quick reply!
If I understand correctly, this means (please correct me if I'm wrong):
Additionally, going through this thread I did see that the one thing we haven't taken a look at is the Postgres logs. More specifically, the Postgres logs at the specific time at which you're seeing Patroni log the "rejecting connections" messages. This should provide insight into exactly what is going on at that time and tie together the behavior you are seeing.
Here are the logs after a pod restart:
2024-07-16T08:25:59.117Z | 2024-07-16 08:25:59,116 INFO: No PostgreSQL configuration items changed, nothing to reload.
2024-07-16T08:25:59.151Z | 2024-07-16 08:25:59,146 WARNING: Postgresql is not running.
2024-07-16T08:25:59.206Z | 2024-07-16 08:25:59,146 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:25:59.206Z | 2024-07-16 08:25:59,202 INFO: pg_controldata:
2024-07-16T08:25:59.206Z | pg_control version number: 1300
2024-07-16T08:25:59.206Z | Catalog version number: 202307071
2024-07-16T08:25:59.206Z | Database system identifier: 7390347664152133712
2024-07-16T08:25:59.206Z | Database cluster state: shut down
2024-07-16T08:25:59.206Z | pg_control last modified: Tue Jul 16 08:25:20 2024
2024-07-16T08:25:59.206Z | Latest checkpoint location: E8/3E000028
2024-07-16T08:25:59.206Z | Latest checkpoint's REDO location: E8/3E000028
2024-07-16T08:25:59.206Z | Latest checkpoint's REDO WAL file: 00000007000000E80000003E
2024-07-16T08:25:59.206Z | Latest checkpoint's TimeLineID: 7
2024-07-16T08:25:59.206Z | Latest checkpoint's PrevTimeLineID: 7
2024-07-16T08:25:59.206Z | Latest checkpoint's full_page_writes: on
2024-07-16T08:25:59.206Z | Latest checkpoint's NextXID: 0:435335
2024-07-16T08:25:59.206Z | Latest checkpoint's NextOID: 27955417
2024-07-16T08:25:59.206Z | Latest checkpoint's NextMultiXactId: 1
2024-07-16T08:25:59.206Z | Latest checkpoint's NextMultiOffset: 0
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestXID: 722
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestXID's DB: 1
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestActiveXID: 0
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestMultiXid: 1
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestMulti's DB: 1
2024-07-16T08:25:59.206Z | Latest checkpoint's oldestCommitTsXid: 0
2024-07-16T08:25:59.206Z | Latest checkpoint's newestCommitTsXid: 0
2024-07-16T08:25:59.206Z | Time of latest checkpoint: Tue Jul 16 08:25:20 2024
2024-07-16T08:25:59.206Z | Fake LSN counter for unlogged rels: 0/3E8
2024-07-16T08:25:59.206Z | Minimum recovery ending location: 0/0
2024-07-16T08:25:59.206Z | Min recovery ending loc's timeline: 0
2024-07-16T08:25:59.206Z | Backup start location: 0/0
2024-07-16T08:25:59.206Z | Backup end location: 0/0
2024-07-16T08:25:59.206Z | End-of-backup record required: no
2024-07-16T08:25:59.206Z | wal_level setting: logical
2024-07-16T08:25:59.206Z | wal_log_hints setting: on
2024-07-16T08:25:59.206Z | max_connections setting: 1000
2024-07-16T08:25:59.206Z | max_worker_processes setting: 8
2024-07-16T08:25:59.206Z | max_wal_senders setting: 10
2024-07-16T08:25:59.206Z | max_prepared_xacts setting: 0
2024-07-16T08:25:59.206Z | max_locks_per_xact setting: 64
2024-07-16T08:25:59.206Z | track_commit_timestamp setting: off
2024-07-16T08:25:59.206Z | Maximum data alignment: 8
2024-07-16T08:25:59.206Z | Database block size: 8192
2024-07-16T08:25:59.206Z | Blocks per segment of large relation: 131072
2024-07-16T08:25:59.206Z | WAL block size: 8192
2024-07-16T08:25:59.206Z | Bytes per WAL segment: 16777216
2024-07-16T08:25:59.206Z | Maximum length of identifiers: 64
2024-07-16T08:25:59.206Z | Maximum columns in an index: 32
2024-07-16T08:25:59.206Z | Maximum size of a TOAST chunk: 1996
2024-07-16T08:25:59.206Z | Size of a large-object chunk: 2048
2024-07-16T08:25:59.206Z | Date/time type storage: 64-bit integers
2024-07-16T08:25:59.206Z | Float8 argument passing: by value
2024-07-16T08:25:59.206Z | Data page checksum version: 1
2024-07-16T08:25:59.206Z | Mock authentication nonce: e0e6339eb25f46a19d712a22b6161ed9e5b884668754a21b602d2de896bbf0f1
2024-07-16T08:25:59.206Z |
2024-07-16T08:25:59.389Z | 2024-07-16 08:25:59,204 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:25:59.389Z | 2024-07-16 08:25:59,388 INFO: starting as readonly because i had the session lock
2024-07-16T08:26:00.838Z | 2024-07-16 08:26:00.838 UTC [140] LOG: pgaudit extension initialized
2024-07-16T08:26:00.859Z | 2024-07-16 08:26:00,850 INFO: postmaster pid=140
2024-07-16T08:26:00.884Z | /tmp/postgres:5432 - no response
2024-07-16T08:26:00.950Z | 2024-07-16 08:26:00.950 UTC [140] LOG: redirecting log output to logging collector process
2024-07-16T08:26:00.950Z | 2024-07-16 08:26:00.950 UTC [140] HINT: Future log output will appear in directory "log".
2024-07-16T08:26:01.900Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:01.916Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:02.931Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:03.958Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:04.970Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:05.983Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:06.997Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:08.009Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:09.024Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:09.298Z | 2024-07-16 08:26:09,137 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:26:09.298Z | 2024-07-16 08:26:09,297 INFO: updated leader lock during restarting after failure
2024-07-16T08:26:10.036Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:11.049Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:12.078Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:13.090Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:14.112Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:15.124Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:16.139Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:17.175Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:18.200Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:19.161Z | 2024-07-16 08:26:19,137 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:26:19.161Z | 2024-07-16 08:26:19,161 INFO: updated leader lock during restarting after failure
2024-07-16T08:26:19.214Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:20.230Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:21.246Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:22.266Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:23.276Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:24.290Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:25.302Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:26.332Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:27.346Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:28.357Z | /tmp/postgres:5432 - rejecting connections
2024-07-16T08:26:29.212Z | 2024-07-16 08:26:29,137 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:26:29.212Z | 2024-07-16 08:26:29,212 INFO: updated leader lock during restarting after failure
...
2024-07-16T08:29:01.678Z | /tmp/postgres:5432 - accepting connections
2024-07-16T08:29:01.729Z | 2024-07-16 08:29:01,728 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:29:01.729Z | 2024-07-16 08:29:01,728 INFO: establishing a new patroni connection to the postgres cluster
2024-07-16T08:29:01.989Z | 2024-07-16 08:29:01,989 INFO: promoted self to leader because I had the session lock
2024-07-16T08:29:01.996Z | server promoting
2024-07-16T08:29:11.834Z | 2024-07-16 08:29:11,724 INFO: Lock owner: postgres-instance1-q9tz-0; I am postgres-instance1-q9tz-0
2024-07-16T08:29:11.834Z | 2024-07-16 08:29:11,834 INFO: updated leader lock during promote
2024-07-16T08:29:21.793Z | 2024-07-16 08:29:21,792 INFO: no action. I am (postgres-instance1-q9tz-0), the leader with the lock
Here are the logs after a pod restart
@mausch those are actually the Patroni logs, which are different from the Postgres logs.
Can you provide the Postgres logs found under /pgdata/pg16/log?
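For anyone else looking for those files, a rough way to pull them from the instance pod (same pod and container names as above; the exact file names depend on the logging configuration):
# List the server logs, newest first, then tail the most recent one:
kubectl exec -n postgres postgres-instance1-q9tz-0 -c database -- ls -lt /pgdata/pg16/log
kubectl exec -n postgres postgres-instance1-q9tz-0 -c database -- \
  sh -c 'tail -n 200 "$(ls -t /pgdata/pg16/log/* | head -1)"'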
Oh, sorry about the confusion.
2024-07-16 14:13:12.520 UTC [140] LOG: received fast shutdown request
2024-07-16 14:13:12.526 UTC [140] LOG: aborting any active transactions
2024-07-16 14:13:12.528 UTC [1794] FATAL: terminating connection due to administrator command
2024-07-16 14:13:12.536 UTC [140] LOG: background worker "logical replication launcher" (PID 3435) exited with exit code 1
2024-07-16 14:13:12.539 UTC [143] LOG: shutting down
2024-07-16 14:13:12.544 UTC [143] LOG: checkpoint starting: shutdown immediate
2024-07-16 14:13:12.567 UTC [143] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.027 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16383 kB, estimate=16383 kB; lsn=E8/3F000028, redo lsn=E8/3F000028
2024-07-16 14:13:56.790 UTC [138] LOG: starting PostgreSQL 16.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-21), 64-bit
2024-07-16 14:13:56.791 UTC [138] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-07-16 14:13:56.791 UTC [138] LOG: listening on IPv6 address "::", port 5432
2024-07-16 14:13:56.799 UTC [138] LOG: listening on Unix socket "/tmp/postgres/.s.PGSQL.5432"
2024-07-16 14:13:56.873 UTC [144] LOG: database system was shut down at 2024-07-16 14:13:12 UTC
2024-07-16 14:13:57.737 UTC [158] FATAL: the database system is starting up
...
2024-07-16 14:14:56.905 UTC [144] LOG: entering standby mode
2024-07-16 14:14:57.675 UTC [697] FATAL: the database system is starting up
...
2024-07-16 14:18:59.000 UTC [144] LOG: received promote request
2024-07-16 14:18:59.000 UTC [144] LOG: redo is not required
2024-07-16 14:19:59.155 UTC [144] LOG: selected new timeline ID: 9
2024-07-16 14:20:59.240 UTC [144] LOG: archive recovery complete
2024-07-16 14:20:59.271 UTC [142] LOG: checkpoint starting: force
2024-07-16 14:20:59.276 UTC [138] LOG: database system is ready to accept connections
2024-07-16 14:20:59.294 UTC [142] LOG: checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.008 s, sync=0.002 s, total=0.024 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 kB, estimate=0 kB; lsn=E8/3F000108, redo lsn=E8/3F0000D0
Overview
Create a trivial PostgresCluster with a single instance. Scale it down to 0. Scale it back up to 1 (actually, I think the operator automatically brings it back to 1). Logs show that the instance takes 5 minutes to actually be ready. During those 5 minutes the pod reports as ready and live, so applications try to connect to it. Because it's not actually ready, all connections fail.
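A rough sketch of observing that mismatch (the host placeholder is illustrative and assumes a port-forward or similar access; the FATAL message is the one that appears in the Postgres logs earlier in this thread):
# Watch the pod report Ready while connections still fail:
kubectl get pods -n postgres -w
# In another terminal, keep trying to connect; during the window described above
# this fails with "FATAL: the database system is starting up":
psql "host=<primary-service-or-forwarded-host> port=5432 user=postgres dbname=postgres" -c 'SELECT 1;'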
So the issues here are:
- The pod reports as ready and live while Postgres is still rejecting connections, so every application connection fails during that window.
- The instance takes several minutes (roughly 5 in this case) to actually become ready after the pod is replaced.
Logs:
Environment
Please provide the following details: