canonical / postgresql-k8s-operator

A Charmed Operator for running PostgreSQL on Kubernetes
https://charmhub.io/postgresql-k8s
Apache License 2.0
9 stars 19 forks source link

Postgresql goes down due to "terminating connection due to administrator command" #616

Open cbartz opened 1 month ago

cbartz commented 1 month ago

Steps to reproduce

Deploy postgresql 14/edge on rev 239 with three units.

Model                        Controller                         Cloud/Region              Version  SLA          Timestamp
prod-repo-policy-compliance  juju-controller-35-production-ps6  k8s-prod-general/default  3.5.3    unsupported  08:07:59Z

App                       Version  Status  Scale  Charm                     Channel      Rev  Address        Exposed  Message
postgresql                14.12    active      3  postgresql-k8s            14/edge      239  10.87.6.161    no       Primary
repo-policy-compliance             active      1  repo-policy-compliance    latest/edge   17  10.87.26.0     no       

Unit                         Workload  Agent  Address          Ports  Message
postgresql/0                 active    idle   192.168.102.57          Primary
postgresql/1*                active    idle   192.168.100.212         
postgresql/2                 active    idle   192.168.103.49   
repo-policy-compliance/0*    active    idle   192.168.102.43          

Expected behavior

Postgreql is not going down.

Actual behavior

We received the Postgresql down alert on a production deployment with three units.

[[FIRING:2] (instance="prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1", job="juju_prod-repo-policy-compliance_e3901794_postgresql_prometheus_scrape-1_638a329e43b79c2385aeef8f96e0df8ed4864b1b4128b49c0b1215c36f0adff0", juju_charm="postgresql-k8s", juju_unit="postgresql/1", severity="critical")](http://alertmanager-0.alertmanager-endpoints.prod-cos-k8s-ps6-is-charms.svc.cluster.local:9093/#/alerts?receiver=mattermost-notifications)
Alert: Postgresql down (instance prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1) - critical

Description: Postgresql instance is down
  VALUE = 0
  LABELS = map[name:pg_up instance:prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1 job:juju_prod-repo-policy-compliance_e3901794_postgresql_prometheus_scrape-1_638a329e43b79c2385aeef8f96e0df8ed4864b1b4128b49c0b1215c36f0adff0 juju_application:postgresql juju_charm:postgresql-k8s juju_model:prod-repo-policy-compliance juju_model_uuid:e3901794-73c7-43f0-85ae-5179a6dc1b16 juju_unit:postgresql/1]

Details:
   • alertname: PostgresqlDown
   • instance: prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1
   • job: juju_prod-repo-policy-compliance_e3901794_postgresql_prometheus_scrape-1_638a329e43b79c2385aeef8f96e0df8ed4864b1b4128b49c0b1215c36f0adff0
   • juju_application: postgresql
   • juju_charm: postgresql-k8s
   • juju_model: prod-repo-policy-compliance
   • juju_model_uuid: e3901794-73c7-43f0-85ae-5179a6dc1b16
   • juju_unit: postgresql/1
   • severity: critical

Source:  http://prometheus-0.prometheus-endpoints.prod-cos-k8s-ps6-is-charms.svc.cluster.local:9090/prod-cos-k8s-ps6-is-charms-prometheus-0/graph?g0.expr=pg_up%7Bjuju_application%3D%22postgresql%22%2Cjuju_model%3D%22prod-repo-policy-compliance%22%2Cjuju_model_uuid%3D%22e3901794-73c7-43f0-85ae-5179a6dc1b16%22%7D+%3D%3D+0&g0.tab=1
Alert: Postgresql exporter error (instance prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1) - critical

Description: Postgresql exporter is showing errors. A query may be buggy in query.yaml
  VALUE = 1
  LABELS = map[name:pg_exporter_last_scrape_error instance:prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1 job:juju_prod-repo-policy-compliance_e3901794_postgresql_prometheus_scrape-1_638a329e43b79c2385aeef8f96e0df8ed4864b1b4128b49c0b1215c36f0adff0 juju_application:postgresql juju_charm:postgresql-k8s juju_model:prod-repo-policy-compliance juju_model_uuid:e3901794-73c7-43f0-85ae-5179a6dc1b16 juju_unit:postgresql/1]

Details:
   • alertname: PostgresqlExporterError
   • instance: prod-repo-policy-compliance_e3901794-73c7-43f0-85ae-5179a6dc1b16_postgresql_postgresql/1
   • job: juju_prod-repo-policy-compliance_e3901794_postgresql_prometheus_scrape-1_638a329e43b79c2385aeef8f96e0df8ed4864b1b4128b49c0b1215c36f0adff0
   • juju_application: postgresql
   • juju_charm: postgresql-k8s
   • juju_model: prod-repo-policy-compliance
   • juju_model_uuid: e3901794-73c7-43f0-85ae-5179a6dc1b16
   • juju_unit: postgresql/1
   • severity: critical

Source:  http://prometheus-0.prometheus-endpoints.prod-cos-k8s-ps6-is-charms.svc.cluster.local:9090/prod-cos-k8s-ps6-is-charms-prometheus-0/graph?g0.expr=pg_exporter_last_scrape_error%7Bjuju_application%3D%22postgresql%22%2Cjuju_model%3D%22prod-repo-policy-compliance%22%2Cjuju_model_uuid%3D%22e3901794-73c7-43f0-85ae-5179a6dc1b16%22%7D+%3E+0&g0.tab=1

Logs indicate that postgresql-1 was shutdown due to

2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=12 LOG:  received fast shutdown request
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=13 LOG:  aborting any active transactions
2024-08-07 06:58:22 UTC [8888]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8867]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8769]: user=operator,db=postgres,app=Patroni,client=127.0.0.1,line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=14 LOG:  background worker "logical replication launcher" (PID 8783) exited with exit code 1
2024-08-07 06:58:22 UTC [8765]: user=,db=,app=,client=,line=5 LOG:  shutting down

It healed itself quite quickly.

Versions

Operating system:

Juju CLI: 3.5.3

Juju agent: 3.5.3

Charm revision: 239

microk8s:

Log output

juju debug log

unit-postgresql-0: 06:54:06 DEBUG unit.postgresql/0.juju-log ops 2.12.0 up and running.
unit-postgresql-0: 06:54:06 DEBUG unit.postgresql/0.juju-log Emitting Juju event update_status.
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:54:09 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-0: 06:54:09 INFO unit.postgresql/0.juju-log Kubernetes service 'postgresql' patched successfully
unit-postgresql-0: 06:54:09 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-1: 06:54:24 DEBUG unit.postgresql/1.juju-log ops 2.12.0 up and running.
unit-postgresql-1: 06:54:24 DEBUG unit.postgresql/1.juju-log Emitting Juju event update_status.
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 06:54:25 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-1: 06:54:25 INFO unit.postgresql/1.juju-log Kubernetes service 'postgresql' patched successfully
unit-postgresql-1: 06:54:25 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log ops 2.12.0 up and running.
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log Emitting Juju event update_status.
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:46 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log Starting new HTTP connection (1): postgresql-2.postgresql-endpoints:8008
unit-postgresql-2: 06:56:47 DEBUG unit.postgresql/2.juju-log http://postgresql-2.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-2: 06:56:47 INFO unit.postgresql/2.juju-log Kubernetes service 'postgresql' patched successfully
unit-postgresql-2: 06:56:47 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-0: 06:59:40 DEBUG unit.postgresql/0.juju-log ops 2.12.0 up and running.
unit-postgresql-0: 06:59:40 DEBUG unit.postgresql/0.juju-log Emitting Juju event update_status.
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log Starting new HTTP connection (1): postgresql-0.postgresql-endpoints:8008
unit-postgresql-0: 06:59:42 DEBUG unit.postgresql/0.juju-log http://postgresql-0.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-0: 06:59:42 INFO unit.postgresql/0.juju-log Kubernetes service 'postgresql' patched successfully
unit-postgresql-0: 06:59:42 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log ops 2.12.0 up and running.
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Emitting Juju event update_status.
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /cluster HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log Starting new HTTP connection (1): postgresql-1.postgresql-endpoints:8008
unit-postgresql-1: 07:00:02 DEBUG unit.postgresql/1.juju-log http://postgresql-1.postgresql-endpoints:8008 "GET /health HTTP/1.1" 200 None
unit-postgresql-1: 07:00:02 INFO unit.postgresql/1.juju-log Kubernetes service 'postgresql' patched successfully
unit-postgresql-1: 07:00:03 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)

postgresql log

root@postgresql-1:/# cat /var/log/postgresql/postgresql-3_0658.log 
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=12 LOG:  received fast shutdown request
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=13 LOG:  aborting any active transactions
2024-08-07 06:58:22 UTC [8888]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8867]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8769]: user=operator,db=postgres,app=Patroni,client=127.0.0.1,line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=14 LOG:  background worker "logical replication launcher" (PID 8783) exited with exit code 1
2024-08-07 06:58:22 UTC [8765]: user=,db=,app=,client=,line=5 LOG:  shutting down
2024-08-07 06:58:22 UTC [8765]: user=,db=,app=,client=,line=6 LOG:  checkpoint starting: shutdown immediate
2024-08-07 06:58:23 UTC [8765]: user=,db=,app=,client=,line=7 LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.015 s, sync=0.001 s, total=0.242 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16382 kB, estimate=16382 kB
2024-08-07 06:58:23 UTC [17415]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:33 UTC [17419]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:33 UTC [17421]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:34 UTC [8758]: user=,db=,app=,client=,line=15 LOG:  database system is shut down
2024-08-07 06:58:35 UTC [17577]: user=,db=,app=,client=,line=3 LOG:  starting PostgreSQL 14.12 (Ubuntu 14.12-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-08-07 06:58:35 UTC [17577]: user=,db=,app=,client=,line=4 LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-07 06:58:37 UTC [17577]: user=,db=,app=,client=,line=5 LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=1 LOG:  database system was shut down at 2024-08-07 06:58:23 UTC
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=2 WARNING:  specified neither primary_conninfo nor restore_command
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=3 HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=4 LOG:  entering standby mode
2024-08-07 06:58:37 UTC [17583]: user=rewind,db=postgres,app=[unknown],client=192.168.102.57,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17584]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17586]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=5 LOG:  consistent recovery state reached at 0/2C0000A0
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=6 LOG:  invalid record length at 0/2C0000A0: wanted 24, got 0
2024-08-07 06:58:37 UTC [17577]: user=,db=,app=,client=,line=6 LOG:  database system is ready to accept read-only connections
2024-08-07 06:58:38 UTC [17590]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=1 FATAL:  no pg_hba.conf entry for replication connection from host "192.168.102.57", user "replication", no encryption
2024-08-07 06:58:38 UTC [17590]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=2 DETAIL:  Client IP address resolved to "192-168-102-57.postgresql-replicas.prod-repo-policy-compliance.svc.cluster.local", forward lookup not checked.
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=3 LOG:  received replication command: START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 65
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=4 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 65
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=3 LOG:  received replication command: START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 65
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=4 STATEMENT:  START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 65
2024-08-07 06:58:45 UTC [17597]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:45 UTC [17597]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17599]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17599]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=7 LOG:  received promote request
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=8 LOG:  redo is not required
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=9 LOG:  selected new timeline ID: 66
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=10 LOG:  archive recovery complete
2024-08-07 06:58:46 UTC [17587]: user=,db=,app=,client=,line=1 LOG:  checkpoint starting: force
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=5 LOG:  received replication command: TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=6 STATEMENT:  TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=5 LOG:  received replication command: TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=6 STATEMENT:  TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17577]: user=,db=,app=,client=,line=7 LOG:  database system is ready to accept connections
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=7 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=8 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=7 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=8 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=9 LOG:  received replication command: START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=10 STATEMENT:  START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=9 LOG:  received replication command: START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=10 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17587]: user=,db=,app=,client=,line=2 LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.020 s, sync=0.003 s, total=0.067 s; sync files=2, longest=0.002 s, average=0.002 s; distance=0 kB, estimate=0 kB
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=8 LOG:  received SIGHUP, reloading configuration files
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=9 LOG:  parameter "timezone" changed to "Etc/UTC"
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=10 LOG:  parameter "TimeZone" changed to "UTC"
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=11 LOG:  parameter "synchronous_standby_names" changed to ""postgresql-2""
2024-08-07 06:58:47 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=11 LOG:  standby "postgresql-2" is now a synchronous standby with priority 1
2024-08-07 06:58:47 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=12 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:56 UTC [17577]: user=,db=,app=,client=,line=12 LOG:  received immediate shutdown request
2024-08-07 06:58:56 UTC [17577]: user=,db=,app=,client=,line=13 LOG:  database system is shut down
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=3 LOG:  starting PostgreSQL 14.12 (Ubuntu 14.12-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=4 LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=5 LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=1 LOG:  database system was interrupted; last known up at 2024-08-07 06:58:46 UTC
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=2 WARNING:  specified neither primary_conninfo nor restore_command
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=3 HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=4 LOG:  entering standby mode
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=5 LOG:  database system was not properly shut down; automatic recovery in progress
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=6 LOG:  redo starts at 0/2C0000D0
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=7 LOG:  invalid record length at 0/2C000218: wanted 24, got 0
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=8 LOG:  consistent recovery state reached at 0/2C000218
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=6 LOG:  database system is ready to accept read-only connections
2024-08-07 06:58:59 UTC [17784]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:59 UTC [17784]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:59:00 UTC [17771]: user=,db=,app=,client=,line=7 LOG:  received SIGHUP, reloading configuration files
root@postgresql-1:/# cat /var/log/postgresql/postgresql-3_0656.log 
root@postgresql-1:/# cat /var/log/postgresql/postgresql-3_0655.log 
root@postgresql-1:/# cat /var/log/postgresql/postgresql-3_0658.log 
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=12 LOG:  received fast shutdown request
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=13 LOG:  aborting any active transactions
2024-08-07 06:58:22 UTC [8888]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8867]: user=monitoring,db=postgres,app=[unknown],client=[local],line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8769]: user=operator,db=postgres,app=Patroni,client=127.0.0.1,line=1 FATAL:  terminating connection due to administrator command
2024-08-07 06:58:22 UTC [8758]: user=,db=,app=,client=,line=14 LOG:  background worker "logical replication launcher" (PID 8783) exited with exit code 1
2024-08-07 06:58:22 UTC [8765]: user=,db=,app=,client=,line=5 LOG:  shutting down
2024-08-07 06:58:22 UTC [8765]: user=,db=,app=,client=,line=6 LOG:  checkpoint starting: shutdown immediate
2024-08-07 06:58:23 UTC [8765]: user=,db=,app=,client=,line=7 LOG:  checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.015 s, sync=0.001 s, total=0.242 s; sync files=0, longest=0.000 s, average=0.000 s; distance=16382 kB, estimate=16382 kB
2024-08-07 06:58:23 UTC [17415]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:33 UTC [17419]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:33 UTC [17421]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is shutting down
2024-08-07 06:58:34 UTC [8758]: user=,db=,app=,client=,line=15 LOG:  database system is shut down
2024-08-07 06:58:35 UTC [17577]: user=,db=,app=,client=,line=3 LOG:  starting PostgreSQL 14.12 (Ubuntu 14.12-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-08-07 06:58:35 UTC [17577]: user=,db=,app=,client=,line=4 LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-07 06:58:37 UTC [17577]: user=,db=,app=,client=,line=5 LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=1 LOG:  database system was shut down at 2024-08-07 06:58:23 UTC
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=2 WARNING:  specified neither primary_conninfo nor restore_command
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=3 HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=4 LOG:  entering standby mode
2024-08-07 06:58:37 UTC [17583]: user=rewind,db=postgres,app=[unknown],client=192.168.102.57,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17584]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17586]: user=operator,db=postgres,app=[unknown],client=127.0.0.1,line=1 FATAL:  the database system is starting up
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=5 LOG:  consistent recovery state reached at 0/2C0000A0
2024-08-07 06:58:37 UTC [17582]: user=,db=,app=,client=,line=6 LOG:  invalid record length at 0/2C0000A0: wanted 24, got 0
2024-08-07 06:58:37 UTC [17577]: user=,db=,app=,client=,line=6 LOG:  database system is ready to accept read-only connections
2024-08-07 06:58:38 UTC [17590]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=1 FATAL:  no pg_hba.conf entry for replication connection from host "192.168.102.57", user "replication", no encryption
2024-08-07 06:58:38 UTC [17590]: user=replication,db=[unknown],app=[unknown],client=192.168.102.57,line=2 DETAIL:  Client IP address resolved to "192-168-102-57.postgresql-replicas.prod-repo-policy-compliance.svc.cluster.local", forward lookup not checked.
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=3 LOG:  received replication command: START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 65
2024-08-07 06:58:39 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=4 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 65
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=3 LOG:  received replication command: START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 65
2024-08-07 06:58:43 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=4 STATEMENT:  START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 65
2024-08-07 06:58:45 UTC [17597]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:45 UTC [17597]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17599]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17599]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=7 LOG:  received promote request
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=8 LOG:  redo is not required
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=9 LOG:  selected new timeline ID: 66
2024-08-07 06:58:46 UTC [17582]: user=,db=,app=,client=,line=10 LOG:  archive recovery complete
2024-08-07 06:58:46 UTC [17587]: user=,db=,app=,client=,line=1 LOG:  checkpoint starting: force
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=5 LOG:  received replication command: TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=6 STATEMENT:  TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=5 LOG:  received replication command: TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=6 STATEMENT:  TIMELINE_HISTORY 66
2024-08-07 06:58:46 UTC [17577]: user=,db=,app=,client=,line=7 LOG:  database system is ready to accept connections
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=7 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=8 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=7 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=8 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=9 LOG:  received replication command: START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17594]: user=replication,db=[unknown],app=postgresql-0,client=192.168.102.57,line=10 STATEMENT:  START_REPLICATION SLOT "postgresql_0" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=9 LOG:  received replication command: START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=10 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:46 UTC [17587]: user=,db=,app=,client=,line=2 LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.020 s, sync=0.003 s, total=0.067 s; sync files=2, longest=0.002 s, average=0.002 s; distance=0 kB, estimate=0 kB
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=8 LOG:  received SIGHUP, reloading configuration files
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=9 LOG:  parameter "timezone" changed to "Etc/UTC"
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=10 LOG:  parameter "TimeZone" changed to "UTC"
2024-08-07 06:58:47 UTC [17577]: user=,db=,app=,client=,line=11 LOG:  parameter "synchronous_standby_names" changed to ""postgresql-2""
2024-08-07 06:58:47 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=11 LOG:  standby "postgresql-2" is now a synchronous standby with priority 1
2024-08-07 06:58:47 UTC [17593]: user=replication,db=[unknown],app=postgresql-2,client=192.168.103.49,line=12 STATEMENT:  START_REPLICATION SLOT "postgresql_2" 0/2C000000 TIMELINE 66
2024-08-07 06:58:56 UTC [17577]: user=,db=,app=,client=,line=12 LOG:  received immediate shutdown request
2024-08-07 06:58:56 UTC [17577]: user=,db=,app=,client=,line=13 LOG:  database system is shut down
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=3 LOG:  starting PostgreSQL 14.12 (Ubuntu 14.12-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=4 LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=5 LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=1 LOG:  database system was interrupted; last known up at 2024-08-07 06:58:46 UTC
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=2 WARNING:  specified neither primary_conninfo nor restore_command
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=3 HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=4 LOG:  entering standby mode
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=5 LOG:  database system was not properly shut down; automatic recovery in progress
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=6 LOG:  redo starts at 0/2C0000D0
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=7 LOG:  invalid record length at 0/2C000218: wanted 24, got 0
2024-08-07 06:58:59 UTC [17775]: user=,db=,app=,client=,line=8 LOG:  consistent recovery state reached at 0/2C000218
2024-08-07 06:58:59 UTC [17771]: user=,db=,app=,client=,line=6 LOG:  database system is ready to accept read-only connections
2024-08-07 06:58:59 UTC [17784]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=1 LOG:  received replication command: IDENTIFY_SYSTEM
2024-08-07 06:58:59 UTC [17784]: user=replication,db=[unknown],app=[unknown],client=127.0.0.1,line=2 STATEMENT:  IDENTIFY_SYSTEM
2024-08-07 06:59:00 UTC [17771]: user=,db=,app=,client=,line=7 LOG:  received SIGHUP, reloading configuration files

patroni log

root@postgresql-1:/# cat /var/log/postgresql/patroni.log.110
2024-08-07 07:02:09 UTC [16]: INFO: no action. I am (postgresql-1), a secondary, and following a leader (postgresql-0) 
2024-08-07 07:02:19 UTC [16]: INFO: no action. I am (postgresql-1), a secondary, and following a leader (postgresql-0) 
2024-08-07 07:02:29 UTC [16]: INFO: no action. I am (postgresql-1), a secondary, and following a leader (postgresql-0) 
2024-08-07 07:02:39 UTC [16]: INFO: no action. I am (postgresql-1), a secondary, and following a leader (postgresql-0) 
root@postgresql-1:/# cat /var/log/postgresql/patroni.log.120
2024-08-07 06:58:58 UTC [16]: WARNING: Retry got exception: connection problems 
2024-08-07 06:58:58 UTC [16]: INFO: Lock owner: postgresql-0; I am postgresql-1 
2024-08-07 06:58:58 UTC [16]: INFO: starting after demotion in progress 
2024-08-07 06:58:58 UTC [16]: INFO: Lock owner: postgresql-0; I am postgresql-1 
2024-08-07 06:58:58 UTC [16]: INFO: starting after demotion in progress 
2024-08-07 06:58:58 UTC [16]: WARNING: Retry got exception: connection problems 
2024-08-07 06:58:59 UTC [16]: INFO: Lock owner: postgresql-0; I am postgresql-1 
root@postgresql-1:/# cat /var/log/postgresql/patroni.log.121
2024-08-07 06:58:56 UTC [16]: ERROR: failed to update leader lock 
2024-08-07 06:58:56 UTC [16]: INFO: Demoting self (immediate-nolock) 
2024-08-07 06:58:56 UTC [16]: INFO: demoted self because failed to update leader lock in DCS 
2024-08-07 06:58:56 UTC [16]: INFO: Lock owner: None; I am postgresql-1 
2024-08-07 06:58:56 UTC [16]: INFO: not healthy enough for leader race 
2024-08-07 06:58:56 UTC [16]: INFO: starting after demotion in progress 
2024-08-07 06:58:56 UTC [16]: INFO: closed patroni connection to the postgresql cluster 
2024-08-07 06:58:57 UTC [16]: INFO: postmaster pid=17771 
root@postgresql-1:/# cat /var/log/postgresql/patroni.log.122
2024-08-07 06:58:56 UTC [16]: ERROR: Unexpected error from Kubernetes API 
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 1159, in _update_leader_with_retry
    return bool(self._patch_or_create(self.leader_path, annotations, resource_version, ips=ips, retry=_retry))
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 1090, in _patch_or_create
    ret = retry(func, self._namespace, body) if retry else func(self._namespace, body)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 1156, in _retry
    return retry(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/utils.py", line 599, in __call__
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 558, in wrapper
    return getattr(self._core_v1_api, func)(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 479, in wrapper
    return self._api_client.call_api(method, path, headers, body, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 447, in call_api
    return self._handle_server_response(response, _preload_content)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 266, in _handle_server_response
    raise k8s_client.rest.ApiException(http_resp=response)
patroni.dcs.kubernetes.K8sClient.rest.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'da0b6ecb-8d42-4964-9819-41a2d385b2dd', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': '37606d0c-8ddf-4cc4-9faa-563142648f97', 'X-Kubernetes-Pf-Prioritylevel-Uid': '296b2e7b-e274-4aed-9e4a-019ddda1c429', 'Date': 'Wed, 07 Aug 2024 06:58:56 GMT', 'Content-Length': '214'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \\"patroni-postgresql\\" not found","reason":"NotFound","details":{"name":"patroni-postgresql","kind":"endpoints"},"code":404}\n'

root@postgresql-1:/# cat /var/log/postgresql/patroni.log.123
2024-08-07 06:58:56 UTC [16]: INFO: Assigning synchronous standby status to ['postgresql-2'] 
2024-08-07 06:58:56 UTC [16]: INFO: Synchronous standby status assigned to ['postgresql-2'] 
2024-08-07 06:58:56 UTC [16]: INFO: acquired session lock as a leader 
2024-08-07 06:58:56 UTC [16]: INFO: Lock owner: postgresql-1; I am postgresql-1 

Additional context

github-actions[bot] commented 1 month ago

https://warthogs.atlassian.net/browse/DPE-5069

dragomirp commented 1 month ago

Looks like Patroni was unable to access the K8s API to update the leader lock. We can potentially enable Patroni's DCS failsafe mode to try to mitigate this.