EnterpriseDB / repmgr

A lightweight replication manager for PostgreSQL (Postgres)
https://repmgr.org/

repmgr marks primary as unreachable but doesn't trigger failover #722

Open nadav213000 opened 3 years ago

nadav213000 commented 3 years ago

Hey,

I have a 3-node cluster deployed on VMs. There are occasional network issues that make the primary unreachable from the other nodes in the cluster.

When we run the cluster show command, both of the other servers report the primary as unreachable, but repmgr doesn't trigger a failover. Furthermore, repmgr writes no logs about monitoring the primary (which is also the upstream node).

repmgr works well in other situations, such as a Postgres service crash or a server crash.

We have configured repmgr like this:

failover=automatic
reconnection_attempts=4
reconnect_interval=5

This seems to me like the relevant configuration for this problem.

Do you have any idea why repmgr doesn't trigger a failover in a situation like this, and doesn't write any logs either?
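
For reference, here is a hedged sketch of how these settings drive automatic failover; the parameter names follow the repmgr documentation (the documented names are reconnect_attempts and reconnect_interval), and the values mirror the ones quoted above:

```
failover=automatic       # repmgrd is allowed to promote a standby by itself
reconnect_attempts=4     # connection checks made against an unreachable upstream
reconnect_interval=5     # seconds between those checks
# With these values, a standby's repmgrd declares the primary failed only after
# roughly 4 x 5 = 20 seconds of consecutive failed checks; only then does it
# run the failover election and invoke promote_command / follow_command.
```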

adihakimi commented 3 years ago

Experiencing the same issue; help would be appreciated!

bonesmoses commented 3 years ago

Can you confirm that the repmgrd daemon is running on all nodes? The logs from at least one node should clearly show if and when disconnections occurred, assuming the network issue actually disrupted the Postgres connections repmgrd makes to each node.

What does this command show:

repmgr service status

If it is running on all nodes, how are you checking the logs? The repmgr daemon is very chatty even under normal operating circumstances.
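
A minimal sketch of those checks (the repmgr.conf path and the repmgrd service name are assumptions here and vary by installation and packaging):

```sh
# Run on every node; adjust the repmgr.conf path to your installation.
repmgr -f /etc/repmgr.conf service status   # repmgrd running and not paused on each node?
repmgr -f /etc/repmgr.conf cluster show     # which node does this server think is primary?

# If repmgrd runs under systemd, its output typically goes to the journal
# or to the file named by log_file in repmgr.conf:
journalctl -u repmgrd --since "1 hour ago"
```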

Vlad1mir-D commented 2 years ago

@bonesmoses same here, with data corruption (different states on different servers). I hope this information will help.
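
The attachments below were gathered with repmgr's standard diagnostic commands, run inside each of the three pods; a summary of the commands behind them (the Bitnami entrypoint.sh wrapper visible in the raw output is part of that image):

```sh
# Run inside each postgresql-ha pod (wrapped by the Bitnami entrypoint in the output below).
repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node status
repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node check
repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster show
repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster event
repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf service status
```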

repmgr diagnostic output from all three nodes ``` # kk exec -it project***-prod-store-postgresql-ha-postgresql-0 -c postgresql -- bash -i I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node status postgresql-repmgr 05:35:06.75 postgresql-repmgr 05:35:06.75 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:35:06.75 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:35:06.75 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:35:06.76 WARNING: node "project***-prod-store-postgresql-ha-postgresql-1" not found in "pg_stat_replication" Node "project***-prod-store-postgresql-ha-postgresql-0": PostgreSQL version: 11.13 Total data size: 113 MB Conninfo: user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 Role: primary WAL archiving: enabled Archive command: /bin/true WALs pending archiving: 0 pending files Replication connections: 1 (of maximal 16) Replication slots: 2 physical (of maximal 10; 0 missing); 1 inactive Replication lag: n/a WARNING: following issue(s) were detected: - 1 of 2 downstream nodes not attached: - project***-prod-store-postgresql-ha-postgresql-1 (ID: 1001) - node has 1 inactive physical replication slots - repmgr_slot_1001 HINT: execute "repmgr node check" for more details I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node check postgresql-repmgr 05:35:09.38 postgresql-repmgr 05:35:09.38 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:35:09.39 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:35:09.39 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:35:09.39 WARNING: node "project***-prod-store-postgresql-ha-postgresql-1" not found in "pg_stat_replication" Node "project***-prod-store-postgresql-ha-postgresql-0": Server role: OK (node is primary) Replication lag: OK (N/A - node is primary) WAL archiving: OK (0 pending archive ready files) Upstream connection: OK (N/A - node is primary) Downstream servers: CRITICAL (1 of 2 downstream nodes not attached; missing: project***-prod-store-postgresql-ha-postgresql-1 (ID: 1001)) Replication slots: CRITICAL (1 of 2 physical replication slots are inactive) Missing physical replication slots: OK (node has no missing physical replication slots) Configured data directory: OK (configured "data_directory" is "/bitnami/postgresql/data") I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster show postgresql-repmgr 05:35:15.13 postgresql-repmgr 05:35:15.13 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:35:15.14 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:35:15.14 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:35:15.14 ID | 
Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ------+--------------------------------------------------+---------+----------------------+--------------------------------------------------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | * running | | default | 100 | 33 | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby | ! running as primary | | default | 100 | 34 | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby | running | project***-prod-store-postgresql-ha-postgresql-0 | default | 100 | 33 | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 WARNING: following issues were detected - node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) is registered as standby but running as primary I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster event postgresql-repmgr 05:35:49.42 postgresql-repmgr 05:35:49.43 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:35:49.43 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:35:49.44 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:35:49.44 Node ID | Name | Event | OK | Timestamp | Details ---------+--------------------------------------------------+--------------------------+----+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1000 | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:50:45 | new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected 1000 | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_start | t | 2021-12-13 09:50:27 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_shutdown | t | 2021-12-13 09:50:16 | TERM signal received 1001 | project***-prod-store-postgresql-ha-postgresql-1 | repmgrd_start | t | 2021-12-13 09:49:42 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:49:41 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected 1001 | project***-prod-store-postgresql-ha-postgresql-1 | 
standby_follow | t | 2021-12-13 09:49:39 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:49:35 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby_register | t | 2021-12-13 09:49:32 | standby registration succeeded; upstream node ID is 1000 (-F/--force option was used) 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby_clone | t | 2021-12-13 09:49:25 | cloned from host "project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local", port 5432; backup method: pg_basebackup; --force: Y 1000 | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:48:53 | new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected 1002 | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_failover_follow | t | 2021-12-13 09:48:48 | node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) now following new upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow | t | 2021-12-13 09:48:48 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_reload | t | 2021-12-13 09:48:47 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_failover_promote | t | 2021-12-13 09:48:47 | node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) promoted to primary; old primary "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) marked as failed 1000 | project***-prod-store-postgresql-ha-postgresql-0 | standby_promote | t | 2021-12-13 09:48:47 | server "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) was successfully promoted to primary 1001 | project***-prod-store-postgresql-ha-postgresql-1 | child_node_reconnect | t | 2021-12-13 09:48:23 | standby node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has reconnected after 73 seconds 1002 | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_start | t | 2021-12-13 09:48:21 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow | t | 2021-12-13 09:48:20 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_register | t | 2021-12-13 09:48:19 | standby registration succeeded; upstream node ID is 1001 (-F/--force option was used) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_unregister | t | 2021-12-13 09:48:19 | I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf service status postgresql-repmgr 05:36:19.96 postgresql-repmgr 05:36:19.96 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:36:19.96 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:36:19.96 Submit issues 
and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:36:19.97 ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen ----+--------------------------------------------------+---------+----------------------+--------------------------------------------------+---------+-----+---------+-------------------- 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | * running | | running | 1 | no | n/a 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby | ! running as primary | | running | 1 | no | n/a 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby | running | project***-prod-store-postgresql-ha-postgresql-0 | running | 1 | no | 1 second(s) ago WARNING: following issues were detected - node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) is registered as standby but running as primary I have no name!@project***-prod-store-postgresql-ha-postgresql-0:/$ exit # kk exec -it project***-prod-store-postgresql-ha-postgresql-1 -c postgresql -- bash -i I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node status postgresql-repmgr 05:36:47.54 postgresql-repmgr 05:36:47.55 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:36:47.55 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:36:47.55 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:36:47.55 Node "project***-prod-store-postgresql-ha-postgresql-1": PostgreSQL version: 11.13 Total data size: 113 MB Conninfo: user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 Role: primary WAL archiving: enabled Archive command: /bin/true WALs pending archiving: 0 pending files Replication connections: 0 (of maximal 16) Replication slots: 0 physical (of maximal 10; 0 missing) Replication lag: n/a I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node check postgresql-repmgr 05:36:50.03 postgresql-repmgr 05:36:50.04 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:36:50.06 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:36:50.07 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:36:50.11 Node "project***-prod-store-postgresql-ha-postgresql-1": Server role: OK (node is primary) Replication lag: OK (N/A - node is primary) WAL archiving: OK (0 pending archive ready files) Upstream connection: OK (N/A - node is primary) Downstream servers: OK (this node has no downstream nodes) Replication slots: OK (node has no physical replication slots) Missing physical replication slots: OK (node has no missing physical replication slots) Configured data directory: OK (configured "data_directory" is "/bitnami/postgresql/data") I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster show postgresql-repmgr 
05:36:53.63 postgresql-repmgr 05:36:53.64 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:36:53.65 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:36:53.68 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:36:53.68 ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ------+--------------------------------------------------+---------+-----------+--------------------------------------------------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | ! running | | default | 100 | 33 | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 1001 | project***-prod-store-postgresql-ha-postgresql-1 | primary | * running | | default | 100 | 34 | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby | running | project***-prod-store-postgresql-ha-postgresql-0 | default | 100 | 33 | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 WARNING: following issues were detected - node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) is running but the repmgr node record is inactive I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster event postgresql-repmgr 05:36:55.57 postgresql-repmgr 05:36:55.58 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:36:55.59 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:36:55.61 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:36:55.63 Node ID | Name | Event | OK | Timestamp | Details ---------+--------------------------------------------------+--------------------------+----+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1001 | project***-prod-store-postgresql-ha-postgresql-1 | repmgrd_reload | t | 2021-12-13 09:50:15 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) 1001 | project***-prod-store-postgresql-ha-postgresql-1 | repmgrd_failover_promote | t | 2021-12-13 09:50:15 | node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) promoted to primary; old primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) marked as failed 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby_promote | t | 2021-12-13 09:50:14 | server 
"project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) was successfully promoted to primary 1001 | project***-prod-store-postgresql-ha-postgresql-1 | repmgrd_start | t | 2021-12-13 09:49:42 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:49:41 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby_follow | t | 2021-12-13 09:49:39 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:49:35 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby_register | t | 2021-12-13 09:49:32 | standby registration succeeded; upstream node ID is 1000 (-F/--force option was used) 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby_clone | t | 2021-12-13 09:49:25 | cloned from host "project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local", port 5432; backup method: pg_basebackup; --force: Y 1000 | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:48:53 | new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected 1002 | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_failover_follow | t | 2021-12-13 09:48:48 | node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) now following new upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow | t | 2021-12-13 09:48:48 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_reload | t | 2021-12-13 09:48:47 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_failover_promote | t | 2021-12-13 09:48:47 | node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) promoted to primary; old primary "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) marked as failed 1000 | project***-prod-store-postgresql-ha-postgresql-0 | standby_promote | t | 2021-12-13 09:48:47 | server "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) was successfully promoted to primary 1001 | project***-prod-store-postgresql-ha-postgresql-1 | child_node_reconnect | t | 2021-12-13 09:48:23 | standby node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has reconnected after 73 seconds 1002 | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_start | t | 2021-12-13 09:48:21 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow | t | 2021-12-13 09:48:20 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_register | t | 2021-12-13 09:48:19 | standby registration succeeded; upstream node ID is 1001 (-F/--force option was used) 1002 | 
project***-prod-store-postgresql-ha-postgresql-2 | standby_unregister | t | 2021-12-13 09:48:19 | I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ I have no name!@project***-prod-store-postgresql-ha-postgresql-1:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf service status postgresql-repmgr 05:37:04.17 postgresql-repmgr 05:37:04.17 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:37:04.18 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:37:04.19 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:37:04.19 ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen ----+--------------------------------------------------+---------+-----------+--------------------------------------------------+---------+-----+---------+-------------------- 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | ! running | | running | 1 | no | n/a 1001 | project***-prod-store-postgresql-ha-postgresql-1 | primary | * running | | running | 1 | no | n/a 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby | running | project***-prod-store-postgresql-ha-postgresql-0 | running | 1 | no | 1 second(s) ago WARNING: following issues were detected - node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) is running but the repmgr node record is inactive # kk exec -it project***-prod-store-postgresql-ha-postgresql-2 -c postgresql -- bash -i I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node status postgresql-repmgr 05:37:21.87 postgresql-repmgr 05:37:21.87 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:37:21.88 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:37:21.88 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:37:21.88 Node "project***-prod-store-postgresql-ha-postgresql-2": PostgreSQL version: 11.13 Total data size: 113 MB Conninfo: user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 Role: standby WAL archiving: disabled (on standbys "archive_mode" must be set to "always" to be effective) Archive command: /bin/true WALs pending archiving: 0 pending files Replication connections: 0 (of maximal 16) Replication slots: 0 physical (of maximal 10; 0 missing) Upstream node: project***-prod-store-postgresql-ha-postgresql-0 (ID: 1000) Replication lag: 0 seconds Last received LSN: 1/326922A8 Last replayed LSN: 1/326922A8 I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf node check postgresql-repmgr 05:37:25.23 postgresql-repmgr 05:37:25.23 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:37:25.24 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:37:25.24 Submit issues and feature requests at 
https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:37:25.24 Node "project***-prod-store-postgresql-ha-postgresql-2": Server role: OK (node is standby) Replication lag: OK (0 seconds) WAL archiving: OK (0 pending archive ready files) Upstream connection: OK (node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) is attached to expected upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000)) Downstream servers: OK (this node has no downstream nodes) Replication slots: OK (node has no physical replication slots) Missing physical replication slots: OK (node has no missing physical replication slots) Configured data directory: OK (configured "data_directory" is "/bitnami/postgresql/data") I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster show postgresql-repmgr 05:37:31.11 postgresql-repmgr 05:37:31.11 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:37:31.12 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:37:31.12 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:37:31.12 ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ------+--------------------------------------------------+---------+----------------------+--------------------------------------------------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | * running | | default | 100 | 33 | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby | ! 
running as primary | | default | 100 | 34 | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby | running | project***-prod-store-postgresql-ha-postgresql-0 | default | 100 | 33 | user=repmgr password=password*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5 WARNING: following issues were detected - node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) is registered as standby but running as primary I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster event postgresql-repmgr 05:37:34.82 postgresql-repmgr 05:37:34.82 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:37:34.82 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:37:34.83 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:37:34.83 Node ID | Name | Event | OK | Timestamp | Details ---------+--------------------------------------------------+--------------------------+----+---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1000 | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:50:45 | new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected 1000 | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_start | t | 2021-12-13 09:50:27 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_shutdown | t | 2021-12-13 09:50:16 | TERM signal received 1001 | project***-prod-store-postgresql-ha-postgresql-1 | repmgrd_start | t | 2021-12-13 09:49:42 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:49:41 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby_follow | t | 2021-12-13 09:49:39 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:49:35 | new standby "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) has connected 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby_register | t | 2021-12-13 09:49:32 | standby registration succeeded; upstream node ID is 1000 (-F/--force option was used) 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby_clone | t | 2021-12-13 09:49:25 | cloned from host "project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local", port 5432; backup method: pg_basebackup; --force: Y 1000 | 
project***-prod-store-postgresql-ha-postgresql-0 | child_node_new_connect | t | 2021-12-13 09:48:53 | new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected 1002 | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_failover_follow | t | 2021-12-13 09:48:48 | node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) now following new upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow | t | 2021-12-13 09:48:48 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_reload | t | 2021-12-13 09:48:47 | monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) 1000 | project***-prod-store-postgresql-ha-postgresql-0 | repmgrd_failover_promote | t | 2021-12-13 09:48:47 | node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) promoted to primary; old primary "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) marked as failed 1000 | project***-prod-store-postgresql-ha-postgresql-0 | standby_promote | t | 2021-12-13 09:48:47 | server "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) was successfully promoted to primary 1001 | project***-prod-store-postgresql-ha-postgresql-1 | child_node_reconnect | t | 2021-12-13 09:48:23 | standby node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has reconnected after 73 seconds 1002 | project***-prod-store-postgresql-ha-postgresql-2 | repmgrd_start | t | 2021-12-13 09:48:21 | monitoring connection to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_follow | t | 2021-12-13 09:48:20 | standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_register | t | 2021-12-13 09:48:19 | standby registration succeeded; upstream node ID is 1001 (-F/--force option was used) 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby_unregister | t | 2021-12-13 09:48:19 | I have no name!@project***-prod-store-postgresql-ha-postgresql-2:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf service status postgresql-repmgr 05:37:39.23 postgresql-repmgr 05:37:39.23 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 05:37:39.24 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 05:37:39.24 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 05:37:39.24 ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen ----+--------------------------------------------------+---------+----------------------+--------------------------------------------------+---------+-----+---------+-------------------- 1000 | project***-prod-store-postgresql-ha-postgresql-0 | primary | * running | | running | 1 | no | n/a 1001 | project***-prod-store-postgresql-ha-postgresql-1 | standby | ! 
running as primary | | running | 1 | no | n/a 1002 | project***-prod-store-postgresql-ha-postgresql-2 | standby | running | project***-prod-store-postgresql-ha-postgresql-0 | running | 1 | no | 1 second(s) ago WARNING: following issues were detected - node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) is registered as standby but running as primary ```
repmgr.conf from -2 node
```
event_notification_command='/opt/bitnami/repmgr/events/router.sh %n %e %s "%t" "%d"'
ssh_options='-o "StrictHostKeyChecking no" -v'
use_replication_slots='1'
pg_bindir='/opt/bitnami/postgresql/bin'
# FIXME: these 2 parameter should work
node_id=1002
node_name='project***-prod-store-postgresql-ha-postgresql-2'
location='default'
conninfo='user=repmgr password=anotherpassword*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5'
failover='automatic'
promote_command='PGPASSWORD=anotherpassword*** repmgr standby promote -f "/opt/bitnami/repmgr/conf/repmgr.conf" --log-level DEBUG --verbose'
follow_command='PGPASSWORD=anotherpassword*** repmgr standby follow -f "/opt/bitnami/repmgr/conf/repmgr.conf" -W --log-level DEBUG --verbose'
reconnect_attempts='3'
reconnect_interval='5'
log_level='NOTICE'
priority='100'
degraded_monitoring_timeout='5'
data_directory='/bitnami/postgresql/data'
async_query_timeout='20'
pg_ctl_options='-o "--config-file=\"/opt/bitnami/postgresql/conf/postgresql.conf\" --external_pid_file=\"/opt/bitnami/postgresql/tmp/postgresql.pid\" --hba_file=\"/opt/bitnami/postgresql/conf/pg_hba.conf\""'
```
Logs from -0 node ``` postgresql-repmgr 09:50:23.11 postgresql-repmgr 09:50:23.13 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 09:50:23.14 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 09:50:23.15 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 09:50:23.16 postgresql-repmgr 09:50:23.29 INFO ==> ** Starting PostgreSQL with Replication Manager setup ** postgresql-repmgr 09:50:23.39 INFO ==> Validating settings in REPMGR_* env vars... postgresql-repmgr 09:50:23.41 INFO ==> Validating settings in POSTGRESQL_* env vars.. postgresql-repmgr 09:50:23.42 INFO ==> Querying all partner nodes for common upstream node... postgresql-repmgr 09:50:25.52 WARN ==> Conflict of pretending primary role nodes (previously: 'project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local:5432', now: 'project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local:5432') postgresql-repmgr 09:50:25.52 INFO ==> This node was acting as a primary before restart! postgresql-repmgr 09:50:25.52 INFO ==> Can not find new primary. Starting PostgreSQL normally... postgresql-repmgr 09:50:25.53 INFO ==> There are no nodes with primary role. Assuming the primary role... postgresql-repmgr 09:50:25.54 INFO ==> Preparing PostgreSQL configuration... postgresql-repmgr 09:50:25.56 INFO ==> postgresql.conf file not detected. Generating it... postgresql-repmgr 09:50:25.75 INFO ==> Preparing repmgr configuration... postgresql-repmgr 09:50:25.77 INFO ==> Initializing Repmgr... postgresql-repmgr 09:50:25.78 INFO ==> Initializing PostgreSQL database... postgresql-repmgr 09:50:25.78 INFO ==> Cleaning stale /bitnami/postgresql/data/postmaster.pid file postgresql-repmgr 09:50:25.79 INFO ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected postgresql-repmgr 09:50:25.80 INFO ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected postgresql-repmgr 09:50:25.84 INFO ==> Deploying PostgreSQL with persisted data... postgresql-repmgr 09:50:25.88 INFO ==> Configuring replication parameters postgresql-repmgr 09:50:25.93 INFO ==> Configuring fsync postgresql-repmgr 09:50:25.95 INFO ==> ** PostgreSQL with Replication Manager setup finished! ** postgresql-repmgr 09:50:26.02 INFO ==> Starting PostgreSQL in background... waiting for server to start....2021-12-13 09:50:26.626 GMT [181] LOG: pgaudit extension initialized 2021-12-13 09:50:26.627 GMT [181] LOG: listening on IPv4 address "0.0.0.0", port 5432 2021-12-13 09:50:26.627 GMT [181] LOG: listening on IPv6 address "::", port 5432 2021-12-13 09:50:26.637 GMT [181] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2021-12-13 09:50:26.664 GMT [181] LOG: redirecting log output to logging collector process 2021-12-13 09:50:26.664 GMT [181] HINT: Future log output will appear in directory "/opt/bitnami/postgresql/logs". 
2021-12-13 09:50:26.671 GMT [183] LOG: database system was interrupted; last known up at 2021-12-13 09:49:18 GMT 2021-12-13 09:50:26.844 GMT [183] LOG: database system was not properly shut down; automatic recovery in progress 2021-12-13 09:50:26.852 GMT [183] LOG: redo starts at 1/30000028 2021-12-13 09:50:26.882 GMT [183] LOG: invalid record length at 1/31003D98: wanted 24, got 0 2021-12-13 09:50:26.882 GMT [183] LOG: redo done at 1/31003D70 2021-12-13 09:50:26.882 GMT [183] LOG: last completed transaction was at log time 2021-12-13 09:50:16.474731+00 2021-12-13 09:50:26.920 GMT [181] LOG: database system is ready to accept connections done server started postgresql-repmgr 09:50:27.05 INFO ==> ** Starting repmgrd ** [2021-12-13 09:50:27] [NOTICE] repmgrd (repmgrd 5.2.1) starting up INFO: set_repmgrd_pid(): provided pidfile is /opt/bitnami/repmgr/tmp/repmgr.pid [2021-12-13 09:50:27] [NOTICE] starting monitoring of node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) [2021-12-13 09:50:27] [NOTICE] monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) [2021-12-13 09:50:45] [NOTICE] new standby "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) has connected 2021-12-13 11:12:10.691 GMT [2773] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-13 11:12:10.691 GMT [2773] DETAIL: Key (user_hash)=(274c9456877e6474) already exists. 2021-12-13 11:12:10.691 GMT [2773] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:10.670693+00:00'::timestamptz,'2021-12-13T11:12:10.670711+00:00'::timestamptz,'2021-12-13'::date,'274c9456877e6474','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-13 11:12:16.490 GMT [2773] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-13 11:12:16.490 GMT [2773] DETAIL: Key (user_hash)=(f7fb2062aaacf611) already exists. 2021-12-13 11:12:16.490 GMT [2773] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:16.489950+00:00'::timestamptz,'2021-12-13T11:12:16.489970+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-13 11:12:18.100 GMT [2773] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-13 11:12:18.100 GMT [2773] DETAIL: Key (user_hash)=(f7fb2062aaacf611) already exists. 2021-12-13 11:12:18.100 GMT [2773] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:18.099218+00:00'::timestamptz,'2021-12-13T11:12:18.099242+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-13 11:12:18.577 GMT [2771] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-13 11:12:18.577 GMT [2771] DETAIL: Key (user_hash)=(f7fb2062aaacf611) already exists. 
2021-12-13 11:12:18.577 GMT [2771] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:18.576057+00:00'::timestamptz,'2021-12-13T11:12:18.576074+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-13 11:12:19.523 GMT [2773] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-13 11:12:19.523 GMT [2773] DETAIL: Key (user_hash)=(f7fb2062aaacf611) already exists. 2021-12-13 11:12:19.523 GMT [2773] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:19.495380+00:00'::timestamptz,'2021-12-13T11:12:19.495408+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-13 11:12:23.589 GMT [2771] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-13 11:12:23.589 GMT [2771] DETAIL: Key (user_hash)=(f7fb2062aaacf611) already exists. 2021-12-13 11:12:23.589 GMT [2771] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:23.582592+00:00'::timestamptz,'2021-12-13T11:12:23.582618+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-13 11:12:27.436 GMT [2773] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-13 11:12:27.436 GMT [2773] DETAIL: Key (user_hash)=(f7fb2062aaacf611) already exists. 2021-12-13 11:12:27.436 GMT [2773] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:27.434530+00:00'::timestamptz,'2021-12-13T11:12:27.434556+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-13 11:12:28.423 GMT [2773] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-13 11:12:28.423 GMT [2773] DETAIL: Key (user_hash)=(f7fb2062aaacf611) already exists. 2021-12-13 11:12:28.423 GMT [2773] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:28.420979+00:00'::timestamptz,'2021-12-13T11:12:28.420998+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-13 11:12:30.426 GMT [2771] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-13 11:12:30.426 GMT [2771] DETAIL: Key (user_hash)=(f7fb2062aaacf611) already exists. 
2021-12-13 11:12:30.426 GMT [2771] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:30.424756+00:00'::timestamptz,'2021-12-13T11:12:30.424782+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-13 11:12:31.427 GMT [2773] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-13 11:12:31.427 GMT [2773] DETAIL: Key (user_hash)=(f7fb2062aaacf611) already exists. 2021-12-13 11:12:31.427 GMT [2773] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-13T11:12:31.425182+00:00'::timestamptz,'2021-12-13T11:12:31.425208+00:00'::timestamptz,'2021-12-13'::date,'f7fb2062aaacf611','***','facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-13 15:04:26.569 GMT [2429] ERROR: insert or update on table "apps" violates foreign key constraint "apps_icon_id_fkey" 2021-12-13 15:04:26.569 GMT [2429] DETAIL: Key (icon_id)=(178) is not present in table "statics". 2021-12-13 15:04:26.569 GMT [2429] STATEMENT: INSERT INTO "apps" ("created","updated","created_date","slug","store","title","link_by_click_install","icon_id") VALUES ('2021-12-13T15:04:26.321023+00:00'::timestamptz,'2021-12-13T15:04:26.321040+00:00'::timestamptz,'2021-12-13'::date,'test','App Store','test',NULL,178) RETURNING "id" 2021-12-15 03:46:31.041 GMT [415065] ERROR: duplicate key value violates unique constraint "visitors_user_hash_key" 2021-12-15 03:46:31.041 GMT [415065] DETAIL: Key (user_hash)=(47acabe37a05742a) already exists. 2021-12-15 03:46:31.041 GMT [415065] STATEMENT: INSERT INTO "visitors" ("created","updated","created_date","user_hash","ip_address","user_agent","accept_language","accept_encoding") VALUES ('2021-12-15T03:46:30.615370+00:00'::timestamptz,'2021-12-15T03:46:30.615432+00:00'::timestamptz,'2021-12-14'::date,'47acabe37a05742a','35.191.10.180','Mozilla/5.0 (iPhone; CPU iPhone OS 15_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148 Instagram 216.0.0.12.135 (iPhone11,6; iOS 15_1; en_US; en-US; scale=3.00; 1242x2688; 338132253)','en-US,en;q=0.9','gzip, deflate, br') RETURNING "id" 2021-12-20 04:07:06.291 GMT [1881286] FATAL: password authentication failed for user "admin" 2021-12-20 04:07:06.291 GMT [1881286] DETAIL: Role "admin" does not exist. Connection matched pg_hba.conf line 8: "host all all 0.0.0.0/0 md5" 2021-12-20 04:07:07.412 GMT [1881296] FATAL: password authentication failed for user "admin" 2021-12-20 04:07:07.412 GMT [1881296] DETAIL: Role "admin" does not exist. Connection matched pg_hba.conf line 8: "host all all 0.0.0.0/0 md5" 2021-12-20 04:11:19.428 GMT [1882075] LOG: unexpected EOF on client connection with an open transaction 2021-12-20 04:13:01.306 GMT [1882469] FATAL: password authentication failed for user "postgres" 2021-12-20 04:13:01.306 GMT [1882469] DETAIL: Password does not match for user "postgres". 
Connection matched pg_hba.conf line 8: "host all all 0.0.0.0/0 md5" 2021-12-20 04:19:27.311 GMT [1883873] LOG: could not send data to client: Connection reset by peer 2021-12-20 04:19:27.311 GMT [1883873] STATEMENT: COPY public."RawStatistic" (id, created, updated, created_date, action, device, os, os_version, browser, country, language, scroll, source, url, campaign_name, campaign_source, campaign_medium, campaign_term, campaign_content, session_id, visitor_id, screenshot_id, app_variant_id) TO stdout; 2021-12-20 04:19:27.311 GMT [1883873] FATAL: connection to client lost 2021-12-20 04:19:27.311 GMT [1883873] STATEMENT: COPY public."RawStatistic" (id, created, updated, created_date, action, device, os, os_version, browser, country, language, scroll, source, url, campaign_name, campaign_source, campaign_medium, campaign_term, campaign_content, session_id, visitor_id, screenshot_id, app_variant_id) TO stdout; 2021-12-20 04:19:55.702 GMT [1883987] ERROR: canceling statement due to user request 2021-12-20 04:19:55.702 GMT [1883987] STATEMENT: COPY public."RawStatistic" (id, created, updated, created_date, action, device, os, os_version, browser, country, language, scroll, source, url, campaign_name, campaign_source, campaign_medium, campaign_term, campaign_content, session_id, visitor_id, screenshot_id, app_variant_id) TO stdout; 2021-12-20 04:19:55.704 GMT [1883987] LOG: could not receive data from client: Connection reset by peer 2021-12-20 05:09:08.244 GMT [1893736] LOG: invalid length of startup packet ```
Logs from -1 node ``` postgresql-repmgr 09:49:12.69 postgresql-repmgr 09:49:12.77 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 09:49:12.78 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 09:49:12.78 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 09:49:12.78 postgresql-repmgr 09:49:14.52 INFO ==> ** Starting PostgreSQL with Replication Manager setup ** postgresql-repmgr 09:49:15.25 INFO ==> Validating settings in REPMGR_* env vars... postgresql-repmgr 09:49:15.26 INFO ==> Validating settings in POSTGRESQL_* env vars.. postgresql-repmgr 09:49:15.27 INFO ==> Querying all partner nodes for common upstream node... postgresql-repmgr 09:49:16.35 INFO ==> Auto-detected primary node: 'project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local:5432' postgresql-repmgr 09:49:16.42 INFO ==> This node was acting as a primary before restart! postgresql-repmgr 09:49:16.43 INFO ==> Current master is 'project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local:5432'. Cloning/rewinding it and acting as a standby node... postgresql-repmgr 09:49:16.65 INFO ==> Preparing PostgreSQL configuration... postgresql-repmgr 09:49:16.78 INFO ==> postgresql.conf file not detected. Generating it... postgresql-repmgr 09:49:17.50 INFO ==> Preparing repmgr configuration... postgresql-repmgr 09:49:17.55 INFO ==> Initializing Repmgr... postgresql-repmgr 09:49:17.59 INFO ==> Waiting for primary node... postgresql-repmgr 09:49:17.64 INFO ==> Cloning data from primary node... postgresql-repmgr 09:49:27.52 INFO ==> Initializing PostgreSQL database... postgresql-repmgr 09:49:27.76 INFO ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected postgresql-repmgr 09:49:27.76 INFO ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected postgresql-repmgr 09:49:28.08 INFO ==> Deploying PostgreSQL with persisted data... postgresql-repmgr 09:49:28.20 INFO ==> Configuring replication parameters postgresql-repmgr 09:49:28.37 INFO ==> Configuring fsync postgresql-repmgr 09:49:28.44 INFO ==> Setting up streaming replication slave... postgresql-repmgr 09:49:28.63 INFO ==> Starting PostgreSQL in background... postgresql-repmgr 09:49:31.88 INFO ==> Unregistering standby node... postgresql-repmgr 09:49:32.25 INFO ==> Registering Standby node... postgresql-repmgr 09:49:32.46 INFO ==> Check if primary running... postgresql-repmgr 09:49:32.50 INFO ==> Waiting for primary node... postgresql-repmgr 09:49:32.99 INFO ==> Running standby follow... postgresql-repmgr 09:49:39.51 INFO ==> Stopping PostgreSQL... waiting for server to shut down.... done server stopped postgresql-repmgr 09:49:39.81 INFO ==> ** PostgreSQL with Replication Manager setup finished! ** postgresql-repmgr 09:49:39.92 INFO ==> Starting PostgreSQL in background... 
waiting for server to start....2021-12-13 09:49:40.209 GMT [288] LOG: pgaudit extension initialized 2021-12-13 09:49:40.210 GMT [288] LOG: listening on IPv4 address "0.0.0.0", port 5432 2021-12-13 09:49:40.210 GMT [288] LOG: listening on IPv6 address "::", port 5432 2021-12-13 09:49:40.316 GMT [288] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2021-12-13 09:49:40.451 GMT [288] LOG: redirecting log output to logging collector process 2021-12-13 09:49:40.451 GMT [288] HINT: Future log output will appear in directory "/opt/bitnami/postgresql/logs". 2021-12-13 09:49:40.523 GMT [290] LOG: database system was shut down in recovery at 2021-12-13 09:49:39 GMT 2021-12-13 09:49:40.523 GMT [290] LOG: entering standby mode 2021-12-13 09:49:40.574 GMT [290] LOG: redo starts at 1/30000028 2021-12-13 09:49:40.578 GMT [290] LOG: consistent recovery state reached at 1/31003B00 2021-12-13 09:49:40.578 GMT [290] LOG: invalid record length at 1/31003B00: wanted 24, got 0 2021-12-13 09:49:40.578 GMT [288] LOG: database system is ready to accept read only connections done server started 2021-12-13 09:49:40.630 GMT [295] LOG: started streaming WAL from primary at 1/31000000 on timeline 33 postgresql-repmgr 09:49:40.63 INFO ==> ** Starting repmgrd ** [2021-12-13 09:49:40] [NOTICE] repmgrd (repmgrd 5.2.1) starting up INFO: set_repmgrd_pid(): provided pidfile is /opt/bitnami/repmgr/tmp/repmgr.pid [2021-12-13 09:49:42] [NOTICE] starting monitoring of node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) [2021-12-13 09:49:51] [WARNING] unable to ping "user=repmgr password=anotherpassword*** host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5" [2021-12-13 09:49:51] [DETAIL] PQping() returned "PQPING_NO_RESPONSE" [2021-12-13 09:49:51] [WARNING] unable to connect to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) [2021-12-13 09:49:56] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr" [2021-12-13 09:49:56] [DETAIL] PQping() returned "PQPING_NO_RESPONSE" [2021-12-13 09:50:01] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr" [2021-12-13 09:50:01] [DETAIL] PQping() returned "PQPING_NO_RESPONSE" [2021-12-13 09:50:06] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr" [2021-12-13 09:50:06] [DETAIL] PQping() returned "PQPING_NO_RESPONSE" [2021-12-13 09:50:06] [WARNING] unable to reconnect to node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) after 3 attempts [2021-12-13 09:50:06] [NOTICE] promotion candidate is "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) [2021-12-13 09:50:06] [NOTICE] this node is the winner, will now promote itself and inform other nodes NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf" DEBUG: connecting to: "user=repmgr 
password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" DEBUG: set_config(): SET synchronous_commit TO 'local' INFO: connected to standby, checking its state DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() DEBUG: get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 1001 DEBUG: get_replication_info(): SELECT ts, in_recovery, last_wal_receive_lsn, last_wal_replay_lsn, last_xact_replay_timestamp, CASE WHEN (last_wal_receive_lsn = last_wal_replay_lsn) THEN 0::INT ELSE CASE WHEN last_xact_replay_timestamp IS NULL THEN 0::INT ELSE EXTRACT(epoch FROM (pg_catalog.clock_timestamp() - last_xact_replay_timestamp))::INT END END AS replication_lag_time, last_wal_receive_lsn >= last_wal_replay_lsn AS receiving_streamed_wal, wal_replay_paused, upstream_last_seen, upstream_node_id FROM ( SELECT CURRENT_TIMESTAMP AS ts, pg_catalog.pg_is_in_recovery() AS in_recovery, pg_catalog.pg_last_xact_replay_timestamp() AS last_xact_replay_timestamp, COALESCE(pg_catalog.pg_last_wal_receive_lsn(), '0/0'::PG_LSN) AS last_wal_receive_lsn, COALESCE(pg_catalog.pg_last_wal_replay_lsn(), '0/0'::PG_LSN) AS last_wal_replay_lsn, CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE THEN FALSE ELSE pg_catalog.pg_is_wal_replay_paused() END AS wal_replay_paused, CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE THEN -1 ELSE repmgr.get_upstream_last_seen() END AS upstream_last_seen, CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE THEN -1 ELSE repmgr.get_upstream_node_id() END AS upstream_node_id ) q INFO: searching for primary node DEBUG: get_primary_connection(): SELECT node_id, conninfo, CASE WHEN type = 'primary' THEN 1 ELSE 2 END AS type_priority FROM repmgr.nodes WHERE active IS TRUE AND type != 'witness' ORDER BY active DESC, type_priority, priority, node_id INFO: checking if node 1000 is primary DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" ERROR: connection to database failed DETAIL: could not translate host name "project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local" to address: Name or service not known DETAIL: attempted to connect using: user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path= INFO: checking if node 1001 is primary DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" DEBUG: set_config(): SET synchronous_commit TO 'local' DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() INFO: checking if node 1002 is primary DEBUG: connecting 
to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" DEBUG: set_config(): SET synchronous_commit TO 'local' DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() DEBUG: get_node_replication_stats(): SELECT pg_catalog.current_setting('max_wal_senders')::INT AS max_wal_senders, (SELECT pg_catalog.count(*) FROM pg_catalog.pg_stat_replication) AS attached_wal_receivers, current_setting('max_replication_slots')::INT AS max_replication_slots, (SELECT pg_catalog.count(*) FROM pg_catalog.pg_replication_slots WHERE slot_type='physical') AS total_replication_slots, (SELECT pg_catalog.count(*) FROM pg_catalog.pg_replication_slots WHERE active IS TRUE AND slot_type='physical') AS active_replication_slots, (SELECT pg_catalog.count(*) FROM pg_catalog.pg_replication_slots WHERE active IS FALSE AND slot_type='physical') AS inactive_replication_slots, pg_catalog.pg_is_in_recovery() AS in_recovery DEBUG: get_active_sibling_node_records(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.upstream_node_id = 1000 AND n.node_id != 1001 AND n.active IS TRUE ORDER BY n.node_id DEBUG: clear_node_info_list() - closing open connections DEBUG: clear_node_info_list() - unlinking WARNING: 1 sibling nodes found, but option "--siblings-follow" not specified DETAIL: these nodes will remain attached to the current primary: project***-prod-store-postgresql-ha-postgresql-2 (node ID: 1002) DEBUG: get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 1001 NOTICE: promoting standby to primary DETAIL: promoting server "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) using "/opt/bitnami/postgresql/bin/pg_ctl -o "--config-file="/opt/bitnami/postgresql/conf/postgresql.conf" --external_pid_file="/opt/bitnami/postgresql/tmp/postgresql.pid" --hba_file="/opt/bitnami/postgresql/conf/pg_hba.conf"" -w -D '/bitnami/postgresql/data' promote" 2021-12-13 09:50:07.879 GMT [290] LOG: received promote request 2021-12-13 09:50:07.880 GMT [295] FATAL: terminating walreceiver process due to administrator command 2021-12-13 09:50:07.881 GMT [290] LOG: invalid record length at 1/31003CC8: wanted 24, got 0 2021-12-13 09:50:07.881 GMT [290] LOG: redo done at 1/31003CA0 2021-12-13 09:50:07.881 GMT [290] LOG: last completed transaction was at log time 2021-12-13 09:49:42.657704+00 2021-12-13 09:50:11.833 GMT [290] LOG: selected new timeline ID: 34 2021-12-13 09:50:13.598 GMT [290] LOG: archive recovery complete NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() INFO: standby promoted to primary after 0 second(s) DEBUG: setting node 1001 as primary and marking existing primary as failed DEBUG: begin_transaction() 2021-12-13 09:50:14.606 GMT [288] LOG: database system is ready to accept connections DEBUG: commit_transaction() NOTICE: STANDBY PROMOTE successful DETAIL: server "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) was successfully 
promoted to primary DEBUG: _create_event(): event is "standby_promote" for node 1001 DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() DEBUG: _create_event(): INSERT INTO repmgr.events ( node_id, event, successful, details ) VALUES ($1, $2, $3, $4) RETURNING event_timestamp DEBUG: _create_event(): Event timestamp is "2021-12-13 09:50:14.76431+00" DEBUG: _create_event(): command is '/opt/bitnami/repmgr/events/router.sh %n %e %s "%t" "%d"' INFO: executing notification command for event "standby_promote" DETAIL: command is: /opt/bitnami/repmgr/events/router.sh 1001 standby_promote 1 "2021-12-13 09:50:14.76431+00" "server \"project***-prod-store-postgresql-ha-postgresql-1\" (ID: 1001) was successfully promoted to primary" DEBUG: clear_node_info_list() - closing open connections DEBUG: clear_node_info_list() - unlinking [2021-12-13 09:50:15] [NOTICE] node 1001 has recovered, reconnecting [2021-12-13 09:50:15] [NOTICE] notifying node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) to follow node 1001 INFO: node 1002 received notification to follow node 1001 [2021-12-13 09:50:15] [NOTICE] monitoring cluster primary "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) 2021-12-13 11:12:06.332 GMT [2907] ERROR: insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_session_id_fkey" 2021-12-13 11:12:06.332 GMT [2907] DETAIL: Key (session_id)=(12855) is not present in table "sessions". 2021-12-13 11:12:06.332 GMT [2907] STATEMENT: INSERT INTO "RawStatistic" ("created","updated","created_date","action","device","os","os_version","browser","country","language","scroll","source","url","campaign_name","campaign_source","campaign_medium","campaign_term","campaign_content","app_variant_id","screenshot_id","session_id","visitor_id") VALUES ('2021-12-13T11:12:05.897118+00:00'::timestamptz,'2021-12-13T11:12:05.897142+00:00'::timestamptz,'2021-12-13'::date,8,'Unknown','Unknown','Unknown','Unknown','US','en-US',0.0,'direct','https://store.tld?args=***',NULL,NULL,NULL,NULL,NULL,75,NULL,12855,8234) RETURNING "id" 2021-12-13 11:12:06.731 GMT [2907] ERROR: insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_session_id_fkey" 2021-12-13 11:12:06.731 GMT [2907] DETAIL: Key (session_id)=(12855) is not present in table "sessions". 2021-12-13 11:12:06.731 GMT [2907] STATEMENT: INSERT INTO "RawStatistic" ("created","updated","created_date","action","device","os","os_version","browser","country","language","scroll","source","url","campaign_name","campaign_source","campaign_medium","campaign_term","campaign_content","app_variant_id","screenshot_id","session_id","visitor_id") VALUES ('2021-12-13T11:12:06.729298+00:00'::timestamptz,'2021-12-13T11:12:06.729325+00:00'::timestamptz,'2021-12-13'::date,2,'Unknown','Unknown','Unknown','Unknown','US','en-US',0.0,'direct','https://store.tld?args=***',NULL,NULL,NULL,NULL,NULL,75,NULL,12855,8234) RETURNING "id" 2021-12-13 16:29:58.138 GMT [64721] LOG: incomplete startup packet 2021-12-14 11:09:35.701 GMT [2608] LOG: could not receive data from client: Connection reset by peer 2021-12-14 11:58:58.867 GMT [249004] ERROR: insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey" 2021-12-14 11:58:58.867 GMT [249004] DETAIL: Key (visitor_id)=(8249) is not present in table "visitors". 
2021-12-14 11:58:58.867 GMT [249004] STATEMENT: UPDATE "RawStatistic" SET "created"='2021-12-14T11:58:27.806078+00:00'::timestamptz,"updated"='2021-12-14T11:58:58.861377+00:00'::timestamptz,"created_date"='2021-12-13'::date,"action"=2,"device"='Sony G8341',"os"='Android',"os_version"='9',"browser"='Chrome 96.0.4664.92',"country"='US',"language"='es-ES',"scroll"=0.0,"source"='direct',"url"='https://store.tld?args=***',"campaign_name"=NULL,"campaign_source"=NULL,"campaign_medium"=NULL,"campaign_term"=NULL,"campaign_content"=NULL,"app_variant_id"=75,"screenshot_id"=NULL,"session_id"=12864,"visitor_id"=8249 WHERE "id"=118311 2021-12-14 11:59:19.303 GMT [249004] ERROR: insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey" 2021-12-14 11:59:19.303 GMT [249004] DETAIL: Key (visitor_id)=(8249) is not present in table "visitors". 2021-12-14 11:59:19.303 GMT [249004] STATEMENT: UPDATE "RawStatistic" SET "created"='2021-12-14T11:58:27.806078+00:00'::timestamptz,"updated"='2021-12-14T11:59:19.296233+00:00'::timestamptz,"created_date"='2021-12-13'::date,"action"=2,"device"='Sony G8341',"os"='Android',"os_version"='9',"browser"='Chrome 96.0.4664.92',"country"='US',"language"='es-ES',"scroll"=0.0,"source"='direct',"url"='https://store.tld?args=***',"campaign_name"=NULL,"campaign_source"=NULL,"campaign_medium"=NULL,"campaign_term"=NULL,"campaign_content"=NULL,"app_variant_id"=75,"screenshot_id"=NULL,"session_id"=12864,"visitor_id"=8249 WHERE "id"=118311 2021-12-14 11:59:27.267 GMT [249004] ERROR: insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey" 2021-12-14 11:59:27.267 GMT [249004] DETAIL: Key (visitor_id)=(8249) is not present in table "visitors". 2021-12-14 11:59:27.267 GMT [249004] STATEMENT: UPDATE "RawStatistic" SET "created"='2021-12-14T11:58:27.806078+00:00'::timestamptz,"updated"='2021-12-14T11:59:27.248728+00:00'::timestamptz,"created_date"='2021-12-13'::date,"action"=2,"device"='Sony G8341',"os"='Android',"os_version"='9',"browser"='Chrome 96.0.4664.92',"country"='US',"language"='es-ES',"scroll"=0.0,"source"='direct',"url"='https://store.tld?args=***',"campaign_name"=NULL,"campaign_source"=NULL,"campaign_medium"=NULL,"campaign_term"=NULL,"campaign_content"=NULL,"app_variant_id"=75,"screenshot_id"=NULL,"session_id"=12864,"visitor_id"=8249 WHERE "id"=118311 2021-12-14 11:59:32.276 GMT [249004] ERROR: insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey" 2021-12-14 11:59:32.276 GMT [249004] DETAIL: Key (visitor_id)=(8249) is not present in table "visitors". 2021-12-14 11:59:32.276 GMT [249004] STATEMENT: UPDATE "RawStatistic" SET "created"='2021-12-14T11:58:27.806078+00:00'::timestamptz,"updated"='2021-12-14T11:59:32.254412+00:00'::timestamptz,"created_date"='2021-12-13'::date,"action"=2,"device"='Sony G8341',"os"='Android',"os_version"='9',"browser"='Chrome 96.0.4664.92',"country"='US',"language"='es-ES',"scroll"=0.0,"source"='direct',"url"='https://store.tld?args=***',"campaign_name"=NULL,"campaign_source"=NULL,"campaign_medium"=NULL,"campaign_term"=NULL,"campaign_content"=NULL,"app_variant_id"=75,"screenshot_id"=NULL,"session_id"=12864,"visitor_id"=8249 WHERE "id"=118311 2021-12-14 11:59:37.217 GMT [249004] ERROR: insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey" 2021-12-14 11:59:37.217 GMT [249004] DETAIL: Key (visitor_id)=(8249) is not present in table "visitors". 
2021-12-14 11:59:37.217 GMT [249004] STATEMENT: UPDATE "RawStatistic" SET "created"='2021-12-14T11:58:27.806078+00:00'::timestamptz,"updated"='2021-12-14T11:59:37.211488+00:00'::timestamptz,"created_date"='2021-12-13'::date,"action"=2,"device"='Sony G8341',"os"='Android',"os_version"='9',"browser"='Chrome 96.0.4664.92',"country"='US',"language"='es-ES',"scroll"=0.0,"source"='direct',"url"='https://store.tld?args=***',"campaign_name"=NULL,"campaign_source"=NULL,"campaign_medium"=NULL,"campaign_term"=NULL,"campaign_content"=NULL,"app_variant_id"=75,"screenshot_id"=NULL,"session_id"=12864,"visitor_id"=8249 WHERE "id"=118311 2021-12-14 11:59:44.253 GMT [249004] ERROR: insert or update on table "RawStatistic" violates foreign key constraint "RawStatistic_visitor_id_fkey" ```
Logs from -2 node ``` postgresql-repmgr 09:47:44.57 postgresql-repmgr 09:47:44.57 Welcome to the Bitnami postgresql-repmgr container postgresql-repmgr 09:47:44.57 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-postgresql-repmgr postgresql-repmgr 09:47:44.58 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-postgresql-repmgr/issues postgresql-repmgr 09:47:44.58 postgresql-repmgr 09:47:44.69 INFO ==> ** Starting PostgreSQL with Replication Manager setup ** postgresql-repmgr 09:47:44.73 INFO ==> Validating settings in REPMGR_* env vars... postgresql-repmgr 09:47:44.74 INFO ==> Validating settings in POSTGRESQL_* env vars.. postgresql-repmgr 09:47:44.74 INFO ==> Querying all partner nodes for common upstream node... postgresql-repmgr 09:47:44.91 INFO ==> Auto-detected primary node: 'project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local:5432' postgresql-repmgr 09:47:44.93 INFO ==> Preparing PostgreSQL configuration... postgresql-repmgr 09:47:44.95 INFO ==> postgresql.conf file not detected. Generating it... postgresql-repmgr 09:47:45.23 INFO ==> Preparing repmgr configuration... postgresql-repmgr 09:47:45.26 INFO ==> Initializing Repmgr... postgresql-repmgr 09:47:45.27 INFO ==> Waiting for primary node... postgresql-repmgr 09:47:45.33 INFO ==> Cloning data from primary node... postgresql-repmgr 09:48:17.84 INFO ==> Initializing PostgreSQL database... postgresql-repmgr 09:48:17.89 INFO ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected postgresql-repmgr 09:48:17.89 INFO ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected postgresql-repmgr 09:48:18.03 INFO ==> Deploying PostgreSQL with persisted data... postgresql-repmgr 09:48:18.07 INFO ==> Configuring replication parameters postgresql-repmgr 09:48:18.15 INFO ==> Configuring fsync postgresql-repmgr 09:48:18.18 INFO ==> Setting up streaming replication slave... postgresql-repmgr 09:48:18.27 INFO ==> Starting PostgreSQL in background... postgresql-repmgr 09:48:19.33 INFO ==> Unregistering standby node... postgresql-repmgr 09:48:19.52 INFO ==> Registering Standby node... postgresql-repmgr 09:48:19.73 INFO ==> Check if primary running... postgresql-repmgr 09:48:19.74 INFO ==> Waiting for primary node... postgresql-repmgr 09:48:19.78 INFO ==> Running standby follow... postgresql-repmgr 09:48:20.39 INFO ==> Stopping PostgreSQL... waiting for server to shut down.... done server stopped postgresql-repmgr 09:48:20.53 INFO ==> ** PostgreSQL with Replication Manager setup finished! ** postgresql-repmgr 09:48:20.61 INFO ==> Starting PostgreSQL in background... waiting for server to start....2021-12-13 09:48:20.666 GMT [291] LOG: pgaudit extension initialized 2021-12-13 09:48:20.667 GMT [291] LOG: listening on IPv4 address "0.0.0.0", port 5432 2021-12-13 09:48:20.667 GMT [291] LOG: listening on IPv6 address "::", port 5432 2021-12-13 09:48:20.678 GMT [291] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" 2021-12-13 09:48:20.694 GMT [291] LOG: redirecting log output to logging collector process 2021-12-13 09:48:20.694 GMT [291] HINT: Future log output will appear in directory "/opt/bitnami/postgresql/logs". 
2021-12-13 09:48:20.702 GMT [293] LOG: database system was shut down in recovery at 2021-12-13 09:48:20 GMT 2021-12-13 09:48:20.703 GMT [293] LOG: entering standby mode 2021-12-13 09:48:20.709 GMT [293] LOG: redo starts at 1/2E000028 2021-12-13 09:48:20.710 GMT [293] LOG: consistent recovery state reached at 1/2F002DE8 2021-12-13 09:48:20.710 GMT [293] LOG: invalid record length at 1/2F002DE8: wanted 24, got 0 2021-12-13 09:48:20.711 GMT [291] LOG: database system is ready to accept read only connections 2021-12-13 09:48:20.727 GMT [297] LOG: started streaming WAL from primary at 1/2F000000 on timeline 32 done server started postgresql-repmgr 09:48:20.76 INFO ==> ** Starting repmgrd ** [2021-12-13 09:48:20] [NOTICE] repmgrd (repmgrd 5.2.1) starting up INFO: set_repmgrd_pid(): provided pidfile is /opt/bitnami/repmgr/tmp/repmgr.pid [2021-12-13 09:48:21] [NOTICE] starting monitoring of node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) [2021-12-13 09:48:32] [WARNING] unable to ping "user=repmgr password=anotherpassword*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5" [2021-12-13 09:48:32] [DETAIL] PQping() returned "PQPING_NO_RESPONSE" [2021-12-13 09:48:32] [WARNING] unable to connect to upstream node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) [2021-12-13 09:48:37] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr" [2021-12-13 09:48:37] [DETAIL] PQping() returned "PQPING_NO_RESPONSE" [2021-12-13 09:48:42] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr" [2021-12-13 09:48:42] [DETAIL] PQping() returned "PQPING_NO_RESPONSE" [2021-12-13 09:48:47] [WARNING] unable to ping "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr" [2021-12-13 09:48:47] [DETAIL] PQping() returned "PQPING_NO_RESPONSE" [2021-12-13 09:48:47] [WARNING] unable to reconnect to node "project***-prod-store-postgresql-ha-postgresql-1" (ID: 1001) after 3 attempts [2021-12-13 09:48:47] [WARNING] node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) is not in recovery [2021-12-13 09:48:47] [ERROR] connection to database failed [2021-12-13 09:48:47] [DETAIL] fe_sendauth: no password supplied [2021-12-13 09:48:47] [ERROR] unable to establish a replication connection to the local node [2021-12-13 09:48:47] [WARNING] not possible to attach to node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000), ignoring [2021-12-13 09:48:47] [NOTICE] promotion candidate is "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) [2021-12-13 09:48:47] [NOTICE] this node is the winner, will now promote itself and inform other nodes NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf" DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr 
host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" DEBUG: set_config(): SET synchronous_commit TO 'local' INFO: connected to standby, checking its state DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() DEBUG: get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 1002 DEBUG: get_replication_info(): SELECT ts, in_recovery, last_wal_receive_lsn, last_wal_replay_lsn, last_xact_replay_timestamp, CASE WHEN (last_wal_receive_lsn = last_wal_replay_lsn) THEN 0::INT ELSE CASE WHEN last_xact_replay_timestamp IS NULL THEN 0::INT ELSE EXTRACT(epoch FROM (pg_catalog.clock_timestamp() - last_xact_replay_timestamp))::INT END END AS replication_lag_time, last_wal_receive_lsn >= last_wal_replay_lsn AS receiving_streamed_wal, wal_replay_paused, upstream_last_seen, upstream_node_id FROM ( SELECT CURRENT_TIMESTAMP AS ts, pg_catalog.pg_is_in_recovery() AS in_recovery, pg_catalog.pg_last_xact_replay_timestamp() AS last_xact_replay_timestamp, COALESCE(pg_catalog.pg_last_wal_receive_lsn(), '0/0'::PG_LSN) AS last_wal_receive_lsn, COALESCE(pg_catalog.pg_last_wal_replay_lsn(), '0/0'::PG_LSN) AS last_wal_replay_lsn, CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE THEN FALSE ELSE pg_catalog.pg_is_wal_replay_paused() END AS wal_replay_paused, CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE THEN -1 ELSE repmgr.get_upstream_last_seen() END AS upstream_last_seen, CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE THEN -1 ELSE repmgr.get_upstream_node_id() END AS upstream_node_id ) q INFO: searching for primary node DEBUG: get_primary_connection(): SELECT node_id, conninfo, CASE WHEN type = 'primary' THEN 1 ELSE 2 END AS type_priority FROM repmgr.nodes WHERE active IS TRUE AND type != 'witness' ORDER BY active DESC, type_priority, priority, node_id INFO: checking if node 1001 is primary DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" ERROR: connection to database failed DETAIL: could not translate host name "project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local" to address: Name or service not known DETAIL: attempted to connect using: user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path= INFO: checking if node 1000 is primary DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" DEBUG: set_config(): SET synchronous_commit TO 'local' DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() INFO: current primary node is 1000 ERROR: this replication cluster already has an active primary server DEBUG: 
get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 1000 DETAIL: current primary is "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) [2021-12-13 09:48:47] [ERROR] promote command failed [2021-12-13 09:48:47] [DETAIL] promote command exited with error code 8 [2021-12-13 09:48:47] [ERROR] connection to database failed [2021-12-13 09:48:47] [DETAIL] could not translate host name "project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local" to address: Name or service not known [2021-12-13 09:48:47] [DETAIL] attempted to connect using: user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path= [2021-12-13 09:48:47] [WARNING] unable to ping "user=repmgr password=anotherpassword*** host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5" [2021-12-13 09:48:47] [DETAIL] PQping() returned "PQPING_NO_RESPONSE" [2021-12-13 09:48:47] [NOTICE] attempting to follow new primary "project***-prod-store-postgresql-ha-postgresql-0" (node ID: 1000) NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf" WARNING: following problems with command line parameters detected: --no-wait will be ignored when executing STANDBY FOLLOW DEBUG: do_standby_follow() DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" DEBUG: set_config(): SET synchronous_commit TO 'local' INFO: connected to local node DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() DEBUG: get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 1002 NOTICE: attempting to find and follow current primary INFO: searching for primary node DEBUG: get_primary_connection(): SELECT node_id, conninfo, CASE WHEN type = 'primary' THEN 1 ELSE 2 END AS type_priority FROM repmgr.nodes WHERE active IS TRUE AND type != 'witness' ORDER BY active DESC, type_priority, priority, node_id INFO: checking if node 1001 is primary DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" ERROR: connection to database failed DETAIL: could not translate host name "project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local" to address: Name or service not known DETAIL: attempted to connect using: user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr 
host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path= INFO: checking if node 1000 is primary DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" DEBUG: set_config(): SET synchronous_commit TO 'local' DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() INFO: current primary node is 1000 INFO: connected to node 1000, checking for current primary DEBUG: get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 1000 DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() INFO: follow target is primary node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) DEBUG: local timeline: 32; follow target timeline: 33 DEBUG: get_timeline_history(): TIMELINE_HISTORY 33 DEBUG: local tli: 32; local_xlogpos: 1/2F002FC8; follow_target_history->tli: 32; follow_target_history->end: 1/2F002FC8 INFO: local node 1002 can attach to follow target node 1000 DETAIL: local node's recovery point: 1/2F002FC8; follow target node's fork point: 1/2F002FC8 DEBUG: get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 1002 INFO: creating replication slot as user "repmgr" DEBUG: get_slot_record(): SELECT slot_name, slot_type, active FROM pg_catalog.pg_replication_slots WHERE slot_name = 'repmgr_slot_1002' DEBUG: create_replication_slot_sql(): creating slot "repmgr_slot_1002" on upstream DEBUG: create_replication_slot_sql(): SELECT * FROM pg_catalog.pg_create_physical_replication_slot('repmgr_slot_1002', TRUE) DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" DEBUG: set_config(): SET synchronous_commit TO 'local' DEBUG: get_node_record(): SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name, NULL AS attached FROM repmgr.nodes n WHERE n.node_id = 1001 NOTICE: setting node 1002's upstream to node 1000 DEBUG: create_recovery_file(): creating "/bitnami/postgresql/data/recovery.conf"... 
DEBUG: recovery.conf line: standby_mode = 'on' DEBUG: recovery.conf line: primary_conninfo = 'user=repmgr password=anotherpassword*** connect_timeout=5 host=''project***-prod-store-postgresql-ha-postgresql-0.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local'' port=5432 application_name=''project***-prod-store-postgresql-ha-postgresql-2''' DEBUG: recovery.conf line: recovery_target_timeline = 'latest' DEBUG: recovery.conf line: primary_slot_name = 'repmgr_slot_1002' DEBUG: is_server_available(): ping status for "user=repmgr password=anotherpassword*** host=project***-prod-store-postgresql-ha-postgresql-2.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local dbname=repmgr port=5432 connect_timeout=5" is PQPING_OK NOTICE: stopping server using "/opt/bitnami/postgresql/bin/pg_ctl -o "--config-file="/opt/bitnami/postgresql/conf/postgresql.conf" --external_pid_file="/opt/bitnami/postgresql/tmp/postgresql.pid" --hba_file="/opt/bitnami/postgresql/conf/pg_hba.conf"" -D '/bitnami/postgresql/data' -w -m fast stop" DEBUG: executing: /opt/bitnami/postgresql/bin/pg_ctl -o "--config-file="/opt/bitnami/postgresql/conf/postgresql.conf" --external_pid_file="/opt/bitnami/postgresql/tmp/postgresql.pid" --hba_file="/opt/bitnami/postgresql/conf/pg_hba.conf"" -D '/bitnami/postgresql/data' -w -m fast stop 2>/tmp/repmgr_command.5YBIAb 2021-12-13 09:48:48.029 GMT [291] LOG: received fast shutdown request 2021-12-13 09:48:48.032 GMT [291] LOG: aborting any active transactions 2021-12-13 09:48:48.033 GMT [297] FATAL: terminating walreceiver process due to administrator command 2021-12-13 09:48:48.035 GMT [294] LOG: shutting down 2021-12-13 09:48:48.050 GMT [291] LOG: database system is shut down DEBUG: result of command was 141 (36096) DEBUG: local_command(): output returned was: waiting for server to shut down.... done NOTICE: starting server using "/opt/bitnami/postgresql/bin/pg_ctl -o "--config-file="/opt/bitnami/postgresql/conf/postgresql.conf" --external_pid_file="/opt/bitnami/postgresql/tmp/postgresql.pid" --hba_file="/opt/bitnami/postgresql/conf/pg_hba.conf"" -w -D '/bitnami/postgresql/data' start" DEBUG: executing: /opt/bitnami/postgresql/bin/pg_ctl -o "--config-file="/opt/bitnami/postgresql/conf/postgresql.conf" --external_pid_file="/opt/bitnami/postgresql/tmp/postgresql.pid" --hba_file="/opt/bitnami/postgresql/conf/pg_hba.conf"" -w -D '/bitnami/postgresql/data' start 2>/tmp/repmgr_command.aIkDad DEBUG: result of command was 141 (36096) DEBUG: local_command(): output returned was: waiting for server to shut down.... 
done waiting for server to start....2021-12-13 09:48:48.176 GMT [376] LOG: pgaudit extension initialized DEBUG: connecting to: "user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path=" ERROR: connection to database failed DETAIL: could not translate host name "project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local" to address: Name or service not known DETAIL: attempted to connect using: user=repmgr password=anotherpassword*** connect_timeout=5 dbname=repmgr host=project***-prod-store-postgresql-ha-postgresql-1.project***-prod-store-postgresql-ha-postgresql-headless.web.svc.cluster.local port=5432 fallback_application_name=repmgr options=-csearch_path= WARNING: unable to connect to old upstream node 1001 to remove replication slot HINT: if reusing this node, you should manually remove any inactive replication slots DEBUG: update_node_record_status(): UPDATE repmgr.nodes SET type = 'standby', upstream_node_id = 1000, active = TRUE WHERE node_id = 1002 NOTICE: STANDBY FOLLOW successful DETAIL: standby attached to upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) DEBUG: _create_event(): event is "standby_follow" for node 1002 DEBUG: get_recovery_type(): SELECT pg_catalog.pg_is_in_recovery() DEBUG: _create_event(): INSERT INTO repmgr.events ( node_id, event, successful, details ) VALUES ($1, $2, $3, $4) RETURNING event_timestamp DEBUG: _create_event(): Event timestamp is "2021-12-13 09:48:48.268433+00" DEBUG: _create_event(): command is '/opt/bitnami/repmgr/events/router.sh %n %e %s "%t" "%d"' INFO: executing notification command for event "standby_follow" DETAIL: command is: /opt/bitnami/repmgr/events/router.sh 1002 standby_follow 1 "2021-12-13 09:48:48.268433+00" "standby attached to upstream node \"project***-prod-store-postgresql-ha-postgresql-0\" (ID: 1000)" INFO: set_repmgrd_pid(): provided pidfile is /opt/bitnami/repmgr/tmp/repmgr.pid [2021-12-13 09:48:48] [NOTICE] node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002) now following new upstream node "project***-prod-store-postgresql-ha-postgresql-0" (ID: 1000) postgresql-repmgr 09:48:48.44 DEBUG ==> Executing SQL command: SELECT upstream_node_id FROM repmgr.nodes WHERE node_id=1002 [2021-12-13 10:04:38] [ERROR] unable to determine if server is in recovery [2021-12-13 10:04:38] [DETAIL] could not receive data from server: Connection timed out [2021-12-13 10:04:38] [DETAIL] query text is: SELECT pg_catalog.pg_is_in_recovery() [2021-12-13 10:04:38] [NOTICE] local node "project***-prod-store-postgresql-ha-postgresql-2" (ID: 1002)'s upstream appears to have changed, restarting monitoring [2021-12-13 10:04:38] [DETAIL] currently monitoring upstream 1001; new upstream is 1000 ```

It looks like an unexpected name-lookup failure during repmgr startup may be the root cause of the cluster getting stuck in a split-brain state. If that's the case, all we would need is to add DNS lookup retries in the relevant places...
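
To make the idea concrete, here is a rough sketch (not part of repmgr or the Bitnami image; hostnames and retry counts are placeholders) of a pre-start hook that retries name resolution of the peer nodes before repmgrd is started, so a transient DNS failure at pod startup isn't misread as a dead peer:

```bash
#!/bin/bash
# Hypothetical pre-start hook: wait until all peer hostnames resolve
# before allowing repmgrd to start. Placeholders only.
set -euo pipefail

PEERS=(
  "node-0.postgresql-headless.web.svc.cluster.local"
  "node-1.postgresql-headless.web.svc.cluster.local"
)
RETRIES=12
SLEEP_SECONDS=5

for host in "${PEERS[@]}"; do
  attempt=1
  # getent consults the same resolver libpq uses, so this approximates
  # the lookup that fails in the logs above.
  until getent hosts "$host" > /dev/null; do
    if (( attempt >= RETRIES )); then
      echo "giving up: could not resolve $host after $RETRIES attempts" >&2
      exit 1
    fi
    echo "DNS lookup for $host failed (attempt $attempt/$RETRIES), retrying..." >&2
    attempt=$(( attempt + 1 ))
    sleep "$SLEEP_SECONDS"
  done
done

echo "all peer hostnames resolved, safe to start repmgrd"
```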

k-caps commented 2 years ago

I'm having the same issue. In my case I was able to narrow it down to certain VM network behaviors. For example, with OpenStack as my VM provider, this state occurs whenever a VM's private network is disconnected; that private network failure also leaves the VM unable to reach its own block storage. It seems that the connectivity check repmgr uses for the cluster status command (the one that shows the primary as unreachable) is not the same check that triggers a failover. I would expect that if losing storage is enough to report the node as unreachable (and it is!), then a failover should definitely occur here as well.
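
For what it's worth, repmgr's reachability test is configurable via `connection_check_type` in repmgr.conf (`ping`, `connection`, or `query`, if I read the docs right); the default `ping` only calls PQping(), which can report a server as up even when it can no longer answer queries. A rough shell approximation of the difference, with the host name as a placeholder:

```bash
# Placeholder host name, not from the original report.
PRIMARY="node-0.postgresql-headless.web.svc.cluster.local"

# "ping"-style check: only asks whether the server would accept a connection.
pg_isready -h "$PRIMARY" -p 5432 -t 5

# "query"-style check: actually runs SQL, so it also fails when the server
# accepts connections but cannot serve queries (e.g. storage is gone).
psql "host=$PRIMARY port=5432 user=repmgr dbname=repmgr connect_timeout=5" \
     -Atc "SELECT pg_catalog.pg_is_in_recovery()"
```

If the mismatch really is between a ping-style check and what the node can actually do, switching to a query-based check might at least make the two views agree; I haven't verified that it changes the failover behavior described here.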