EnterpriseDB / repmgr

A lightweight replication manager for PostgreSQL (Postgres)
https://repmgr.org/

node "node_master" (ID: 1) is registered as primary but running as standby #848

Closed Dante-Eleuterio closed 6 months ago

Dante-Eleuterio commented 8 months ago

Hello, I've been trying to set up replication and everything was fine, including the replication itself (the cloning worked and replication is active), until I tried to register my standby server. I get the following message when I run "/usr/bin/repmgr -f /etc/postgresql/16/main/repmgr.conf standby register":

  INFO: connecting to local node "node2" (ID: 2)
  INFO: connecting to primary database
  ERROR: unable to connect to the primary database
  HINT: a primary node must be configured before registering a standby node

Also, when I run "/usr/bin/repmgr -f /etc/postgresql/16/main/repmgr.conf cluster show" on the standby server to check the nodes, I receive the following message:

 ID | Name        | Role    | Status               | Upstream | Location | Priority | Timeline | Connection string
----+-------------+---------+----------------------+----------+----------+----------+----------+---------------------------------------------
 1  | node_master | primary | ! running as standby |          | default  | 100      | 1        | user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - node "node_master" (ID: 1) is registered as primary but running as standby

But when I run it on the master server, there is no issue:

 ID | Name        | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                          
----+-------------+---------+-----------+----------+----------+----------+----------+---------------------------------------------
 1  | node_master | primary | * running |          | default  | 100      | 1        | user=repmgr dbname=repmgr connect_timeout=2

So I don't know what the problem is. This is the repmgr.conf file of the standby server:

node_id=2

node_name=node2

conninfo='user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/home/postgres/16/main'

failover=automatic

promote_command='/usr/bin/repmgr standby promote -f /etc/postgresql/16/main/repmgr.conf --log-to-file'

follow_command='/usr/bin/repmgr standby follow -f /etc/postgresql/16/main/repmgr.conf --log-to-file --upstream-node-id=%n'

And here is the repmgr.conf file of the master server:

node_id=1

node_name=node_master

conninfo='user=repmgr dbname=repmgr connect_timeout=2'

data_directory='/home/postgres/16/main'

failover=automatic

promote_command='/usr/bin/repmgr standby promote -f /etc/postgresql/16/main/repmgr.conf --log-to-file'

follow_command='/usr/bin/repmgr standby follow -f /etc/postgresql/16/main/repmgr.conf --log-to-file --upstream-node-id=%n'

rnalrd commented 6 months ago

Any progress here? I've run into the same issue.

Dante-Eleuterio commented 6 months ago

@rnalrd I gave up not long after this.

rnalrd commented 6 months ago

I've just found out: the problem is a wrong "conninfo" in the nodes table (repmgr db). The host value must point to the primary server.

martinmarques commented 6 months ago

Every conninfo setting needs to have a host associated with it. That's what is missing here. When running repmgr locally it works fine because it connects via the local socket, but remote nodes can't use that conninfo to connect to the primary.
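
For anyone landing here with the same symptom, a minimal sketch of what the corrected settings might look like. The hostnames "primary.example.com" and "standby.example.com" are hypothetical placeholders, not values from this thread; substitute addresses that each node can actually reach from the other.

```ini
# repmgr.conf on the primary (node_master)
# "primary.example.com" is a hypothetical hostname, not taken from this thread
conninfo='host=primary.example.com user=repmgr dbname=repmgr connect_timeout=2'

# repmgr.conf on the standby (node2)
# "standby.example.com" is likewise a hypothetical hostname
conninfo='host=standby.example.com user=repmgr dbname=repmgr connect_timeout=2'
```

Since the conninfo is stored in the nodes table at registration time, editing repmgr.conf alone is probably not enough. The primary's record would likely need to be updated too, for example by re-running "repmgr primary register" with --force on the primary, before retrying "standby register" on the standby.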