[bitnami/postgresql-repmgr] Can't get witness node working

seriouz commented 1 year ago

Name and Version

bitnami/postgresql-repmgr

What architecture are you using?

amd64

What steps will reproduce the bug?

I'm having problems setting up PostgreSQL repmgr with the new witness node. I've tried multiple variants, but none of them worked. The witness node in Docker throws an error and shuts down.

Here is my compose file:

version: '3.8'
services:

  pg-0:
    image: bitnami/postgresql-repmgr:latest
    ports:
      - 6430:5432
    volumes:
      - /docker/local/database_repmgr2/pg-0:/bitnami/postgresql
    environment:
      - POSTGRESQL_POSTGRES_PASSWORD=adminpassword
      - POSTGRESQL_USERNAME=customuser
      - POSTGRESQL_PASSWORD=custompassword
      - POSTGRESQL_DATABASE=customdatabase
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_PRIMARY_PORT=5432
      - REPMGR_PARTNER_NODES=pg-0:5432,pg-1:5432,pgw-0:5432
      - REPMGR_NODE_NAME=pg-0
      - REPMGR_NODE_NETWORK_NAME=pg-0
      - REPMGR_PORT_NUMBER=5432
    deploy:
      resources:
        limits:
           memory: 5G
      replicas: 1
      placement:
        max_replicas_per_node: 1

  pg-1:
    image: bitnami/postgresql-repmgr:latest
    ports:
      - 6431:5432
    volumes:
      - /docker/local/database_repmgr2/pg-1:/bitnami/postgresql
    environment:
      - POSTGRESQL_POSTGRES_PASSWORD=adminpassword
      - POSTGRESQL_USERNAME=customuser
      - POSTGRESQL_PASSWORD=custompassword
      - POSTGRESQL_DATABASE=customdatabase
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_PRIMARY_PORT=5432
      - REPMGR_PARTNER_NODES=pg-0:5432,pg-1:5432,pgw-0:5432
      - REPMGR_NODE_NAME=pg-1
      - REPMGR_NODE_NETWORK_NAME=pg-1
      - REPMGR_PORT_NUMBER=5432
    deploy:
      resources:
        limits:
           memory: 5G
      replicas: 1
      placement:
        max_replicas_per_node: 1

  pgw-0:
    image: bitnami/postgresql-repmgr:latest
    ports:
      - 6439:5432
    volumes:
      - /docker/local/database_repmgr2/pgw-0:/bitnami/postgresql
    environment:
      - POSTGRESQL_POSTGRES_PASSWORD=adminpassword
      - POSTGRESQL_USERNAME=customuser
      - POSTGRESQL_PASSWORD=custompassword
      - POSTGRESQL_DATABASE=customdatabase
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_PRIMARY_PORT=5432
      - REPMGR_PARTNER_NODES=pg-0:5432,pg-1:5432,pgw-0:5432
      - REPMGR_NODE_NAME=pgw-0
      - REPMGR_NODE_NETWORK_NAME=pgw-0
      - REPMGR_PORT_NUMBER=5432
      - REPMGR_NODE_TYPE=witness
      - BITNAMI_DEBUG=true
    deploy:
      resources:
        limits:
           memory: 5G
      replicas: 1
      placement:
        max_replicas_per_node: 1

What do you see instead?

On the first start (when no database exists yet), the container fails with:

Success. You can now start the database server using:

    /opt/bitnami/postgresql/bin/pg_ctl -D /bitnami/postgresql/data -l logfile start

initdb: warning: enabling "trust" authentication for local connections

initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb.
postgresql-repmgr 15:07:12.39 INFO  ==> Starting PostgreSQL in background...

waiting for server to start....2023-01-27 15:07:12.432 GMT [165] LOG:  pgaudit extension initialized
2023-01-27 15:07:12.443 GMT [165] LOG:  redirecting log output to logging collector process
2023-01-27 15:07:12.443 GMT [165] HINT:  Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2023-01-27 15:07:12.443 GMT [165] LOG:  starting PostgreSQL 15.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-27 15:07:12.444 GMT [165] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2023-01-27 15:07:12.444 GMT [165] LOG:  could not bind IPv6 address "::1": Cannot assign requested address
2023-01-27 15:07:12.450 GMT [165] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-01-27 15:07:12.458 GMT [169] LOG:  database system was shut down at 2023-01-27 15:07:12 GMT
2023-01-27 15:07:12.467 GMT [165] LOG:  database system is ready to accept connections

 done

server started
CREATE DATABASE
postgresql-repmgr 15:07:12.61 INFO  ==> Changing password of postgres
ALTER ROLE
postgresql-repmgr 15:07:12.64 INFO  ==> Creating user customuser
CREATE ROLE
postgresql-repmgr 15:07:12.67 INFO  ==> Granting access to "customuser" to the database "customdatabase"

GRANT
ALTER DATABASE
postgresql-repmgr 15:07:12.71 INFO  ==> Setting ownership for the 'public' schema database "customdatabase" to "customuser"
ALTER SCHEMA
postgresql-repmgr 15:07:12.74 INFO  ==> Creating replication user repmgr
CREATE ROLE
postgresql-repmgr 15:07:12.78 INFO  ==> Stopping PostgreSQL...

waiting for server to shut down....2023-01-27 15:07:12.790 GMT [165] LOG:  received fast shutdown request
2023-01-27 15:07:12.793 GMT [165] LOG:  aborting any active transactions
2023-01-27 15:07:12.796 GMT [165] LOG:  background worker "logical replication launcher" (PID 173) exited with exit code 1
2023-01-27 15:07:12.796 GMT [167] LOG:  shutting down
2023-01-27 15:07:12.821 GMT [167] LOG:  checkpoint starting: shutdown immediate
2023-01-27 15:07:12.928 GMT [167] LOG:  checkpoint complete: wrote 927 buffers (5.7%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.025 s, sync=0.018 s, total=0.110 s; sync files=257, longest=0.004 s, average=0.001 s; distance=11273 kB, estimate=11273 kB
2023-01-27 15:07:12.940 GMT [165] LOG:  database system is shut down

 done

server stopped
postgresql-repmgr 15:07:13.02 INFO  ==> Configuring replication parameters
postgresql-repmgr 15:07:13.07 INFO  ==> Configuring fsync
postgresql-repmgr 15:07:13.09 INFO  ==> Starting PostgreSQL in background...

waiting for server to start....2023-01-27 15:07:13.134 GMT [244] LOG:  pgaudit extension initialized
2023-01-27 15:07:13.146 GMT [244] LOG:  redirecting log output to logging collector process
2023-01-27 15:07:13.146 GMT [244] HINT:  Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2023-01-27 15:07:13.146 GMT [244] LOG:  starting PostgreSQL 15.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-27 15:07:13.146 GMT [244] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-01-27 15:07:13.147 GMT [244] LOG:  listening on IPv6 address "::", port 5432
2023-01-27 15:07:13.152 GMT [244] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-01-27 15:07:13.161 GMT [248] LOG:  database system was shut down at 2023-01-27 15:07:12 GMT
2023-01-27 15:07:13.169 GMT [244] LOG:  database system is ready to accept connections

 done

server started
postgresql-repmgr 15:07:13.23 INFO  ==> Creating repmgr user: repmgr
ERROR:  role "repmgr" already exists
2023-01-27 15:07:13.262 GMT [259] ERROR:  role "repmgr" already exists
2023-01-27 15:07:13.262 GMT [259] STATEMENT:  CREATE ROLE "repmgr" WITH LOGIN CREATEDB PASSWORD 'repmgrpassword';
ALTER ROLE
ALTER ROLE
postgresql-repmgr 15:07:13.32 INFO  ==> Creating repmgr database: repmgr
CREATE DATABASE
postgresql-repmgr 15:07:13.40 INFO  ==> Unregistering witness node...
NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"
ERROR: _get_primary_connection(): unable to retrieve node records
DETAIL: 
ERROR:  relation "repmgr.nodes" does not exist

LINE 1: ...imary' THEN 1 ELSE 2 END AS type_priority    FROM repmgr.nod...

                                                             ^
DETAIL: query text is:

  SELECT node_id, conninfo,          CASE WHEN type = 'primary' THEN 1 ELSE 2 END AS type_priority     FROM repmgr.nodes    WHERE active IS TRUE      AND type != 'witness' ORDER BY active DESC, type_priority, priority, node_id
ERROR: unable to connect to primary
DETAIL: 

connection pointer is NULL
2023-01-27 15:07:13.427 GMT [275] ERROR:  relation "repmgr.nodes" does not exist at character 108
2023-01-27 15:07:13.427 GMT [275] STATEMENT:    SELECT node_id, conninfo,          CASE WHEN type = 'primary' THEN 1 ELSE 2 END AS type_priority       FROM repmgr.nodes    WHERE active IS TRUE      AND type != 'witness' ORDER BY active DESC, type_priority, priority, node_id
postgresql-repmgr 15:07:13.43 INFO  ==> Registering witness node...
postgresql-repmgr 15:07:13.43 INFO  ==> Waiting for primary node...
postgresql-repmgr 15:07:13.44 DEBUG ==> Wait for schema repmgr.repmgr on 'pg-0:5432', will try 6 times with 10 delay seconds (TIMEOUT=60)
postgresql-repmgr 15:07:13.47 DEBUG ==> Schema repmgr.repmgr exists!
NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed

[REPMGR EVENT] Node id: 1000; Event type: cluster_created; Success [1|0]: 1; Time: 2023-01-27 15:07:13.577239+00;  Details: 

Looking for the script: /opt/bitnami/repmgr/events/execs/cluster_created.sh

[REPMGR EVENT] no script '/opt/bitnami/repmgr/events/execs/cluster_created.sh' found. Skipping...
ERROR: node "pgw-0" (ID: 1000) is already registered as a primary node

HINT: use "repmgr primary unregister" to remove a non-witness node record
postgresql-repmgr 15:07:13.61 INFO  ==> Stopping PostgreSQL...

waiting for server to shut down....2023-01-27 15:07:13.626 GMT [244] LOG:  received fast shutdown request
2023-01-27 15:07:13.628 GMT [244] LOG:  aborting any active transactions
2023-01-27 15:07:13.630 GMT [244] LOG:  background worker "logical replication launcher" (PID 252) exited with exit code 1
2023-01-27 15:07:13.631 GMT [246] LOG:  shutting down
2023-01-27 15:07:13.670 GMT [246] LOG:  checkpoint starting: shutdown immediate
2023-01-27 15:07:13.722 GMT [246] LOG:  checkpoint complete: wrote 939 buffers (5.7%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.025 s, sync=0.018 s, total=0.055 s; sync files=259, longest=0.004 s, average=0.001 s; distance=16384 kB, estimate=16384 kB
2023-01-27 15:07:13.734 GMT [244] LOG:  database system is shut down

 done

server stopped

On subsequent starts, it fails with:

postgresql-repmgr 15:09:54.19 
postgresql-repmgr 15:09:54.19 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 15:09:54.19 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql-repmgr 15:09:54.19 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql-repmgr 15:09:54.20 
postgresql-repmgr 15:09:54.21 DEBUG ==> Configuring libnss_wrapper...
postgresql-repmgr 15:09:54.23 INFO  ==> ** Starting PostgreSQL with Replication Manager setup **
postgresql-repmgr 15:09:54.26 INFO  ==> Validating settings in REPMGR_* env vars...
postgresql-repmgr 15:09:54.27 INFO  ==> Validating settings in POSTGRESQL_* env vars..
postgresql-repmgr 15:09:54.28 INFO  ==> Querying all partner nodes for common upstream node...
postgresql-repmgr 15:09:54.29 DEBUG ==> Checking node 'pg-0:5432'...
postgresql-repmgr 15:09:54.35 DEBUG ==> Pretending primary role node - 'pg-0:5432'
postgresql-repmgr 15:09:54.35 DEBUG ==> Pretending primary set to 'pg-0:5432'!
postgresql-repmgr 15:09:54.36 DEBUG ==> Checking node 'pg-1:5432'...
postgresql-repmgr 15:09:54.42 DEBUG ==> Pretending primary role node - 'pg-0:5432'
postgresql-repmgr 15:09:54.42 INFO  ==> Auto-detected primary node: 'pg-0:5432'
postgresql-repmgr 15:09:54.42 DEBUG ==> Primary node: 'pg-0:5432'
postgresql-repmgr 15:09:54.43 INFO  ==> Node configured as witness
postgresql-repmgr 15:09:54.44 INFO  ==> Preparing PostgreSQL configuration...
postgresql-repmgr 15:09:54.45 DEBUG ==> Injecting a new postgresql.conf file...
postgresql-repmgr 15:09:54.45 INFO  ==> postgresql.conf file not detected. Generating it...
postgresql-repmgr 15:09:54.58 DEBUG ==> Injecting a new pg_hba.conf file...
postgresql-repmgr 15:09:54.59 INFO  ==> Preparing repmgr configuration...
postgresql-repmgr 15:09:54.61 DEBUG ==> Node ID: '1000', Rol: 'witness', Primary Node: 'pg-0:5432'
postgresql-repmgr 15:09:54.61 INFO  ==> Initializing Repmgr...
postgresql-repmgr 15:09:54.63 INFO  ==> Initializing PostgreSQL database...
postgresql-repmgr 15:09:54.63 DEBUG ==> Copying files from /bitnami/postgresql/conf to /opt/bitnami/postgresql/conf
postgresql-repmgr 15:09:54.64 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/postgresql.conf detected
postgresql-repmgr 15:09:54.64 INFO  ==> Custom configuration /opt/bitnami/postgresql/conf/pg_hba.conf detected
postgresql-repmgr 15:09:54.65 DEBUG ==> Ensuring expected directories/files exist...
postgresql-repmgr 15:09:54.70 INFO  ==> Deploying PostgreSQL with persisted data...
postgresql-repmgr 15:09:54.73 INFO  ==> Configuring replication parameters
postgresql-repmgr 15:09:54.78 INFO  ==> Configuring fsync
postgresql-repmgr 15:09:54.80 INFO  ==> Starting PostgreSQL in background...

waiting for server to start....2023-01-27 15:09:54.855 GMT [176] LOG:  pgaudit extension initialized
2023-01-27 15:09:54.870 GMT [176] LOG:  redirecting log output to logging collector process
2023-01-27 15:09:54.870 GMT [176] HINT:  Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2023-01-27 15:09:54.870 GMT [176] LOG:  starting PostgreSQL 15.1 on x86_64-pc-linux-gnu, compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit
2023-01-27 15:09:54.871 GMT [176] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2023-01-27 15:09:54.871 GMT [176] LOG:  listening on IPv6 address "::", port 5432
2023-01-27 15:09:54.876 GMT [176] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2023-01-27 15:09:54.883 GMT [180] LOG:  database system was shut down at 2023-01-27 15:09:46 GMT
2023-01-27 15:09:54.894 GMT [176] LOG:  database system is ready to accept connections

 done

server started
postgresql-repmgr 15:09:54.95 INFO  ==> Creating repmgr user: repmgr
2023-01-27 15:09:54.989 GMT [191] ERROR:  role "repmgr" already exists
2023-01-27 15:09:54.989 GMT [191] STATEMENT:  CREATE ROLE "repmgr" WITH LOGIN CREATEDB PASSWORD 'repmgrpassword';

ERROR:  role "repmgr" already exists

ALTER ROLE

ALTER ROLE
postgresql-repmgr 15:09:55.05 INFO  ==> Creating repmgr database: repmgr

ERROR:  database "repmgr" already exists
2023-01-27 15:09:55.078 GMT [204] ERROR:  database "repmgr" already exists
2023-01-27 15:09:55.078 GMT [204] STATEMENT:  CREATE DATABASE repmgr;
postgresql-repmgr 15:09:55.08 INFO  ==> Unregistering witness node...

NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"

ERROR: unable to connect to primary

DETAIL: 

connection pointer is NULL
postgresql-repmgr 15:09:55.12 INFO  ==> Registering witness node...
postgresql-repmgr 15:09:55.12 INFO  ==> Waiting for primary node...
postgresql-repmgr 15:09:55.13 DEBUG ==> Wait for schema repmgr.repmgr on 'pg-0:5432', will try 6 times with 10 delay seconds (TIMEOUT=60)
postgresql-repmgr 15:09:55.16 DEBUG ==> Schema repmgr.repmgr exists!

NOTICE: using provided configuration file "/opt/bitnami/repmgr/conf/repmgr.conf"

ERROR: node "pgw-0" (ID: 1000) is already registered as a primary node

HINT: use "repmgr primary unregister" to remove a non-witness node record
postgresql-repmgr 15:09:55.24 INFO  ==> Stopping PostgreSQL...

waiting for server to shut down....2023-01-27 15:09:55.255 GMT [176] LOG:  received fast shutdown request
2023-01-27 15:09:55.259 GMT [176] LOG:  aborting any active transactions
2023-01-27 15:09:55.262 GMT [176] LOG:  background worker "logical replication launcher" (PID 184) exited with exit code 1
2023-01-27 15:09:55.262 GMT [178] LOG:  shutting down
2023-01-27 15:09:55.313 GMT [178] LOG:  checkpoint starting: shutdown immediate
2023-01-27 15:09:55.336 GMT [178] LOG:  checkpoint complete: wrote 5 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.005 s, sync=0.005 s, total=0.025 s; sync files=4, longest=0.003 s, average=0.002 s; distance=16384 kB, estimate=16384 kB
2023-01-27 15:09:55.345 GMT [176] LOG:  database system is shut down

 done

server stopped

gongomgra commented 1 year ago

Hi @seriouz,

Thanks for reporting this issue, and sorry for the delay in getting back to you. We have reproduced the issue and are looking into it. We will get back to you once we have an update.

seriouz commented 1 year ago

Thanks for investigating the issue.

gongomgra commented 1 year ago

Hi @seriouz,

We found the cause of the issue. The internal logic that determines the node ID adds the number in the service name to the REPMGR_NODE_ID_START_SEED environment variable, whose default value is 1000. In your docker-compose file, the pg-0 and pgw-0 services both end in 0, so both get the ID 1000 by default. That is why you get the "already registered as a primary node" error message. You can fix it either by setting REPMGR_NODE_ID_START_SEED=2000 for the pgw-0 service, or by renaming the service so that its number is not duplicated.
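
As a rough illustration of the calculation described above, here is a small shell sketch. It assumes the ID is simply the seed plus the number after the last "-" in the node name; the function name node_id is hypothetical, and this is not the container's actual script.

```shell
# Hypothetical helper approximating the node ID logic described above.
# Assumption: ID = seed + numeric suffix after the last "-" in the node name.
node_id() {
    local name="$1"
    local seed="${2:-1000}"     # 1000 is the default seed mentioned above
    local suffix="${name##*-}"  # text after the last "-", e.g. "0" for "pgw-0"
    case "$suffix" in
        ''|*[!0-9]*)
            echo "node name '$name' has no numeric suffix" >&2
            return 1
            ;;
    esac
    echo "$(( $seed + $suffix ))"
}

node_id pg-0        # prints 1000
node_id pg-1        # prints 1001
node_id pgw-0       # prints 1000, the same ID as pg-0
node_id pgw-0 2000  # prints 2000, no longer colliding
```

Under that assumption, pg-0 and pgw-0 only differ in their prefix, which the calculation ignores, so a different seed (or a different trailing number) is what separates them.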

It worked fine for me after updating the definition of the pgw-0 service as follows:

  pgw-0:
    image: bitnami/postgresql-repmgr:latest
    ports:
      - 6439:5432
    volumes:
      - /docker/local/database_repmgr2/pgw-0:/bitnami/postgresql
    environment:
      - POSTGRESQL_POSTGRES_PASSWORD=adminpassword
      - POSTGRESQL_USERNAME=customuser
      - POSTGRESQL_PASSWORD=custompassword
      - POSTGRESQL_DATABASE=customdatabase
      - REPMGR_PASSWORD=repmgrpassword
      - REPMGR_PRIMARY_HOST=pg-0
      - REPMGR_PRIMARY_PORT=5432
      - REPMGR_PARTNER_NODES=pg-0:5432,pg-1:5432,pgw-0:5432
      - REPMGR_NODE_NAME=pgw-0
      - REPMGR_NODE_NETWORK_NAME=pgw-0
      - REPMGR_PORT_NUMBER=5432
      - REPMGR_NODE_TYPE=witness
      - BITNAMI_DEBUG=true
      - REPMGR_NODE_ID_START_SEED=2000
    deploy:
      resources:
        limits:
           memory: 5G
      replicas: 1
      placement:
        max_replicas_per_node: 1

After that, the logs show that the witness node is properly connected to the cluster:

$ docker-compose up -d && docker-compose logs -f
(...)
debian-11-pgw-0-1  | postgresql-repmgr 16:06:26.60 INFO  ==> ** Starting repmgrd **
debian-11-pgw-0-1  | [2023-03-22 16:06:26] [NOTICE] repmgrd (repmgrd 5.3.3) starting up
debian-11-pgw-0-1  | INFO:  set_repmgrd_pid(): provided pidfile is /tmp/repmgrd.pid
debian-11-pgw-0-1  | [2023-03-22 16:06:26] [NOTICE] starting monitoring of node "pgw-0" (ID: 2000)
debian-11-pg-0-1   | [2023-03-22 16:06:28] [NOTICE] new standby "pg-1" (ID: 1001) has connected
debian-11-pg-0-1   | [2023-03-22 16:06:28] [NOTICE] new witness "pgw-0" (ID: 2000) has connected

Also, if you check the cluster status from one of the nodes, you can see all three nodes:

$ docker-compose exec -it pg-1 bash
I have no name!@29b979158258:/$ /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf cluster show
postgresql-repmgr 16:08:48.21
postgresql-repmgr 16:08:48.22 Welcome to the Bitnami postgresql-repmgr container
postgresql-repmgr 16:08:48.22 Subscribe to project updates by watching https://github.com/bitnami/containers
postgresql-repmgr 16:08:48.22 Submit issues and feature requests at https://github.com/bitnami/containers/issues
postgresql-repmgr 16:08:48.22

 ID   | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
------+-------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------------------
 1000 | pg-0  | primary | * running |          | default  | 100      | 1        | user=repmgr password=repmgrpassword host=pg-0 dbname=repmgr port=5432 connect_timeout=5
 1001 | pg-1  | standby |   running | pg-0     | default  | 100      | 1        | user=repmgr password=repmgrpassword host=pg-1 dbname=repmgr port=5432 connect_timeout=5
 2000 | pgw-0 | witness | * running | pg-0     | default  | 0        | n/a      | user=repmgr password=repmgrpassword host=pgw-0 dbname=repmgr port=5432 connect_timeout=5

This isn't a problem in the Helm chart because, by default, Helm enumerates the different pods with different ID numbers, so these naming collisions don't happen.
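
With plain Docker Compose the names are chosen by hand, so a quick check along these lines could flag a collision before the stack is started. This is only a sketch: the function name find_id_collisions is hypothetical, and it assumes every node uses the same seed and that the ID is the seed plus the number after the last "-" in the node name.

```shell
# Hypothetical check: prints every node ID shared by more than one node name.
# Assumption: all nodes use the same seed and ID = seed + numeric suffix.
find_id_collisions() {
    local seed="$1"
    local name
    shift
    for name in "$@"; do
        # Emit "<id> <name>" pairs, one per line
        echo "$(( $seed + ${name##*-} )) $name"
    done | sort -n | awk '
        { names[$1] = names[$1] " " $2; count[$1]++ }
        END { for (id in count) if (count[id] > 1) print id ":" names[id] }
    '
}

find_id_collisions 1000 pg-0 pg-1 pgw-0   # prints "1000: pg-0 pgw-0"
find_id_collisions 1000 pg-0 pg-1 pgw-2   # prints nothing
```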

Please let us know whether the extra environment variable works for you.

seriouz commented 1 year ago

Thanks for sharing! I've tested it and it worked flawlessly! I have just one request left: could you please update the docs to mention that we have to use REPMGR_NODE_ID_START_SEED on vanilla Docker?

gongomgra commented 1 year ago

Hi @seriouz,

Thanks for your message. I'm glad it worked for you! We have updated our README.md file with a new section about this. Hope it helps!
