Closed PatrykKlimowicz closed 1 year ago
It seems a very specific use case difficult to reproduce on our side and very tied to your scenario.
For information regarding the application itself, customization of the content within the application, or questions about the use of the technology or infrastructure; we highly recommend checking forums and user guides made available by the project behind the application or the technology.
That said, we will keep this ticket open until the stale bot closes it just in case someone from the community adds some valuable info.
I workaroud this by using the official postgres image: postgres:11.16-bullseye
and setting runAsUser: 0
- I do not care about this as database is a dump from real database, and this deployment is used only for testing.
Now I see the following error on replica:
PostgreSQL Database directory appears to contain a database; Skipping initialization
2022-11-24 09:33:40.162 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-11-24 09:33:40.162 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-11-24 09:33:40.178 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-11-24 09:33:40.211 UTC [26] LOG: database system was shut down at 2022-11-24 09:31:08 UTC
2022-11-24 09:33:40.229 UTC [1] LOG: database system is ready to accept connections
2022-11-24 09:33:49.746 UTC [39] FATAL: role "example-user" does not exist
2022-11-24 09:33:59.738 UTC [47] FATAL: role "example-user" does not exist
2022-11-24 09:34:05.801 UTC [48] LOG: incomplete startup packet
2022-11-24 09:34:09.750 UTC [57] FATAL: role "example-user" does not exist
Due to that the primary database is not ready:
waiting for server to shut down....2022-11-24 09:33:36.541 UTC [49] LOG: aborting any active transactions
2022-11-24 09:33:36.543 UTC [49] LOG: background worker "logical replication launcher" (PID 56) exited with exit code 1
2022-11-24 09:33:36.544 UTC [51] LOG: shutting down
2022-11-24 09:33:37.124 UTC [49] LOG: database system is shut down
done
server stopped
PostgreSQL init process complete; ready for start up.
2022-11-24 09:33:37.174 UTC [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-11-24 09:33:37.174 UTC [1] LOG: listening on IPv6 address "::", port 5432
2022-11-24 09:33:37.183 UTC [1] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2022-11-24 09:33:37.206 UTC [100] LOG: database system was shut down at 2022-11-24 09:33:37 UTC
2022-11-24 09:33:37.217 UTC [1] LOG: database system is ready to accept connections
2022-11-24 09:34:09.933 UTC [139] LOG: incomplete startup packet
Why the replica is trying to use the primary user? Any ideas?
I tried:
auth
sectionThis sadly did not help and the error is the same.
The update:
postgres
user as super user@carrodher some questions about readReplica:
I managed to create an image with hypopg extension:
FROM bitnami/postgresql:11.18.0-debian-11-r4
# root access needed to run make commands
USER 0
# install hypopg extension on the image
RUN apt-get update && \
apt-get -y install wget unzip make gcc && \
wget --no-check-certificate https://github.com/HypoPG/hypopg/archive/1.1.2.zip && \
unzip 1.1.2.zip && \
cd hypopg-1.1.2 && \
make && \
make install
I discovered that my user before running the restore on the dump can do nothing:
Should not the primary user has a SUPERUSER by default if I do not specify any other user?
Moving forward with this I decided to drop the "special" primary user and stick to postgres. readReplica
has its own user as it was. There should be no error regarding restoring the dump, but now the custom scripts are not taken into consideration...
Any ideas why the custom scripts from PVC are not executed despite they are on the pod?
image:
registry: registry.io
repository: common/postgres
tag: latest
debug: true
# tag: 11.18.0-debian-11-r4
architecture: replication
fullnameOverride: "this-is-overwritten-in-helmfile"
global:
postgresql:
auth:
postgresPassword: "password"
auth:
enablePostgresUser: true
replicationUsername: "username-ro"
replicationPassword: "password"
primary:
podSecurityContext:
fsGroup: 0
containerSecurityContext:
runAsUser: 0
persistence:
existingClaim: phoenix-db-dumper-claim
mountPath: /docker-entrypoint-initdb.d
subPath: "this-is-overwritten-in-helmfile"
accessModes: "ReadWriteMany"
readReplicas:
replicaCount: 1
numSynchronousReplicas: 1
podSecurityContext:
fsGroup: 0
containerSecurityContext:
runAsUser: 0
And yet another update. I manage to workaround the issue with primary DB:
values.yaml
image:
registry: registry.io
repository: common/postgres
tag: latest
debug: true
architecture: replication
fullnameOverride: "this-is-overwritten-in-helmfile"
global:
postgresql:
auth:
postgresPassword: "" # this will be empty
volumePermissions:
enabled: true
auth:
enablePostgresUser: true
replicationUsername: "" # user and password are specified
replicationPassword: ""
primary:
podSecurityContext:
fsGroup: 0
containerSecurityContext:
runAsUser: 0
persistence:
existingClaim: db-dumper-claim
mountPath: /docker-entrypoint-initdb.d
subPath: "this-is-overwritten-in-helmfile"
accessModes: "ReadWriteMany"
readReplicas:
replicaCount: 1
podSecurityContext:
fsGroup: 0
containerSecurityContext:
runAsUser: 0
custom createdb.sql
CREATE DATABASE db1;
CREATE DATABASE db2;
# this will return postgres
SELECT current_user;
# some debug stuff
SELECT table_catalog, table_schema, table_name, privilege_type
FROM information_schema.table_privileges
WHERE grantee = 'user';
SELECT usename AS role_name,
CASE
WHEN usesuper AND usecreatedb THEN
CAST('superuser, create database' as pg_catalog.text)
WHEN usesuper THEN
CAST('superuser' as pg_catalog.text)
WHEN usecreatedb THEN
CAST('create database' as pg_catalog.text)
ELSE
CAST('' as pg_catalog.text)
END role_attributes
FROM pg_catalog.pg_user
ORDER BY role_name desc;
# let's create a user that will restore the dump and then will be used by apps
CREATE USER user SUPERUSER PASSWORD '${POSTGRES_PASSWORD}';
# debug stuff to see users and privileges
SELECT table_catalog, table_schema, table_name, privilege_type
FROM information_schema.table_privileges
WHERE grantee = 'user';
SELECT usename AS role_name,
CASE
WHEN usesuper AND usecreatedb THEN
CAST('superuser, create database' as pg_catalog.text)
WHEN usesuper THEN
CAST('superuser' as pg_catalog.text)
WHEN usecreatedb THEN
CAST('create database' as pg_catalog.text)
ELSE
CAST('' as pg_catalog.text)
END role_attributes
FROM pg_catalog.pg_user
ORDER BY role_name desc;
custom init.sh
:
#!/bin/bash
# switch to our new user to restore dump
PGPASSWORD=$POSTGRES_PASSWORD psql -U user
# restore dumps
pg_restore -v -c --if-exists -U user --no-owner -d db1 /docker-entrypoint-initdb.d/db1_latest.dump
pg_restore -v -c --if-exists -U user --no-owner -d db2 /docker-entrypoint-initdb.d/db2_latest.dump
And with that the primary db is up and running with the valid data 😄
Now there is an issue with readReplica
:
postgresql 12:56:39.84 INFO ==> ** Starting PostgreSQL setup **
postgresql 12:56:39.93 INFO ==> Validating settings in POSTGRESQL_* env vars..
postgresql 12:56:40.17 INFO ==> Loading custom pre-init scripts...
postgresql 12:56:40.21 INFO ==> Initializing PostgreSQL database...
postgresql 12:56:40.23 DEBUG ==> Ensuring expected directories/files exist...
postgresql 12:56:40.42 INFO ==> pg_hba.conf file not detected. Generating it...
postgresql 12:56:40.44 INFO ==> Generating local authentication configuration
postgresql 12:56:40.53 INFO ==> Deploying PostgreSQL with persisted data...
postgresql 12:56:40.63 INFO ==> Configuring replication parameters
postgresql 12:56:40.83 INFO ==> Configuring fsync
postgresql 12:56:40.87 INFO ==> Setting up streaming replication slave...
postgresql 12:56:41.17 INFO ==> Loading custom scripts...
postgresql 12:56:41.23 INFO ==> Enabling remote connections
postgresql 12:56:41.28 INFO ==> ** PostgreSQL setup finished! **
postgresql 12:56:41.40 INFO ==> ** Starting PostgreSQL **
2022-11-25 12:56:41.494 GMT [1] LOG: pgaudit extension initialized
2022-11-25 12:56:41.494 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-11-25 12:56:41.494 GMT [1] LOG: listening on IPv6 address "::", port 5432
2022-11-25 12:56:41.612 GMT [1] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-11-25 12:56:41.631 GMT [158] LOG: database system was shut down in recovery at 2022-11-25 12:54:54 GMT
2022-11-25 12:56:41.632 GMT [158] LOG: entering standby mode
2022-11-25 12:56:41.644 GMT [158] LOG: redo starts at 0/2000028
2022-11-25 12:56:41.653 GMT [158] LOG: consistent recovery state reached at 0/3000060
2022-11-25 12:56:41.653 GMT [158] LOG: invalid record length at 0/3000060: wanted 24, got 0
2022-11-25 12:56:41.653 GMT [1] LOG: database system is ready to accept read only connections
2022-11-25 12:56:41.667 GMT [162] FATAL: database system identifier differs between the primary and standby
2022-11-25 12:56:41.667 GMT [162] DETAIL: The primary's identifier is 7169936751407583316, the standby's identifier is 7169918321491947604.
2022-11-25 12:56:41.677 GMT [163] FATAL: database system identifier differs between the primary and standby
2022-11-25 12:56:41.677 GMT [163] DETAIL: The primary's identifier is 7169936751407583316, the standby's identifier is 7169918321491947604.
2022-11-25 12:56:46.678 GMT [164] FATAL: database system identifier differs between the primary and standby
2022-11-25 12:56:46.678 GMT [164] DETAIL: The primary's identifier is 7169936751407583316, the standby's identifier is 7169918321491947604.
2022-11-25 12:56:47.774 GMT [171] FATAL: password authentication failed for user "postgres"
2022-11-25 12:56:47.774 GMT [171] DETAIL: User "postgres" has no password assigned.
Connection matched pg_hba.conf line 1: "host all all 0.0.0.0/0 md5"
2022-11-25 12:56:51.682 GMT [172] FATAL: database system identifier differs between the primary and standby
2022-11-25 12:56:51.682 GMT [172] DETAIL: The primary's identifier is 7169936751407583316, the standby's identifier is 7169918321491947604.
2022-11-25 12:56:56.684 GMT [173] FATAL: database system identifier differs between the primary and standby
2022-11-25 12:56:56.684 GMT [173] DETAIL: The primary's identifier is 7169936751407583316, the standby's identifier is 7169918321491947604.
2022-11-25 12:56:57.760 GMT [180] FATAL: password authentication failed for user "postgres"
2022-11-25 12:56:57.760 GMT [180] DETAIL: User "postgres" has no password assigned.
@carrodher @rafariossaa could you help with this readReplica
issue?
EDIT: in another issue I found info that the old PVC might cause the issue, so I remove the PVC created by replica. Now there is an error only about identifier:
postgresql 13:37:14.51 INFO ==> ** Starting PostgreSQL **
2022-11-25 13:37:14.584 GMT [1] LOG: pgaudit extension initialized
2022-11-25 13:37:14.584 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-11-25 13:37:14.584 GMT [1] LOG: listening on IPv6 address "::", port 5432
2022-11-25 13:37:14.625 GMT [1] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-11-25 13:37:14.647 GMT [158] LOG: database system was shut down in recovery at 2022-11-25 13:37:00 GMT
2022-11-25 13:37:14.647 GMT [158] LOG: entering standby mode
2022-11-25 13:37:14.652 GMT [158] LOG: consistent recovery state reached at 0/30000D0
2022-11-25 13:37:14.652 GMT [158] LOG: invalid record length at 0/30000D0: wanted 24, got 0
2022-11-25 13:37:14.653 GMT [1] LOG: database system is ready to accept read only connections
2022-11-25 13:37:30.161 GMT [162] FATAL: database system identifier differs between the primary and standby
2022-11-25 13:37:30.161 GMT [162] DETAIL: The primary's identifier is 7169947218002104403, the standby's identifier is 7169946625880223830.
2022-11-25 13:37:30.169 GMT [171] FATAL: database system identifier differs between the primary and standby
2022-11-25 13:37:30.169 GMT [171] DETAIL: The primary's identifier is 7169947218002104403, the standby's identifier is 7169946625880223830.
2022-11-25 13:37:35.172 GMT [179] FATAL: database system identifier differs between the primary and standby
Hi, Have you tried to delete the replica pvc and restart the replica pod ? That should recreate the pvc and start getting updates from primary.
@rafariossaa I set the persistance on readreplica to false and everything starts working. The removal of PVC was not enough for me
In that case be aware you are not persisting the data in the replicas. What happened when you removed the PVCs ?
In that case be aware you are not persisting the data in the replicas.
But the data from primary replica is synchrounused?
What happened when you removed the PVCs ?
As I wrote: in another issue I found info that the old PVC might cause the issue, so I remove the PVC created by replica. Now there is an error only about identifier:
postgresql 13:37:14.51 INFO ==> ** Starting PostgreSQL **
2022-11-25 13:37:14.584 GMT [1] LOG: pgaudit extension initialized
2022-11-25 13:37:14.584 GMT [1] LOG: listening on IPv4 address "0.0.0.0", port 5432
2022-11-25 13:37:14.584 GMT [1] LOG: listening on IPv6 address "::", port 5432
2022-11-25 13:37:14.625 GMT [1] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2022-11-25 13:37:14.647 GMT [158] LOG: database system was shut down in recovery at 2022-11-25 13:37:00 GMT
2022-11-25 13:37:14.647 GMT [158] LOG: entering standby mode
2022-11-25 13:37:14.652 GMT [158] LOG: consistent recovery state reached at 0/30000D0
2022-11-25 13:37:14.652 GMT [158] LOG: invalid record length at 0/30000D0: wanted 24, got 0
2022-11-25 13:37:14.653 GMT [1] LOG: database system is ready to accept read only connections
2022-11-25 13:37:30.161 GMT [162] FATAL: database system identifier differs between the primary and standby
2022-11-25 13:37:30.161 GMT [162] DETAIL: The primary's identifier is 7169947218002104403, the standby's identifier is 7169946625880223830.
2022-11-25 13:37:30.169 GMT [171] FATAL: database system identifier differs between the primary and standby
2022-11-25 13:37:30.169 GMT [171] DETAIL: The primary's identifier is 7169947218002104403, the standby's identifier is 7169946625880223830.
2022-11-25 13:37:35.172 GMT [179] FATAL: database system identifier differs between the primary and standby
In that case be aware you are not persisting the data in the replicas. But the data from primary replica is synchrounused?
Yes, replication should be working.
in another issue I found info that the old PVC might cause the issue, so I remove the PVC created by replica. Now there is an error only about identifier:
The issue here is that the IDs from primary and standby differs. This ID is generated when started. Could you try to scale the deployment to only one node, then delete the replica's PVC and then scale it up ?. That should recreate the standby node and start replicating the data without errors.
@rafariossaa since the data will be replicated continuously between primary db and read db then I'm good to not have the PVC for read replica 😄 (or you meant that the replication will happen only once at startup? if yes, can be a continuous replication achieved?)
There would be a first initial update when the replica starts, and then it is going to be updated each time the primary node updates. I am telling this just in case you wanted to reduce the risk of data loss, if you know the risks it is ok then.
Name and Version
bitnami/postgresql 11.9.13
What steps will reproduce the bug?
Are you using any custom parameters or values?
What is the expected behavior?
The dump can be successfully restored and used by PostgreSQL deployment:
What do you see instead?
Additional information
This approach was working just fine for the custom deployment of PostgreSQL (image version
11.16-bullseye
) that mounts the same PVC. No issues were visible there. Why something is wrong here with extension/permissions?The init script looks like this: