Closed: 616slayer616 closed this issue 4 months ago
I'm facing the same issue when upgrading postgresql-ha from 14.0.5 to 14.0.6.
I was able to complete the upgrade by setting:
postgresql.replicaCount=1
postgresql.upgradeRepmgrExtension=true
as mentioned here: https://artifacthub.io/packages/helm/bitnami/postgresql-ha#to-8-0-0
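Putting that workaround together, a minimal sketch as two helm upgrade steps (the release name "mypg", the target chart version, and the use of --reuse-values are assumptions, not taken from this comment):
# Step 1: scale to a single node and upgrade the repmgr extension.
helm upgrade mypg bitnami/postgresql-ha --version=14.0.6 --reuse-values \
  --set postgresql.replicaCount=1 \
  --set postgresql.upgradeRepmgrExtension=true
# Step 2: once that single node is healthy, rescale and drop the upgrade flag.
helm upgrade mypg bitnami/postgresql-ha --version=14.0.6 --reuse-values \
  --set postgresql.replicaCount=3 \
  --set postgresql.upgradeRepmgrExtension=false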
Hi, this issue is under investigation; we will get back to you as soon as we have news.
I also have the same issue. upgradeRepmgrExtension did not help: it says it is upgrading the repmgr extension but then gives the exact same error.
postgresql-repmgr 21:54:06.73 INFO ==> Upgrading repmgr extension...
postgresql-repmgr 21:54:06.81 INFO ==> Stopping PostgreSQL...
waiting for server to shut down.... done
server stopped
postgresql-repmgr 21:54:06.92 INFO ==> ** PostgreSQL with Replication Manager setup finished! **

postgresql-repmgr 21:54:06.95 INFO ==> Starting PostgreSQL in background...
waiting for server to start....2024-06-07 21:54:07.041 GMT [166] LOG: pgaudit extension initialized
2024-06-07 21:54:07.051 GMT [166] LOG: redirecting log output to logging collector process
2024-06-07 21:54:07.051 GMT [166] HINT: Future log output will appear in directory "/opt/bitnami/postgresql/logs".
2024-06-07 21:54:07.051 GMT [166] LOG: starting PostgreSQL 16.3 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-06-07 21:54:07.052 GMT [166] LOG: listening on IPv4 address "0.0.0.0", port 5432
2024-06-07 21:54:07.052 GMT [166] LOG: listening on IPv6 address "::", port 5432
2024-06-07 21:54:07.058 GMT [166] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-06-07 21:54:07.067 GMT [170] LOG: database system was shut down in recovery at 2024-06-07 21:54:06 GMT
2024-06-07 21:54:07.067 GMT [170] LOG: entering standby mode
2024-06-07 21:54:07.073 GMT [170] LOG: redo starts at 4B/9D000028
2024-06-07 21:54:07.074 GMT [170] LOG: consistent recovery state reached at 4B/9E004888
2024-06-07 21:54:07.074 GMT [170] LOG: invalid record length at 4B/9E004888: expected at least 24, got 0
2024-06-07 21:54:07.115 GMT [166] LOG: database system is ready to accept read-only connections
2024-06-07 21:54:07.121 GMT [171] FATAL: could not connect to the primary server: could not translate host name "redacted" to address
done
server started
2024-06-07 21:54:07.126 GMT [172] FATAL: could not connect to the primary server: could not translate host name "redacted" to address
2024-06-07 21:54:07.126 GMT [170] LOG: waiting for WAL to become available at 4B/9E002000
postgresql-repmgr 21:54:07.13 INFO ==> ** Starting repmgrd **
[2024-06-07 21:54:07] [NOTICE] repmgrd (repmgrd 5.4.1) starting up
[2024-06-07 21:54:07] [ERROR] an older version of the "repmgr" extension is installed
[2024-06-07 21:54:07] [DETAIL] extension version 5.3 is installed but newer version 5.4 is available
[2024-06-07 21:54:07] [HINT] verify the repmgr installation is updated properly before continuing
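Not shown in the thread, but a quick way to confirm which repmgr extension version is actually installed in the database (pod name, the repmgr user/database, and the REPMGR_PASSWORD env var are assumptions based on the bitnami image defaults; adjust to your release):
# Check the installed repmgr extension version inside a data node.
kubectl exec -it mypg-postgresql-ha-postgresql-0 -- bash -c \
  "PGPASSWORD=\$REPMGR_PASSWORD psql -U repmgr -d repmgr -c \"SELECT extversion FROM pg_extension WHERE extname = 'repmgr';\""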
Same issue here when Argo auto-updated our Gitea install, which has the postgresql-ha chart as a dependency. The fix was as above: scale replicas down to 1 and set upgradeRepmgrExtension to true, wait for it to complete, then undo those changes in the values.
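When the chart is pulled in as a dependency of a parent chart (Gitea in this case), the same flags have to be nested under the subchart key. A rough sketch, assuming the dependency is referenced as postgresql-ha without an alias, and with the release name and repo alias as placeholders:
# Temporarily scale down and enable the extension upgrade through the parent chart
# (subchart key "postgresql-ha", release "gitea", and repo alias are assumptions).
helm upgrade gitea gitea-charts/gitea --reuse-values \
  --set postgresql-ha.postgresql.replicaCount=1 \
  --set postgresql-ha.postgresql.upgradeRepmgrExtension=true
# Afterwards, revert these overrides (or remove them from your Argo CD values).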
Hi @P-n-I, @kreatoo, could you indicate the versions (from and to) that you are using?
14.0.2 (PostgreSQL 16.2) to 14.0.3 (PostgreSQL 16.3), according to my internal Slack channel history.
From 14.0.2 to 14.0.3 I found no issues. This is what I did:
$ helm install mypg bitnami/postgresql-ha --version=14.0.2 --set postgresql.password=adminpwd --set postgresql.repmgrPassword=repmgrpwd --set pgpool.adminPassword=pgpoolpwd
I waited until it was up and running, then checked the status:
$ kubectl exec -it mypg-postgresql-ha-postgresql-0 -- /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf daemon status
...
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------------------------------+---------+-----------+---------------------------------+---------+-----+---------+--------------------
1000 | mypg-postgresql-ha-postgresql-0 | primary | * running | | running | 1 | no | n/a
1001 | mypg-postgresql-ha-postgresql-1 | standby | running | mypg-postgresql-ha-postgresql-0 | running | 1 | no | 0 second(s) ago
1002 | mypg-postgresql-ha-postgresql-2 | standby | running | mypg-postgresql-ha-postgresql-0 | running | 1 | no | 0 second(s) ago
Then I upgraded, and checked the status once all nodes were in the Running state:
$ helm upgrade mypg bitnami/postgresql-ha --version=14.0.3 --set postgresql.password=adminpwd --set postgresql.repmgrPassword=repmgrpwd --set pgpool.adminPassword=pgpoolpwd
$ kubectl exec -it mypg-postgresql-ha-postgresql-0 -- /opt/bitnami/scripts/postgresql-repmgr/entrypoint.sh repmgr -f /opt/bitnami/repmgr/conf/repmgr.conf daemon status
...
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------------------------------+---------+-----------+---------------------------------+---------+-----+---------+--------------------
1000 | mypg-postgresql-ha-postgresql-0 | standby | running | mypg-postgresql-ha-postgresql-1 | running | 1 | no | 0 second(s) ago
1001 | mypg-postgresql-ha-postgresql-1 | primary | * running | | running | 1 | no | n/a
1002 | mypg-postgresql-ha-postgresql-2 | standby | running | mypg-postgresql-ha-postgresql-1 | running | 1 | no | 0 second(s) ago
@616slayer616, when upgrading from 14.0.2 to 14.0.13 I had the same behavior as you; you would need to scale to one node, set postgresql.upgradeRepmgrExtension=true, and then rescale the cluster.
This is not particular to this version "jump", but applies to any case where the repmgr version was upgraded and the installed extension is no longer compatible. A message similar to this would appear in the logs:
postgresql-repmgr 13:57:07.50 INFO ==> ** Starting repmgrd **
[2024-06-12 13:57:07] [NOTICE] repmgrd (repmgrd 5.4.1) starting up
[2024-06-12 13:57:07] [ERROR] an older version of the "repmgr" extension is installed
[2024-06-12 13:57:07] [DETAIL] extension version 5.3 is installed but newer version 5.4 is available
All the nodes in the cluster need to use the same version, hence the scaling to one node, upgrading repmgr, and then rescaling the cluster again.
It is true that there is no info about this in the upgrade section. I will add a note to the README for clarification.
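For reference, postgresql.upgradeRepmgrExtension is expected to run ALTER EXTENSION repmgr UPDATE against the repmgr database (an assumption about the container scripts, not confirmed in this thread). If the flag does not seem to take effect, the equivalent manual step would look roughly like this, with the pod name and credentials env var assumed from the bitnami image defaults:
# Manually update the repmgr extension on the current primary (a sketch, not the chart's documented procedure).
kubectl exec -it mypg-postgresql-ha-postgresql-0 -- bash -c \
  "PGPASSWORD=\$REPMGR_PASSWORD psql -U repmgr -d repmgr -c 'ALTER EXTENSION repmgr UPDATE;'"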
I tried upgrading from 14.0.3 to 14.1.1 and 14.2.0. I also scaled to one node and set postgresql.upgradeRepmgrExtension=true. I tried this about 20 times. And it did not help.
I created a new namespace where I deployed 14.0.3 and upgraded successfully. So it seems not to be a general error but something more specific. But I cannot imagine how any of my configuration could have caused this. Especially since I hardly have any custom configuration.
I have not found any issues when upgrading from 14.0.3 to 14.1.1 or 14.2.0.
I used the following commands:
helm install mypg bitnami/postgresql-ha --version=14.0.3 \
--set postgresql.password=adminpwd \
--set postgresql.repmgrPassword=repmgrpwd \
--set pgpool.adminPassword=pgpoolpwd
helm upgrade mypg bitnami/postgresql-ha --version=14.1.1 \
--set postgresql.password=adminpwd \
--set postgresql.repmgrPassword=repmgrpwd \
--set pgpool.adminPassword=pgpoolpwd \
--set postgresql.upgradeRepmgrExtension=true \
--set postgresql.replicaCount=1
helm upgrade mypg bitnami/postgresql-ha --version=14.1.1 \
--set postgresql.password=adminpwd \
--set postgresql.repmgrPassword=repmgrpwd \
--set pgpool.adminPassword=pgpoolpwd \
--set postgresql.replicaCount=3
I am not sure if it could be related to the database size. In my testing I have not inserted any data into the database. I would need a consistent way of reproducing the issue in order to debug it.
I now tried it again in a new namespace, added all the data using pg_dump and pg_restore, and was not able to reproduce it. My plan now is to uninstall the chart, delete the PVCs, install it again, and run pg_restore, but I would really like to know what the problem is here.
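Roughly what that plan would look like (the release name, target version, database name, connection hosts, and PVC names are assumptions, not taken from this issue):
# 1. Dump the data from the existing installation (connection details assumed).
PGPASSWORD=adminpwd pg_dump -h <old-pgpool-or-primary-host> -U postgres -Fc mydb > mydb.dump
# 2. Remove the release and its volumes so the chart starts from a clean state
#    (PVC names assumed from the default "data-<pod-name>" naming; check with kubectl get pvc).
helm uninstall mypg
kubectl delete pvc data-mypg-postgresql-ha-postgresql-0 \
  data-mypg-postgresql-ha-postgresql-1 \
  data-mypg-postgresql-ha-postgresql-2
# 3. Reinstall the chart and restore the dump.
helm install mypg bitnami/postgresql-ha --version=14.2.0 \
  --set postgresql.password=adminpwd \
  --set postgresql.repmgrPassword=repmgrpwd \
  --set pgpool.adminPassword=pgpoolpwd
PGPASSWORD=adminpwd pg_restore -h <new-pgpool-host> -U postgres -d postgres --create mydb.dump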
For debugging we could arrange a video call and I can share my screen. Otherwise, I guess we won't get any further.
Hi, thanks for sharing your progress. Please don't hesitate to share your findings.
I am sorry, but GitHub issues are our communication channel for solving issues.
This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.
I am hitting this same issue, and I believe I found the source of the problem. The error logs posted above both show the same message:
2024-05-27 07:09:21.216 GMT [171] FATAL: could not connect to the primary server: could not translate host name "pg-ha-postgresql-ha-postgresql-1.pg-ha-postgresql-ha-postgresql-headless.db.svc.cluster-one" to address: Name or service not known
PostgreSQL still had a reference to the previous master replica: postgresql-ha-postgresql-1, not ...-0. I scaled up my deployment to allow ...-1 to come up and upgrade, and it appears that node started up just fine. Then I bounced ...-0, and it appears that also came up on the new version.
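For reference, scaling back up so the old primary pod can return and upgrade would look something like the following (the StatefulSet name is assumed from the default chart naming; if a GitOps tool manages the release, set the replica count through Helm values instead):
# Bring the ...-1 pod back so it can start and upgrade its extension.
kubectl scale statefulset mypg-postgresql-ha-postgresql --replicas=2
# Then restart ("bounce") the ...-0 pod once ...-1 is healthy.
kubectl delete pod mypg-postgresql-ha-postgresql-0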
I do not believe this is the intended behavior, so I would recommend reopening this issue. The upgrade process should support any number of previous replicas, regardless of which one was the prior master.
Additionally, I am still having trouble getting any replicas besides these two up now, though I'm still running down whether that is due to the upgrade or our own implementation.
EDIT: This appears to be directly related to https://github.com/bitnami/charts/issues/17015. The guidance for upgrading postgresql-ha cannot be to scale to one replica if there is a known breaking issue with scaling to one replica.
Name and Version
bitnami/postgresql-ha 14.1.x
What architecture are you using?
amd64
What steps will reproduce the bug?
It seems there is a repmgr version 5.4, but there was no migration guide in the documentation. So when I upgraded from 14.0.3 to 14.0.13 or 14.1.2 I got the message
I used the migration instructions from 8.0.0: set replicaCount to 1 and upgradeRepmgrExtension to true, then set replicaCount back to 3, and it worked. So far so good. I have the same configuration on 2 more clusters, and on those I could not get it to work:
You can see that it claims to upgrade repmgr ("Upgrading repmgr extension"), but in the end it tells me "extension version 5.3 is installed but newer version 5.4 is available".
I downgraded the installation and it works with the old version, so at least nothing is broken for now. But upgrading again shows the same issue. So I can reproduce it in this namespace, but when I create another namespace, install chart version 14.0.3, and then upgrade with upgradeRepmgrExtension=true, it upgrades correctly. So it is not fully reproducible.
Can anyone help me? And why is there even this migration in a patch release (I think it is 14.0.10) and no mention in the upgrading section?
What is the expected behavior?
postgresql-ha-postgresql-0 statefulset scaling up correctly
What do you see instead?
CrashLoopBackOff and the message:
extension version 5.3 is installed but newer version 5.4 is available