bitnami / charts

Bitnami Helm Charts
https://bitnami.com

[bitnami/postgresql-ha] postgresql-ha-postgresql-0 crashes after a while #29837

Open nightmare-rg opened 1 week ago

nightmare-rg commented 1 week ago

Name and Version

bitnami/postgresql-ha 14.2.33

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. Set up a K3s cluster with terraform-hcloud-kube-hetzner with 4 large agent servers:
NAME                         STATUS   ROLES                       AGE   VERSION
k3s-agent-large-aln          Ready    <none>                      47h   v1.30.5+k3s1
k3s-agent-large-fsn-hod      Ready    <none>                      47h   v1.30.5+k3s1
k3s-agent-large-fsn-vsw      Ready    <none>                      47h   v1.30.5+k3s1
k3s-agent-large-kdg          Ready    <none>                      47h   v1.30.5+k3s1
k3s-control-plane-fsn1-dfk   Ready    control-plane,etcd,master   47h   v1.30.5+k3s1
k3s-control-plane-hel1-lzp   Ready    control-plane,etcd,master   47h   v1.30.5+k3s1
k3s-control-plane-nbg1-zxy   Ready    control-plane,etcd,master   47h   v1.30.5+k3s1
k3s-egress-aai               Ready    <none>                      47h   v1.30.5+k3s1
  2. Create the namespace database and install bitnami/postgresql-ha:

values.yml

global:
  storageClass: longhorn
  persistence:
    size: 25Gi

postgresql:
  image:
    tag: 14-debian-12
    debug: true
  replicaCount: 3
  maxConnections: 1000
  postgresConnectionLimit: 1000
  dbUserConnectionLimit: 1000

pgpool:
  replicaCount: 3
  maxPool: 20
  numInitChildren: 100
  childLifeTime: 300
  clientIdleLimit: 300
  clientIdleLimitInTransaction: 0
  reservedConnections: 0

I tried both the default tag 16.4.0-debian-12-r22 and the tag 14-debian-12.

  3. Install the GitLab Helm chart with the DB credentials:

values.yml

global:
  psql:
    host: postgresql-postgresql-ha-pgpool.database.svc.cluster.local
    username: postgres
    database: gitlabhq_production
    password:
      secret: psql-password
      key: password

gitlab-runner:
  install: false

postgresql:
  install: false

I removed some other options for better readability.

  4. GitLab works fine, but after some time the postgresql-ha-0 node crashes. When I remove this node the cluster recovers, but after some time it crashes again.

What is the expected behavior?

The PostgreSQL cluster should run without crashes, like my other GitLab installation with postgresql-ha 13.0.0 and 16.1.0 on v1.28.14+k3s1.

What do you see instead?

LAST SEEN   TYPE      REASON      OBJECT                                                 MESSAGE
22m         Normal    Pulled      pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Container image "docker.io/bitnami/pgpool:4.5.4-debian-12-r0" already present on machine
22m         Normal    Created     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Created container pgpool
22m         Normal    Started     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Started container pgpool
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Readiness probe failed: psql: error: connection to server on socket "/opt/bitnami/pgpool/tmp/.s.PGSQL.5432" failed: FATAL:  failed to create a backend 0 connection...
22m         Normal    Killing     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Container pgpool failed liveness probe, will be restarted
23m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Liveness probe failed: pgpool 09:32:07.82 INFO  ==> Checking pgpool health......
23m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Readiness probe failed: psql: error: connection to server on socket "/opt/bitnami/pgpool/tmp/.s.PGSQL.5432" failed: FATAL:  unable to read data from DB node 0...
23m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Liveness probe failed: pgpool 09:32:15.62 INFO  ==> Checking pgpool health......
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Liveness probe failed: pgpool 09:32:25.48 INFO  ==> Checking pgpool health......
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Liveness probe failed: pgpool 09:32:35.62 INFO  ==> Checking pgpool health......
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Liveness probe failed: pgpool 09:32:45.52 INFO  ==> Checking pgpool health......
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Readiness probe failed: psql: error: connection to server on socket "/opt/bitnami/pgpool/tmp/.s.PGSQL.5432" failed: Connection refused...
21m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-46d5t   Readiness probe failed: psql: error: connection to server on socket "/opt/bitnami/pgpool/tmp/.s.PGSQL.5432" failed: server closed the connection unexpectedly...
21m         Normal    Pulled      pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Container image "docker.io/bitnami/pgpool:4.5.4-debian-12-r0" already present on machine
21m         Normal    Created     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Created container pgpool
22m         Normal    Started     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Started container pgpool
23m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Readiness probe failed: command "bash -ec PGPASSWORD=${PGPOOL_POSTGRES_PASSWORD} psql -U \"postgres\" -d \"postgres\" -h /opt/bitnami/pgpool/tmp -tA -c \"SELECT 1\" >/dev/null" timed out
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Readiness probe failed:
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Liveness probe failed:
2m55s       Warning   BackOff     pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Back-off restarting failed container pgpool in pod postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886_database(09bba747-0ac4-4e38-9774-999bc1637f0d)
21m         Warning   Failed      pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-9n886   Error: failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: container init was OOM-killed (memory limit too low?): unknown
23m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-t44zh   Readiness probe failed: command "bash -ec PGPASSWORD=${PGPOOL_POSTGRES_PASSWORD} psql -U \"postgres\" -d \"postgres\" -h /opt/bitnami/pgpool/tmp -tA -c \"SELECT 1\" >/dev/null" timed out
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-pgpool-65bb9fc8b5-t44zh   Liveness probe failed: command "/opt/bitnami/scripts/pgpool/healthcheck.sh" timed out
21m         Normal    Pulled      pod/postgresql-postgresql-ha-postgresql-0              Container image "docker.io/bitnami/postgresql-repmgr:14-debian-12" already present on machine
21m         Normal    Created     pod/postgresql-postgresql-ha-postgresql-0              Created container postgresql
21m         Normal    Started     pod/postgresql-postgresql-ha-postgresql-0              Started container postgresql
2m59s       Warning   BackOff     pod/postgresql-postgresql-ha-postgresql-0              Back-off restarting failed container postgresql in pod postgresql-postgresql-ha-postgresql-0_database(0e58e555-9ae0-4a0e-a610-548f6833c800)
22m         Warning   Unhealthy   pod/postgresql-postgresql-ha-postgresql-0              Readiness probe failed: 127.0.0.1:5432 - rejecting connections
21m         Warning   Unhealthy   pod/postgresql-postgresql-ha-postgresql-0              Readiness probe failed: 127.0.0.1:5432 - no response

[Screenshot 2024-10-09 at 11:56:15]

database-postgresql-postgresql-ha-postgresql-1728467925773087000.log

I can't find anything in the log that explains why the master node crashes.

Additional information

On my other GitLab setup, I initially had problems with the connection limit to PostgreSQL, so I increased it to 1000, and that setup has run really well for about 10 months. I used the same values and setup instructions for my new cluster, only with the newer version, and postgresql-ha is unstable under any load from my GitLab instance.
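
As a rough sanity check of the connection limits in the values.yml above, here is a minimal sketch rather than a diagnosis. It assumes that each pgpool child process can cache up to maxPool backend connections, so the worst case per PostgreSQL node is pgpool replicas × numInitChildren × maxPool. The function names are made up for illustration, and reserved=3 is PostgreSQL's default superuser_reserved_connections, not a value from this setup.

```python
# Rough connection-budget check for the pgpool/PostgreSQL values in this report.
# Assumption: each pgpool child may cache up to max_pool backend connections,
# so one pgpool replica can open up to num_init_children * max_pool
# connections to a single PostgreSQL node.


def max_backend_connections(pgpool_replicas: int, num_init_children: int, max_pool: int) -> int:
    """Upper bound on connections all pgpool replicas could open to one backend."""
    return pgpool_replicas * num_init_children * max_pool


def fits_budget(demand: int, max_connections: int, reserved: int = 3) -> bool:
    """True if the demand fits within max_connections minus the reserved slots."""
    return demand <= max_connections - reserved


if __name__ == "__main__":
    # Values taken from the values.yml in this report.
    demand = max_backend_connections(pgpool_replicas=3, num_init_children=100, max_pool=20)
    print(f"worst-case backend connections: {demand}")
    print(f"fits within maxConnections=1000: {fits_budget(demand, 1000)}")
```

The worst case only applies when clients connect with many distinct user/database pairs. With a single pair, as in the GitLab values above, each child typically holds one backend connection, so the practical figure is closer to 3 × 100 = 300, which does fit. Whether this has anything to do with the crashes reported here is not established.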

nightmare-rg commented 1 week ago

This morning the complete postgresql-ha cluster was down. Nothing happened on the GitLab instance during the night.

[Screenshot 2024-10-10 at 07:30:03]

The logs only show the init process and that the other nodes are not online.

Recovery steps:

Edit 11.10.2024:

I downgraded the chart to version 13.0.0 with the exact same values.yml. Now the cluster is stable, so something in the new chart version causes the problem.

carrodher commented 6 days ago

Hi, the issue may not be directly related to the Bitnami container image/Helm chart, but rather to how the application is being utilized, configured in your specific environment, or tied to a particular scenario that is not easy to reproduce on our side.

If you think that's not the case and want to contribute a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

If you have any questions about the application, customizing its content, or technology and infrastructure usage, we highly recommend that you refer to the forums and user guides provided by the project responsible for the application or technology.

With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.