canonical / postgresql-k8s-operator

A Charmed Operator for running PostgreSQL on Kubernetes
https://charmhub.io/postgresql-k8s
Apache License 2.0

Missing `cluster-initialised` flag causes cluster to become stuck #634

Open AmberCharitos opened 4 weeks ago

AmberCharitos commented 4 weeks ago

Steps to reproduce

This was seen in production, and the steps below describe what happened there. They will not necessarily reproduce the same bug locally, as other factors may have been at play.

  1. Deploy 3 units of the postgresql-k8s charm from channel 14/edge (revision 198).
  2. Restart all pods at the same time.
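The two steps above could be scripted roughly as follows. This is a sketch that only builds the commands and runs nothing. The model name is a placeholder. The sketch assumes Juju's usual Kubernetes layout, where the namespace is named after the model and each unit's pod is named `<app>-<n>`. `--trust` follows the charm's usual deploy instructions. The production deployment in the logs used the application name `postgresql-db`, which can be passed as `app`.

```python
"""Hypothetical helper that builds the reproduction commands for steps 1 and 2."""


def repro_commands(model: str, app: str = "postgresql-k8s", units: int = 3) -> list:
    """Return [deploy_cmd, restart_cmd] as argument lists; nothing is executed."""
    deploy = [
        "juju", "deploy", "postgresql-k8s", app,
        "--model", model,
        "--channel", "14/edge", "--revision", "198",
        "-n", str(units), "--trust",
    ]
    # Assumption: namespace == model name and pods are "<app>-<unit number>".
    # Deleting every pod in one call makes the StatefulSet recreate them
    # together, i.e. all pods restart at roughly the same time.
    # In a real run, wait until all units are active before this step.
    pods = [f"{app}-{i}" for i in range(units)]
    restart = ["kubectl", "delete", "pod", "-n", model, *pods]
    return [deploy, restart]


for cmd in repro_commands("pg-test"):
    print(" ".join(cmd))
```

Deleting the pods in a single call, rather than using a rolling restart, is what makes all units come back at once. That matches step 2.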

Expected behavior

All units come back to active status.

Actual behavior

All units enter either maintenance or waiting status, and the logs show the following "cluster not initialized" message:

Re-emitting deferred event <PebbleReadyEvent via PostgresqlOperatorCharm/on/postgresql_pebble_ready[6552765]>.
unit-postgresql-db-0: 08:13:00 DEBUG unit.postgresql-db/0.juju-log Deferring on_postgresql_pebble_ready: Not leader and cluster not initialized
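The deferral condition in this log line can be illustrated with a toy model. This is not the charm's code. A plain dict stands in for shared application data, the flag name comes from the issue title, and the choice of leader is arbitrary. `leader_restores_flag` is a hypothetical switch for whether anything sets the flag again once it has been removed. The model only covers the non-leader path shown in the log and does not explain why the leader also stays stuck.

```python
"""Toy model of the deferral loop reported in this issue; not the charm's code."""

FLAG = "cluster-initialised"  # flag name taken from the issue title


def should_defer(is_leader: bool, app_data: dict) -> bool:
    # Mirrors the logged reason: "Not leader and cluster not initialized".
    return not is_leader and FLAG not in app_data


def simulate(units: dict, app_data: dict, rounds: int, leader_restores_flag: bool) -> set:
    """Re-emit the deferred pebble-ready event for every waiting unit.

    `units` maps unit name -> is_leader. Returns the units still deferring
    after `rounds` hook runs.
    """
    waiting = set(units)
    for _ in range(rounds):
        for name in sorted(waiting):  # sorted() copies, so discarding is safe
            is_leader = units[name]
            if is_leader and leader_restores_flag:
                app_data[FLAG] = "True"  # hypothetical: leader re-sets the flag
            if not should_defer(is_leader, app_data):
                waiting.discard(name)
    return waiting


units = {"postgresql-db/0": False, "postgresql-db/1": True, "postgresql-db/2": False}
print(sorted(simulate(units, {}, rounds=10, leader_restores_flag=False)))
# → ['postgresql-db/0', 'postgresql-db/2']
print(sorted(simulate(units, {}, rounds=10, leader_restores_flag=True)))
# → []
```

However many times the deferred event is re-emitted, non-leaders never progress in this model while the flag is absent and nothing puts it back.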

Versions

Operating system: Ubuntu 22.04.4 LTS

Juju CLI: 3.5.3-ubuntu-amd64

Juju agent: 3.1.8

Charm revision: 198 14/edge

kubectl: Client Version v1.30.3, Server Version v1.26.15

Log output

Juju debug log:

postgresql-0.log, postgresql-1.log, postgresql-2.log

Additional context

Matrix conversation for context: https://matrix.to/#/!BukWfnyOTgQSKAxdtT:ubuntu.com/$jrGfGfW07m4i6t0dKcKAElmZmEznSvDlh1ruRTerFgI?via=ubuntu.com&via=matrix.org&via=laquadrature.net

github-actions[bot] commented 4 weeks ago

https://warthogs.atlassian.net/browse/DPE-5165

marceloneppel commented 3 weeks ago

Thanks for the bug report, @AmberCharitos!

Just so we don't forget: the flag is removed at https://github.com/canonical/postgresql-k8s-operator/blob/e760a3df8ceceab49357ff57c620299d9d101fc8/src/charm.py#L841.