canonical / postgresql-k8s-operator

A Charmed Operator for running PostgreSQL on Kubernetes
https://charmhub.io/postgresql-k8s
Apache License 2.0

Scaling breaks "The S3 repository has backups from another cluster" blocked status #591

Closed: Zvirovyi closed this issue 2 weeks ago

Zvirovyi commented 3 months ago

Steps to reproduce

  1. juju deploy s3-integrator
  2. juju deploy postgresql-k8s --config profile=testing --channel edge --trust
  3. Configure s3-integrator to use a bucket that contains backups from another cluster
  4. juju integrate s3-integrator postgresql-k8s
  5. As expected, the cluster enters the "The S3 repository has backups from another cluster" blocked status
  6. juju scale-application postgresql-k8s 2
  7. After scaling, the cluster loses its blocked status and becomes active (idle)

Expected behavior

The cluster should keep its blocked status "The S3 repository has backups from another cluster" even after scaling.

Actual behavior

See step 7.

Versions

Operating system: Ubuntu 24.04 LTS

Juju CLI: 3.5.2-genericlinux-amd64

Juju agent: 3.5.2

Charm revision: 327

microk8s: MicroK8s v1.30.0 revision 6783

Log output

Juju debug log: debug-log

Additional context

Based on the work on Point In Time Recovery (PITR), I suggest refactoring the logic behind this blocked message. Currently, the "The S3 repository has backups from another cluster" message is set in PostgreSQLBackups._on_s3_credential_changed by the can_use_s3_repository function, and it is assumed that no other event will override it. It would be better to keep this status in app_peer_data and set the message in the PostgresqlOperatorCharm._set_active_status function, the same way that function handles the "Move restored cluster to another S3 bucket" blocked status introduced by the PITR work in the PR referenced above.
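A minimal sketch of the suggested refactor, written in plain Python without the ops library so it runs on its own. CharmSketch, Status, the peer-data key names and the handler names below are hypothetical stand-ins, not the charm's actual code. The idea it illustrates: persist the "backups from another cluster" condition in the application peer data (relation databags hold strings, hence "True"), and have the single function that would report "active" check the persisted blocking conditions first, so events fired by scaling re-derive the blocked status instead of clearing it.

```python
from dataclasses import dataclass, field

# Hypothetical peer-data keys; the real charm may use different names.
FOREIGN_BACKUPS_KEY = "s3-foreign-backups"
MOVE_RESTORED_KEY = "move-restored-cluster"

FOREIGN_BACKUPS_MSG = "The S3 repository has backups from another cluster"
MOVE_RESTORED_MSG = "Move restored cluster to another S3 bucket"


@dataclass
class Status:
    name: str  # "active" or "blocked"
    message: str = ""


@dataclass
class CharmSketch:
    """Toy stand-in for PostgresqlOperatorCharm; not the real ops API."""

    app_peer_data: dict = field(default_factory=dict)
    status: Status = field(default_factory=lambda: Status("active"))

    def on_s3_credential_changed(self, repo_belongs_to_other_cluster: bool) -> None:
        # Persist the condition instead of only setting a one-off status,
        # so later events cannot silently clear it.
        if repo_belongs_to_other_cluster:
            self.app_peer_data[FOREIGN_BACKUPS_KEY] = "True"
        else:
            self.app_peer_data.pop(FOREIGN_BACKUPS_KEY, None)
        self.set_active_status()

    def set_active_status(self) -> None:
        # Check every persisted blocking condition before reporting active.
        if self.app_peer_data.get(FOREIGN_BACKUPS_KEY) == "True":
            self.status = Status("blocked", FOREIGN_BACKUPS_MSG)
        elif self.app_peer_data.get(MOVE_RESTORED_KEY) == "True":
            self.status = Status("blocked", MOVE_RESTORED_MSG)
        else:
            self.status = Status("active")

    def on_peer_relation_changed(self) -> None:
        # Scaling triggers peer-relation events; here they recompute the
        # status from peer data rather than overwriting it with "active".
        self.set_active_status()


if __name__ == "__main__":
    charm = CharmSketch()
    charm.on_s3_credential_changed(repo_belongs_to_other_cluster=True)
    charm.on_peer_relation_changed()  # e.g. after juju scale-application
    print(charm.status)
```

Running the usage block prints a blocked status carrying the "backups from another cluster" message, which is the behaviour the issue expects after step 7.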

github-actions[bot] commented 3 months ago

https://warthogs.atlassian.net/browse/DPE-4980

taurus-forever commented 3 months ago

This is a valid bug report, and let's keep it open; however, we should focus on removing this blocking message completely (by managing timelines during recovery) rather than on fixing this issue. @marceloneppel ACK?

Zvirovyi commented 3 months ago

@taurus-forever

Timelines management will:

However, the error "The S3 repository has backups from another cluster" is different: it indicates that the pgBackRest stanza is configured for another cluster. From some quick research, pgBackRest does not support that scenario. It could also cause errors; for example, identical timelines from different clusters would conflict with each other. In any case, this will require deeper research, and it is not related to PITR.

marceloneppel commented 3 months ago

@taurus-forever: @Zvirovyi is right. The message that disappears after scaling the cluster relates to pointing the cluster at a backup repository from another cluster, not at a different timeline of the same cluster.