canonical / postgresql-k8s-operator

A Charmed Operator for running PostgreSQL on Kubernetes
https://charmhub.io/postgresql-k8s
Apache License 2.0

Postgresql-k8s unit goes to blocked state: `Cannot disable plugins: Existing objects depend on it` #701

Open natalian98 opened 2 months ago

natalian98 commented 2 months ago

Steps to reproduce

This only happens once every few test runs, so it is hard to reproduce, but you can try by running:

git clone https://github.com/canonical/iam-bundle.git
cd iam-bundle
tox -e integration -- --keep-models

Expected behavior

The postgresql-k8s app and unit become active.

Actual behavior

At times, the postgresql-k8s unit gets stuck in the blocked state, causing our bundle tests to fail. These are runs from this week:
https://github.com/canonical/iam-bundle/actions/runs/10880493814/job/30244416338#step:4:667
https://github.com/canonical/iam-bundle/actions/runs/10937282732/job/30362792991
https://github.com/canonical/iam-bundle/actions/runs/10900084647/job/30246933947

postgresql-k8s/0 [idle] blocked: Cannot disable plugins: Existing objects depend on it. See logs

However, we don't enable or disable any plugins in the charms integrated with the database (kratos and hydra). Could you advise what could be causing this?

Versions

Operating system: Ubuntu 22.04

Juju CLI: 3.4/stable

Juju agent: 3.4.5

Charm revision: 381

microk8s: 1.27 and 1.28/stable

Log output

Juju debug log: https://github.com/canonical/iam-bundle/actions/runs/10880493814/job/30244416338#step:14:1

Additional context

We've been deploying postgresql-k8s from the 14/stable channel. The tests ran successfully while the channel pointed to rev281; we have been seeing this flaky issue since rev381 was promoted.

syncronize-issues-to-jira[bot] commented 2 months ago

Thank you for reporting your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5494.

This message was autogenerated

dragomirp commented 2 months ago

Hi, @natalian98, the blocked status should be caused by:

2024-09-19T09:13:43.4942715Z unit-postgresql-k8s-0: 09:13:33 ERROR unit.postgresql-k8s/0.juju-log Failed to disable plugin: cannot drop extension pg_trgm because other objects depend on it
2024-09-19T09:13:43.4943285Z DETAIL:  index identity_credential_identifiers_nid_identifier_gin depends on operator class gin_trgm_ops for access method gin
2024-09-19T09:13:43.4943489Z HINT:  Use DROP ... CASCADE to drop the dependent objects too.
2024-09-19T09:13:43.4943496Z 
2024-09-19T09:13:43.4944042Z Was the plugin enabled manually? If so, update charm config with `juju config postgresql-k8s plugin_<plugin_name>_enable=True`
2024-09-19T09:13:43.4944615Z unit-postgresql-k8s-0: 09:13:33 DEBUG unit.postgresql-k8s/0.juju-log on_update_status early exit: Unit is in Blocked/Waiting status

Do you know if any of your components manually enables pg_trgm? Have you tried setting the plugin_pg_trgm_enable config in the bundle?
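A minimal sketch of the fix suggested here, using the `juju config` command quoted in the log above. The helper name `enable_plugins` and the `APP`/`JUJU` environment variables are hypothetical, and it assumes each extension maps to a `plugin_<extension>_enable` option, which matches `pg_trgm` and `btree_gin` in this thread but may not hold for every extension.

```shell
#!/bin/sh
# Hypothetical helper: enable the charm's plugin_<extension>_enable option
# for every extension name passed as an argument, in one `juju config` call.
# APP defaults to postgresql-k8s, the application name used in this issue.
# JUJU defaults to the real juju CLI; set JUJU=echo for a dry run.
enable_plugins() {
    app="${APP:-postgresql-k8s}"
    settings=""
    for ext in "$@"; do
        # Assumption: the config key is plugin_<extension>_enable.
        settings="$settings plugin_${ext}_enable=True"
    done
    # $settings is left unquoted on purpose so each key=value pair
    # becomes a separate argument to `juju config`.
    ${JUJU:-juju} config "$app" $settings
}
```

For example, `enable_plugins pg_trgm` runs `juju config postgresql-k8s plugin_pg_trgm_enable=True`. In a bundle, the same keys would go under the application's `options`, so they are set from the first deployment rather than after the unit blocks.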

natalian98 commented 2 months ago

Hi @dragomirp, thanks for your fast reply.

The charms don't enable pg_trgm, but if I understand correctly, the upstream Kratos component creates the extension during the database migration. That doesn't explain, though, why this only happens on some runs. If that were the issue, the tests would fail consistently, because the database is always migrated on a fresh deployment. Could there be a race condition in setting the unit status in postgresql-k8s?

dragomirp commented 2 months ago

Hi, this check should happen in the update-status hook, so my guess is that the test sometimes finishes before the PostgreSQL charm gets a chance to set the blocked status.

Can you try to enable the plugin in the bundle and see if the issue persists?

natalian98 commented 2 months ago

@dragomirp I tried enabling the plugin and one of two runs failed again: https://github.com/canonical/iam-bundle/actions/runs/10941912483/job/30377673936#step:4:681

dragomirp commented 2 months ago

Hi, @natalian98, looks like there are more plugins required:

2024-09-19T14:08:17.8743192Z Was the plugin enabled manually? If so, update charm config with `juju config postgresql-k8s plugin_<plugin_name>_enable=True`
2024-09-19T14:08:17.8745467Z unit-postgresql-k8s-0: 13:39:43 ERROR unit.postgresql-k8s/0.juju-log Failed to disable plugin: cannot drop extension btree_gin because other objects depend on it
2024-09-19T14:08:17.8747604Z DETAIL:  index identity_credential_identifiers_nid_identifier_gin depends on operator class uuid_ops for access method gin

This should be enabled with the plugin_btree_gin_enable config option.

You can check for missing plugins in the debug log step of the run: https://github.com/canonical/iam-bundle/actions/runs/10941912483/job/30377673936#step:14:4694
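One way to do that check without scrolling through the CI output is to search a saved debug log for the `cannot drop extension <name>` message shown above. This is a sketch: the function name `missing_plugins` is hypothetical, and it relies only on the wording of the error lines quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print each extension the charm failed to drop,
# once, based on the "cannot drop extension <name>" error text.
# $1 is a saved debug log, e.g. from `juju debug-log --replay --no-tail`.
missing_plugins() {
    grep -oE 'cannot drop extension [A-Za-z0-9_]+' "$1" \
        | awk '{ print $4 }' \
        | sort -u
}
```

Each name printed is a candidate for a `plugin_<name>_enable` option; since only the first failing extension may be reported per attempt, it can be worth re-checking after enabling the ones found.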

natalian98 commented 2 months ago

Hi @dragomirp, that solved the issue, thanks a lot! Suggestion: perhaps the status could be set on an event other than update-status? Some teams set the update-status hook interval to 1h in tests, so they may never find out that a plugin is missing.
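In test setups like the one described, the interval is the Juju model setting `update-status-hook-interval` (Juju's default is 5m). A small sketch for checking or shortening it; the wrapper name `update_status_interval` and the `JUJU` dry-run variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the update-status-hook-interval model setting.
# With no argument it prints the current value; with one argument
# (e.g. 1m or 30s) it sets the interval for the current model.
# JUJU defaults to the real juju CLI; set JUJU=echo for a dry run.
update_status_interval() {
    if [ "$#" -eq 0 ]; then
        ${JUJU:-juju} model-config update-status-hook-interval
    else
        ${JUJU:-juju} model-config "update-status-hook-interval=$1"
    fi
}
```

Running `update_status_interval 1m` before the final status wait should give the plugin check a chance to run within the test, at the cost of more frequent hook executions.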

dragomirp commented 2 months ago

Glad it worked out.

I'll discuss it with the rest of the team, but I don't think there is a more appropriate event, since we can't know when extensions are enabled manually. Polling periodically on update-status seems to be the most concise way to verify there's no mismatch between declared plugins and usage.
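To spot the same mismatch by hand, one option is to compare the plugins enabled in config with the extensions that actually exist in the database. This is an illustrative sketch, not the charm's implementation (judging by the logs, the charm tries to drop extensions whose plugins are disabled); the function name and the one-name-per-line file inputs are hypothetical. The installed list could come from, for example, `psql -Atc 'SELECT extname FROM pg_extension'` run against the application's database; `plpgsql`, which PostgreSQL installs by default, is filtered out.

```shell
#!/bin/sh
# Illustrative sketch, not the charm's code: print extensions that exist in
# the database but whose plugin is not enabled in charm config.
# $1: file with enabled plugin names, one per line.
# $2: file with installed extension names, one per line.
undeclared_extensions() {
    enabled="$(mktemp)"
    installed="$(mktemp)"
    sort -u "$1" > "$enabled"
    # plpgsql is installed by default in PostgreSQL, so ignore it.
    grep -vx 'plpgsql' "$2" | sort -u > "$installed"
    # comm -13 keeps only lines unique to the second (installed) list.
    comm -13 "$enabled" "$installed"
    rm -f "$enabled" "$installed"
}
```

An empty result means every installed extension is covered by an enabled plugin option; any name printed is one that would likely need its `plugin_<name>_enable` option set.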