canonical / postgresql-k8s-operator

A Charmed Operator for running PostgreSQL on Kubernetes
https://charmhub.io/postgresql-k8s
Apache License 2.0

Charm enters error state after failed attempt to delete resources. #413

Closed by moisesbenzan 7 months ago

moisesbenzan commented 7 months ago

Steps to reproduce

  1. Create a model on top of a bare-metal K8s cluster
  2. Run juju deploy --channel 14/edge --model <model name> postgresql-k8s

Expected behavior

The charm should reach the active/idle state.

Actual behavior

The charm remains stuck at "installing agent":

Model           Controller              Cloud/Region      Version  SLA          Timestamp
postgresql-k8s  foundations-kubernetes  kubernetes_cloud  3.1.7    unsupported  18:50:43

App             Version  Status   Scale  Charm           Channel  Rev  Address  Exposed  Message
postgresql-k8s           waiting    0/1  postgresql-k8s  14/edge  206           no       installing agent

Unit              Workload  Agent       Address  Ports  Message
postgresql-k8s/0  waiting   allocating                  installing agent

On further log inspection, the charm enters an error state while attempting to delete resources within the cluster:

2024-03-05T18:51:22.051Z [container-agent] 2024-03-05 18:51:22 INFO juju.worker.uniter resolver.go:165 found queued "leader-elected" hook
2024-03-05T18:51:22.596Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 418
2024-03-05T18:51:22.673Z [container-agent] 2024-03-05 18:51:22 ERROR juju-log Uncaught exception while in charm code:
2024-03-05T18:51:22.673Z [container-agent] Traceback (most recent call last):
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/lightkube/core/generic_client.py", line 188, in raise_for_status
2024-03-05T18:51:22.673Z [container-agent]     resp.raise_for_status()
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/httpx/_models.py", line 761, in raise_for_status
2024-03-05T18:51:22.673Z [container-agent]     raise HTTPStatusError(message, request=request, response=self)
2024-03-05T18:51:22.673Z [container-agent] httpx.HTTPStatusError: Client error '403 Forbidden' for url 'https://10.152.183.1/api/v1/namespaces/postgresql-k8s/services/patroni-postgresql-k8s'
2024-03-05T18:51:22.673Z [container-agent] For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403
2024-03-05T18:51:22.673Z [container-agent] 
2024-03-05T18:51:22.673Z [container-agent] During handling of the above exception, another exception occurred:
2024-03-05T18:51:22.673Z [container-agent] 
2024-03-05T18:51:22.673Z [container-agent] Traceback (most recent call last):
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/./src/charm.py", line 1585, in <module>
2024-03-05T18:51:22.673Z [container-agent]     main(PostgresqlOperatorCharm, use_juju_for_storage=True)
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/main.py", line 456, in main
2024-03-05T18:51:22.673Z [container-agent]     _emit_charm_event(charm, dispatcher.event_name)
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/main.py", line 144, in _emit_charm_event
2024-03-05T18:51:22.673Z [container-agent]     event_to_emit.emit(*args, **kwargs)
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/framework.py", line 351, in emit
2024-03-05T18:51:22.673Z [container-agent]     framework._emit(event)
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/framework.py", line 853, in _emit
2024-03-05T18:51:22.673Z [container-agent]     self._reemit(event_path)
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/ops/framework.py", line 943, in _reemit
2024-03-05T18:51:22.673Z [container-agent]     custom_handler(event)
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/./src/charm.py", line 605, in _on_leader_elected
2024-03-05T18:51:22.673Z [container-agent]     self._cleanup_old_cluster_resources()
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/./src/charm.py", line 864, in _cleanup_old_cluster_resources
2024-03-05T18:51:22.673Z [container-agent]     raise e
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/./src/charm.py", line 855, in _cleanup_old_cluster_resources
2024-03-05T18:51:22.673Z [container-agent]     client.delete(
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/lightkube/core/client.py", line 86, in delete
2024-03-05T18:51:22.673Z [container-agent]     return self._client.request("delete", res=res, name=name, namespace=namespace, params={
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/lightkube/core/generic_client.py", line 245, in request
2024-03-05T18:51:22.673Z [container-agent]     return self.handle_response(method, resp, br)
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/lightkube/core/generic_client.py", line 196, in handle_response
2024-03-05T18:51:22.673Z [container-agent]     self.raise_for_status(resp)
2024-03-05T18:51:22.673Z [container-agent]   File "/var/lib/juju/agents/unit-postgresql-k8s-0/charm/venv/lightkube/core/generic_client.py", line 190, in raise_for_status
2024-03-05T18:51:22.673Z [container-agent]     raise transform_exception(e)
2024-03-05T18:51:22.673Z [container-agent] lightkube.core.exceptions.ApiError: services "patroni-postgresql-k8s" is forbidden: User "system:serviceaccount:postgresql-k8s:postgresql-k8s" cannot delete resource "services" in API group "" in the namespace "postgresql-k8s"
2024-03-05T18:51:22.944Z [container-agent] 2024-03-05 18:51:22 ERROR juju.worker.uniter.operation runhook.go:180 hook "leader-elected" (via hook dispatching script: dispatch) failed: exit status 1
2024-03-05T18:51:22.948Z [container-agent] 2024-03-05 18:51:22 INFO juju.worker.uniter resolver.go:161 awaiting error resolution for "leader-elected" hook
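
The final ApiError line in the traceback names the exact permission that was denied. As a rough aid for reading such messages, the sketch below splits one into its parts. The regular expression and the `parse_forbidden` helper are illustrative assumptions, not part of lightkube, Juju, or the charm; the only outside fact relied on is that Kubernetes names service-account users `system:serviceaccount:<namespace>:<name>`.

```python
import re
from typing import Optional

# Hypothetical pattern for the "is forbidden" message shape seen in the log above.
FORBIDDEN_RE = re.compile(
    r'(?P<kind>\S+) "(?P<name>[^"]+)" is forbidden: '
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\S+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)


def parse_forbidden(message: str) -> Optional[dict]:
    """Return the fields of an RBAC denial message, or None if it does not match."""
    match = FORBIDDEN_RE.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    # Service-account users have the form system:serviceaccount:<namespace>:<name>.
    parts = fields["user"].split(":")
    if len(parts) == 4 and parts[:2] == ["system", "serviceaccount"]:
        fields["sa_namespace"], fields["sa_name"] = parts[2], parts[3]
    return fields


# Message text copied from the log above.
message = (
    'services "patroni-postgresql-k8s" is forbidden: User '
    '"system:serviceaccount:postgresql-k8s:postgresql-k8s" cannot delete resource '
    '"services" in API group "" in the namespace "postgresql-k8s"'
)
denied = parse_forbidden(message)
print(denied["sa_name"], "cannot", denied["verb"], denied["resource"], "in", denied["namespace"])
```

Read this way, the log says that the charm's own service account lacks permission to delete Service objects in the model's namespace.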

Versions

Operating system:

Juju CLI: 3.1.7

Juju agent: 3.1.7

Charm revision: 14/edge 206

kubernetes: 1.28

Log output

Juju debug log: postgresql-k8s-postgresql-k8s-0-charm.log

Additional context

Found on test run: https://solutions.qa.canonical.com/testruns/9e2b03fb-4405-4041-b4a7-8db1bb200ac6 More artifacts are linked on that page or can be requested from any member of the SQA team.

github-actions[bot] commented 7 months ago

https://warthogs.atlassian.net/browse/DPE-3719

moisesbenzan commented 7 months ago

Future occurrences can be found here: https://solutions.qa.canonical.com/bugs/gh:canonical%2Fpostgresql-k8s-operator:413

dragomirp commented 7 months ago

Hi @moisesbenzan, the charm needs to create resources to function, so it needs to be deployed with the --trust flag.

taurus-forever commented 7 months ago

@moisesbenzan Resolved, as the --trust flag is a documented requirement for the K8s charm: https://charmhub.io/postgresql-k8s/docs/t-deploy-charm

Feel free to reopen if still topical. Thanks!
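
For anyone landing here with the same traceback, the commands implied by the replies above can be sketched as follows. This is a minimal illustration, not an official procedure: the helper names are hypothetical, the model name is taken from the status output in the report, and the second command assumes that `juju trust` with `--scope=cluster` is the appropriate way to grant trust to an application that was already deployed without it.

```python
import shlex


def deploy_with_trust(model, channel="14/edge", app="postgresql-k8s"):
    """Build the deploy command from the report, with --trust added."""
    return ["juju", "deploy", "--channel", channel, "--model", model, app, "--trust"]


def grant_trust(model, app="postgresql-k8s"):
    """Build a command that grants trust to an already-deployed application."""
    return ["juju", "trust", app, "--scope=cluster", "--model", model]


# Print the commands rather than running them, so nothing touches a live controller.
model = "postgresql-k8s"  # model name as shown in the status output of the report
print(shlex.join(deploy_with_trust(model)))
print(shlex.join(grant_trust(model)))
```

If trust is granted after the unit has already failed, the `leader-elected` hook may still be awaiting error resolution and might need to be retried, for example with `juju resolved postgresql-k8s/0`.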