Closed asbalderson closed 2 months ago
This is the first time we have seen this. We'll try to reproduce it.
This seems related: https://github.com/canonical/operator/issues/1246
We should try to reproduce again, now that the linked PR landed.
@benhoyt This is where the error comes from.
Do you know if canonical/operator#1247 fixes it?
Yes, it sounds like exactly the same issue -- it should do!
@asbalderson we believe this is fixed now! Closing, but feel free to reopen if needed :)
Bug Description
On a deployment of COS stable with a 3-node microk8s and microceph back-end, the grafana unit stayed stuck in the executing state for over 4 hours, hitting our timeout for hung deployments. The unit kept executing config-changed events for the whole duration but never reached a settled state. The grafana logs show that it still runs a handful of commands but then stops with a
Exec 19: timeout waiting for websocket connections: context deadline exceeded
and never runs anything else over the 4-hour period. Nothing else stands out in the bug description.
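For anyone triaging a similar hang, here is a minimal sketch of how one might scan a log file from a crashdump for the error signature quoted above. It is not part of the original report. The function name `find_exec_timeouts` is hypothetical, and it assumes the number after `Exec` is a numeric exec ID that varies between occurrences.

```python
import re

# Error signature quoted in this report. Assumption: the number after
# "Exec" is an exec ID that differs from one occurrence to the next.
EXEC_TIMEOUT = re.compile(
    r"Exec (\d+): timeout waiting for websocket connections: "
    r"context deadline exceeded"
)


def find_exec_timeouts(lines):
    """Return a (line_number, exec_id) pair for each matching line."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = EXEC_TIMEOUT.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits


if __name__ == "__main__":
    # Hypothetical sample: only the "Exec 19" line comes from the report,
    # the surrounding lines are placeholders.
    sample = [
        "placeholder line before the error",
        "Exec 19: timeout waiting for websocket connections: "
        "context deadline exceeded",
        "placeholder line after the error",
    ]
    for number, exec_id in find_exec_timeouts(sample):
        print(f"line {number}: exec {exec_id} timed out")
```

Pointing this at the unit and workload logs in the crashdump would show whether the timeout occurred once or repeatedly during the 4-hour window.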
To Reproduce
Environment
Running on:
microk8s - v1.28.3
microceph (snap) - latest/edge
cos-lite - latest/stable:11
metallb - v0.13.10
Relevant log output
Additional context
crashdump can be found at: https://oil-jenkins.canonical.com/artifacts/39937174-e6c2-401b-940d-64cb8655ef02/generated/generated/microk8s/juju-crashdump-microk8s-2024-01-16-07.40.28.tar.gz