Closed stonepreston closed 1 year ago
I am experiencing the same issue on my local machine:
The charm can get back to active-idle if I delete the pod.
Loki is also stuck in crashloop backoff
Maybe related:
This is a Juju error that should have been rolled back in a patch release. Could you please try again and verify whether that is the case? Seems to work in 2.9.37, as well as 3.1.
@simskij I have tried with a 2.9.37 controller in our vSphere environment. It does get past the crash loop backoff, but now the charm gets stuck here:
juju status
Model             Controller       Cloud/Region                       Version  SLA          Timestamp
stonepreston-cos  stonepreston-vs  stonepreston-vs-k8s-cloud/default  2.9.37   unsupported  14:53:33-06:00

App          Version  Status   Scale  Charm        Channel  Rev  Address  Exposed  Message
grafana-k8s  9.2.1    waiting      1  grafana-k8s  edge      52           no       installing agent

Unit            Workload  Agent  Address  Ports  Message
grafana-k8s/0*  unknown   idle
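A check like the status above can be scripted when waiting on a deployment. The following is a minimal sketch, not part of the original report. It assumes the output of `juju status --format=json` nests unit state under `applications` → `units` → `workload-status` / `juju-status` → `current`; that layout is an assumption and worth verifying against your Juju version. The function name `units_not_settled` is hypothetical.

```python
def units_not_settled(status: dict) -> list:
    """Return (unit, workload, agent) tuples for units that are not active/idle.

    `status` is assumed to be the parsed output of `juju status --format=json`.
    """
    pending = []
    for app in status.get("applications", {}).values():
        # An application with no units may omit the "units" key entirely.
        for unit_name, unit in (app.get("units") or {}).items():
            workload = unit.get("workload-status", {}).get("current", "unknown")
            agent = unit.get("juju-status", {}).get("current", "unknown")
            if (workload, agent) != ("active", "idle"):
                pending.append((unit_name, workload, agent))
    return pending


# Example input mirroring the stuck unit shown above (workload unknown, agent idle).
sample = {
    "applications": {
        "grafana-k8s": {
            "units": {
                "grafana-k8s/0": {
                    "workload-status": {"current": "unknown"},
                    "juju-status": {"current": "idle"},
                }
            }
        }
    }
}
print(units_not_settled(sample))
```

An empty list would mean every unit has settled to active/idle.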
Juju debug log:
Grafana container log:
Litestream container log:
kubectl logs grafana-k8s-0 -n stonepreston-cos -c litestream
2022-11-16T20:46:02.058Z [pebble] HTTP API server listening on ":38814".
2022-11-16T20:46:02.058Z [pebble] Started daemon.
I can close this issue and open a new one if you'd like, since this no longer seems to be related to the crash loop.
@stonepreston I wonder if you hit a resource limit. Mind checking
microk8s kubectl get pods/grafana-k8s-0 -n stonepreston-cos -o=jsonpath='{.status}' | jq
@sed-i Here is the output of the status:
kubectl get pods/grafana-k8s-0 -n stonepreston-cos -o=jsonpath='{.status}' | jq
{
  "conditions": [
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2022-11-16T20:46:01Z",
      "status": "True",
      "type": "Initialized"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2022-11-16T20:47:14Z",
      "status": "True",
      "type": "Ready"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2022-11-16T20:47:14Z",
      "status": "True",
      "type": "ContainersReady"
    },
    {
      "lastProbeTime": null,
      "lastTransitionTime": "2022-11-16T20:45:55Z",
      "status": "True",
      "type": "PodScheduled"
    }
  ],
  "containerStatuses": [
    {
      "containerID": "containerd://48fc9fc494fc9d09200015bc37e05c801eb3ddcd70ab043b777b30f87a3ef2fc",
      "image": "",
      "imageID": "",
      "lastState": {},
      "name": "charm",
      "ready": true,
      "restartCount": 0,
      "started": true,
      "state": {
        "running": {
          "startedAt": "2022-11-16T20:46:01Z"
        }
      }
    },
    {
      "containerID": "containerd://666b62c11fc4d977cf52d8f9b405af40957a4633762d1c0d350999221a816145",
      "image": "sha256:3f60358b5ba29becbfeb620dae8832f6bb93563a0fe83890f5c8c2c7e77f8e5f",
      "imageID": "",
      "lastState": {},
      "name": "grafana",
      "ready": true,
      "restartCount": 0,
      "started": true,
      "state": {
        "running": {
          "startedAt": "2022-11-16T20:46:01Z"
        }
      }
    },
    {
      "containerID": "containerd://0946f830795f62185b937efe2173ae974fde8aae5d16d582849ac377880b3f22",
      "image": "sha256:810676c15a7137f5ade23a3f589ee683063723152bd9aa51d371356f3bce83db",
      "imageID": "",
      "lastState": {},
      "name": "litestream",
      "ready": true,
      "restartCount": 0,
      "started": true,
      "state": {
        "running": {
          "startedAt": "2022-11-16T20:46:02Z"
        }
      }
    }
  ],
  "hostIP": "",
  "initContainerStatuses": [
    {
      "containerID": "containerd://3e6232f94b4ebed1b9dae3bfc6e2a43b367a2dbd9657e6c110604005325e18f1",
      "image": "",
      "imageID": "",
      "lastState": {},
      "name": "charm-init",
      "ready": true,
      "restartCount": 0,
      "state": {
        "terminated": {
          "containerID": "containerd://3e6232f94b4ebed1b9dae3bfc6e2a43b367a2dbd9657e6c110604005325e18f1",
          "exitCode": 0,
          "finishedAt": "2022-11-16T20:46:00Z",
          "reason": "Completed",
          "startedAt": "2022-11-16T20:46:00Z"
        }
      }
    }
  ],
  "phase": "Running",
  "podIP": "",
  "podIPs": [
    {
      "ip": ""
    }
  ],
  "qosClass": "Burstable",
  "startTime": "2022-11-16T20:45:55Z"
}
Not a resource limit then. Thanks for checking.
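For anyone repeating this check, reading the status object by eye can be replaced with a small script. This is a minimal sketch, not something used in the thread: it takes the parsed `.status` JSON from the `kubectl get pods/... -o=jsonpath='{.status}'` command and flags the usual signs of a resource limit or crash loop (conditions that are not "True", restarts, a previous `OOMKilled` termination, containers in a waiting state). The function name `summarize_pod_status` is hypothetical, and the set of checks is an assumption about what is worth flagging.

```python
def summarize_pod_status(status: dict) -> list:
    """Return human-readable findings for a pod's `.status` object."""
    findings = []
    # Conditions such as Ready or PodScheduled should all report "True".
    for cond in status.get("conditions", []):
        if cond.get("status") != "True":
            findings.append(f"condition {cond.get('type')} is {cond.get('status')}")
    # Inspect init containers and regular containers alike.
    for key in ("initContainerStatuses", "containerStatuses"):
        for cs in status.get(key, []):
            name = cs.get("name", "?")
            if cs.get("restartCount", 0) > 0:
                findings.append(f"{name}: restarted {cs['restartCount']} time(s)")
            # A previous termination with reason OOMKilled points at a memory limit.
            last = cs.get("lastState", {}).get("terminated", {})
            if last.get("reason") == "OOMKilled":
                findings.append(f"{name}: last terminated with OOMKilled")
            # A waiting container carries the reason, e.g. CrashLoopBackOff.
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                findings.append(f"{name}: waiting ({waiting.get('reason', 'unknown')})")
    return findings
```

Applied to the status posted above, where every condition is "True", every `restartCount` is 0, and every `lastState` is empty, none of these checks would fire, which matches the conclusion that a resource limit is not the cause.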
Bug Description
The charm container gets stuck in a crash loop when deploying grafana-k8s using Juju 2.9.35. The other containers (grafana and litestream) both have a ready status. This does not happen in microk8s, but does happen in Charmed-Kubernetes/Kubernetes core running in vSphere. I had previously opened an issue where it was noted that when grafana is deployed the unit is rebooted/restarted, which seemed uncommon, so it might be some vSphere wonkiness at play.
I was able to deploy grafana on 2.9.34 as well as 2.9.33, so it seems related to changes in 2.9.35. I also deployed prometheus-k8s to see if this was an issue affecting other k8s charms, but prometheus did not seem to have problems and went active/idle after a minute or two.
To Reproduce
juju model-defaults vsphere juju-http-proxy=http://squid.internal:3128 apt-http-proxy=http://squid.internal:3128 snap-http-proxy=http://squid.internal:3128 juju-https-proxy=http://squid.internal:3128 apt-https-proxy=http://squid.internal:3128 snap-https-proxy=http://squid.internal:3128 apt-no-proxy=localhost,,, juju-no-proxy=localhost,,,,,,
juju add-model --config enable-os-refresh-update=false --config enable-os-upgrade=false --config logging-config='<root>=DEBUG' --config datastore=vsanDatastore --config primary-network=$YOUR_VLAN_HERE k8s-core vsphere/Boston
juju deploy kubernetes-core --overlay vsphere-overlay.yaml --trust --debug --channel edge
The overlay file yaml looks like this:
juju scp kubernetes-control-plane/0:config ~/.kube/config
kubectl apply -f vsphere-storageclass.yaml
The storage class yaml looks like this:
juju add-k8s $YOUR_K8S_CLOUD --controller $YOUR_CONTROLLER --storage mystorage
juju deploy grafana-k8s --channel edge --trust
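The `juju model-defaults` step in the list above repeats one proxy URL across six keys, which is easy to mistype. As a minimal sketch (not part of the original report), the argument list could be generated instead. The key names are the ones used in the command above; the function name `proxy_model_defaults` and the idea of passing the no-proxy entries as a list are assumptions for illustration.

```python
def proxy_model_defaults(cloud: str, proxy_url: str, no_proxy: list) -> list:
    """Build the argv for `juju model-defaults` using one proxy for every tool."""
    args = ["juju", "model-defaults", cloud]
    # Same key order as the command in the reproduction steps:
    # juju/apt/snap over http, then juju/apt/snap over https.
    for scheme in ("http", "https"):
        for tool in ("juju", "apt", "snap"):
            args.append(f"{tool}-{scheme}-proxy={proxy_url}")
    # The report sets a no-proxy list only for apt and juju.
    no_proxy_value = ",".join(no_proxy)
    for tool in ("apt", "juju"):
        args.append(f"{tool}-no-proxy={no_proxy_value}")
    return args


# The addresses after "localhost" are elided in the report, so only
# "localhost" is passed here.
print(" ".join(proxy_model_defaults("vsphere", "http://squid.internal:3128", ["localhost"])))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues with the comma-separated no-proxy values.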
The Juju version being used is 2.9.35. The cloud being used to deploy charms into is the Boston vSphere cloud. Kubernetes 1.25 is being deployed as part of the edge kubernetes-core bundle (a slimmed-down version of charmed-kubernetes).
As mentioned above, this does not happen on Juju 2.9.34 or 2.9.33. It is isolated to the newly released 2.9.35.
Relevant log output
Additional context
No response