I am experiencing the same issue on my local machine:
The charm can get back to active-idle if I delete the pod.
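For reference, the delete looks roughly like this (pod and namespace names will vary by model; grafana-k8s-0 is just the usual default for a single unit):
kubectl delete pod grafana-k8s-0 -n <model-namespace>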
Loki is also stuck in CrashLoopBackOff.
Maybe related: https://github.com/canonical/operator/issues/847
This is caused by a Juju change that should have been rolled back in a patch release. Could you please try again and verify whether that is the case? It seems to work on 2.9.37, as well as on 3.1.
@simskij I have tried with a 2.9.37 controller in our VSphere environment. It does get past the crash loop backoff. But now the charm gets stuck here:
juju status
Model Controller Cloud/Region Version SLA Timestamp
stonepreston-cos stonepreston-vs stonepreston-vs-k8s-cloud/default 2.9.37 unsupported 14:53:33-06:00
App Version Status Scale Charm Channel Rev Address Exposed Message
grafana-k8s 9.2.1 waiting 1 grafana-k8s edge 52 10.152.183.99 no installing agent
Unit Workload Agent Address Ports Message
grafana-k8s/0* unknown idle 192.168.0.21
Juju debug log:
Grafana container log:
Litestream container log:
kubectl logs grafana-k8s-0 -n stonepreston-cos -c litestream
2022-11-16T20:46:02.058Z [pebble] HTTP API server listening on ":38814".
2022-11-16T20:46:02.058Z [pebble] Started daemon.
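If it is useful, the charm container log can be pulled the same way (the container is named charm in the pod spec):
kubectl logs grafana-k8s-0 -n stonepreston-cos -c charm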
I can close this issue and open a new one if you'd like, since this no longer seems to be related to the crash loop?
@stonepreston I wonder if you hit a resource limit. Mind checking
microk8s kubectl get pods/grafana-k8s-0 -n stonepreston-cos -o=jsonpath='{.status}' | jq
@sed-i Here is the output of the status:
kubectl get pods/grafana-k8s-0 -n stonepreston-cos -o=jsonpath='{.status}' | jq
{
"conditions": [
{
"lastProbeTime": null,
"lastTransitionTime": "2022-11-16T20:46:01Z",
"status": "True",
"type": "Initialized"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-11-16T20:47:14Z",
"status": "True",
"type": "Ready"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-11-16T20:47:14Z",
"status": "True",
"type": "ContainersReady"
},
{
"lastProbeTime": null,
"lastTransitionTime": "2022-11-16T20:45:55Z",
"status": "True",
"type": "PodScheduled"
}
],
"containerStatuses": [
{
"containerID": "containerd://48fc9fc494fc9d09200015bc37e05c801eb3ddcd70ab043b777b30f87a3ef2fc",
"image": "rocks.canonical.com/cdk/jujusolutions/charm-base:ubuntu-20.04",
"imageID": "rocks.canonical.com/cdk/jujusolutions/charm-base@sha256:5ccefd1a92d63baa961680c22a47e01213c99e9c06280c732a1910a5c126f2d2",
"lastState": {},
"name": "charm",
"ready": true,
"restartCount": 0,
"started": true,
"state": {
"running": {
"startedAt": "2022-11-16T20:46:01Z"
}
}
},
{
"containerID": "containerd://666b62c11fc4d977cf52d8f9b405af40957a4633762d1c0d350999221a816145",
"image": "sha256:3f60358b5ba29becbfeb620dae8832f6bb93563a0fe83890f5c8c2c7e77f8e5f",
"imageID": "registry.jujucharms.com/charm/h71m6jk2jeap1qu5lv9nv5mplqayr91q34lqp/grafana-image@sha256:1a1d900ee938adeaaa167d4f7cd720129762e481c29eb8021d42d23a9332d506",
"lastState": {},
"name": "grafana",
"ready": true,
"restartCount": 0,
"started": true,
"state": {
"running": {
"startedAt": "2022-11-16T20:46:01Z"
}
}
},
{
"containerID": "containerd://0946f830795f62185b937efe2173ae974fde8aae5d16d582849ac377880b3f22",
"image": "sha256:810676c15a7137f5ade23a3f589ee683063723152bd9aa51d371356f3bce83db",
"imageID": "registry.jujucharms.com/charm/h71m6jk2jeap1qu5lv9nv5mplqayr91q34lqp/litestream-image@sha256:8ab4b042f6c84ec51cabd5a9caef7b5394080c88fa1d7c445f201780e39e8ea7",
"lastState": {},
"name": "litestream",
"ready": true,
"restartCount": 0,
"started": true,
"state": {
"running": {
"startedAt": "2022-11-16T20:46:02Z"
}
}
}
],
"hostIP": "10.246.154.193",
"initContainerStatuses": [
{
"containerID": "containerd://3e6232f94b4ebed1b9dae3bfc6e2a43b367a2dbd9657e6c110604005325e18f1",
"image": "rocks.canonical.com/cdk/jujusolutions/jujud-operator:2.9.37",
"imageID": "rocks.canonical.com/cdk/jujusolutions/jujud-operator@sha256:5a8797ceec40324721854ad7f96fdfdcd32a9b738b58c2c25e33dc81effde296",
"lastState": {},
"name": "charm-init",
"ready": true,
"restartCount": 0,
"state": {
"terminated": {
"containerID": "containerd://3e6232f94b4ebed1b9dae3bfc6e2a43b367a2dbd9657e6c110604005325e18f1",
"exitCode": 0,
"finishedAt": "2022-11-16T20:46:00Z",
"reason": "Completed",
"startedAt": "2022-11-16T20:46:00Z"
}
}
}
],
"phase": "Running",
"podIP": "192.168.0.21",
"podIPs": [
{
"ip": "192.168.0.21"
}
],
"qosClass": "Burstable",
"startTime": "2022-11-16T20:45:55Z"
}
Not a resource limit then. Thanks for checking.
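For completeness, the configured requests/limits themselves can also be read straight from the pod spec with something like:
kubectl get pod grafana-k8s-0 -n stonepreston-cos -o json | jq '.spec.containers[].resources'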
Bug Description
The charm container gets stuck in a crash loop when deploying grafana-k8s using juju 2.9.35. The other containers (grafana and litestream) both have a ready status. This does not happen in MicroK8s, but does happen in Charmed Kubernetes/Kubernetes Core running in VSphere. I had previously opened an issue where it was noted that when grafana is deployed the unit is rebooted/restarted, which seemed uncommon, so it might be some VSphere wonkiness at play.
I was able to deploy grafana on 2.9.34 as well as 2.9.33, so it seems related to the 2.9.35 changes. I also deployed prometheus-k8s to see if this was an issue affecting other k8s charms, but prometheus did not seem to have problems and went active/idle after a minute or two.
To Reproduce
juju model-defaults vsphere juju-http-proxy=http://squid.internal:3128 apt-http-proxy=http://squid.internal:3128 snap-http-proxy=http://squid.internal:3128 juju-https-proxy=http://squid.internal:3128 apt-https-proxy=http://squid.internal:3128 snap-https-proxy=http://squid.internal:3128 apt-no-proxy=localhost,127.0.0.1,ppa.launchpad.net,launchpad.net juju-no-proxy=localhost,127.0.0.1,0.0.0.0,ppa.launchpad.net,launchpad.net,10.0.8.0/24,10.246.154.0/24
juju add-model --config enable-os-refresh-update=false --config enable-os-upgrade=false --config logging-config='<root>=DEBUG' --config datastore=vsanDatastore --config primary-network=$YOUR_VLAN_HERE k8s-core vsphere/Boston
juju deploy kubernetes-core --overlay vsphere-overlay.yaml --trust --debug --channel edge
The overlay file yaml looks like this:
juju scp kubernetes-control-plane/0:config ~/.kube/config
kubectl apply -f vsphere-storageclass.yaml
The storage class yaml looks like this:
juju add-k8s $YOUR_K8S_CLOUD --controller $YOUR_CONTROLLER --storage mystorage
juju deploy grafana-k8s --channel edge --trust
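To watch for the crash loop after the deploy, something like the following should work (the namespace name is assumed to match the Juju model, and the unit name assumes a single grafana-k8s unit):
kubectl get pods -n <k8s-model-name> -w
juju debug-log --include grafana-k8s/0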
Environment
The Juju version being used is 2.9.35. The cloud being used to deploy charms into is the Boston VSphere cloud. Kubernetes 1.25 is being deployed as part of the edge kubernetes-core bundle (a slimmed-down version of Charmed Kubernetes).
As mentioned above, this does not happen on Juju 2.9.34 or 2.9.33; it is isolated to the newly released 2.9.35.
Relevant log output
Additional context
No response