@sed-i I can't reproduce this, and the log doesn't make any sense. Was this done with --force? The error is actually grafana-source also, but it somehow looks like there is no peer data relation at all, and the peer data bag should absolutely not be getting cleaned up while the charm still exists.
If you can reproduce this, can you capture the events and/or the whole log? This doesn't provide enough information other than a guess that the peer relation was somehow broken/departed before the other events fired.
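For reference, the library stores its sources in the application databag of the charm's peer relation, via a pattern roughly like the sketch below (illustrative class and names, not the actual grafana-k8s code; the peer endpoint name "grafana" is assumed here). The point is that `Model.get_relation()` returns `None` once that relation no longer exists, and every read goes through it.

```python
import json

from ops.charm import CharmBase


class GrafanaLikeCharm(CharmBase):
    """Illustrative skeleton, not the real grafana-k8s charm."""

    _peer_relation_name = "grafana"  # assumed name of the peer endpoint

    @property
    def peers(self):
        # ops returns None when the peer relation does not (or no longer) exist.
        return self.model.get_relation(self._peer_relation_name)

    def get_peer_data(self, key: str):
        # Same access pattern as the library: if `self.peers` is None,
        # the `.data` lookup raises AttributeError.
        raw = self.peers.data[self.app].get(key, "")
        return json.loads(raw) if raw else {}
```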
Can't reproduce on 2cpu7gb + juju 2.9.29. Closing.
This happened again with grafana rev. 53 on 4cpu-8gb.
juju remove-application --destroy-storage grafana
unit-grafana-0: 18:46:31 DEBUG unit.grafana/0.juju-log grafana-source:9: Emitting Juju event grafana_source_relation_departed.
unit-grafana-0: 18:46:31 DEBUG unit.grafana/0.juju-log grafana-source:9: Removing all data for relation: 9
unit-grafana-0: 18:46:31 ERROR unit.grafana/0.juju-log grafana-source:9: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 1163, in <module>
    main(GrafanaCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-grafana-0/charm/venv/ops/main.py", line 438, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-grafana-0/charm/venv/ops/main.py", line 150, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-grafana-0/charm/venv/ops/framework.py", line 355, in emit
    framework._emit(event)  # noqa
  File "/var/lib/juju/agents/unit-grafana-0/charm/venv/ops/framework.py", line 856, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-grafana-0/charm/venv/ops/framework.py", line 931, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-grafana-0/charm/lib/charms/grafana_k8s/v0/grafana_source.py", line 608, in _on_grafana_source_relation_departed
    removed_source = self._remove_source_from_datastore(event)
  File "/var/lib/juju/agents/unit-grafana-0/charm/lib/charms/grafana_k8s/v0/grafana_source.py", line 623, in _remove_source_from_datastore
    stored_sources = self.get_peer_data("sources")
  File "/var/lib/juju/agents/unit-grafana-0/charm/lib/charms/grafana_k8s/v0/grafana_source.py", line 722, in get_peer_data
    data = self._charm.peers.data[self._charm.app].get(key, "")  # type: ignore[attr-defined]
AttributeError: 'NoneType' object has no attribute 'data'
unit-grafana-0: 18:46:32 ERROR juju.worker.uniter.operation hook "grafana-source-relation-departed" (via hook dispatching script: dispatch) failed: exit status 1
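The failure is the `.data` access in `get_peer_data` while `self._charm.peers` is already `None`. A defensive variant, purely as a sketch (not a proposed or existing patch; the peer endpoint name is assumed), would treat a missing peer relation as empty data:

```python
import json
import logging

logger = logging.getLogger(__name__)


def get_peer_data(self, key: str):
    """Sketch of a guarded lookup; illustrative, not the library's code."""
    peers = self._charm.model.get_relation("grafana")  # assumed peer endpoint
    if peers is None:
        # During application removal the peer relation can already be gone.
        logger.warning("Peer relation missing; returning empty data for %r", key)
        return {}
    raw = peers.data[self._charm.app].get(key, "")
    return json.loads(raw) if raw else {}
```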
This is exactly the same as the last one. The traceback is insufficient, and it still looks like a Juju bug: the peer relation should never be gone while the application still exists. I still can't reproduce this. If you can, please capture the state of the model/application (including all relations, peer relations in particular) and submit it.
So juju status is stuck on 0/1 for all apps:
Model Controller Cloud/Region Version SLA Timestamp Notes
m8 chdv2934 microk8s/localhost 2.9.34 unsupported 13:16:25-05:00 attempt 13 to destroy model failed (will retry): model not empty, found 5 applications (model not empty)
App Version Status Scale Charm Channel Rev Address Exposed Message
catalogue active 0/1 catalogue-k8s edge 4 10.152.183.88 no
grafana 9.2.1 terminated 0/1 grafana-k8s 0 10.152.183.62 no unit stopped by the cloud
loki unknown 0/1 loki-k8s edge 47 10.152.183.212 no
prometheus unknown 0/1 prometheus-k8s 0 10.152.183.19 no
traefik unknown 0/1 traefik-k8s edge 93 192.168.1.10 no
Unit Workload Agent Address Ports Message
grafana/0 unknown lost 10.1.55.13 agent lost, see 'juju show-status-log grafana/0'
Relation provider Requirer Interface Type Message
catalogue:catalogue grafana:catalogue catalogue regular
grafana:metrics-endpoint prometheus:metrics-endpoint prometheus_scrape regular
loki:grafana-source grafana:grafana-source grafana_datasource regular
prometheus:grafana-dashboard grafana:grafana-dashboard grafana_dashboard regular
prometheus:grafana-source grafana:grafana-source grafana_datasource regular
traefik:traefik-route grafana:ingress traefik_route regular
but there's nothing left:
$ k get pods -n m8
NAME READY STATUS RESTARTS AGE
modeloperator-695c98c5f8-t22ps 1/1 Running 0 59m
When I forcefully remove grafana, it all unlocks and clears out.
Ok. But can you please gather the requested information? This shows a missing peer relation at the bottom, but can you get the raw model data and the app data for Grafana?
Collected show-application, show-unit and status, before and after running destroy-model --destroy-storage.
status.zip
Before:
{
"grafana": {
"charm": "local:focal/grafana-k8s-0",
"series": "kubernetes",
"os": "kubernetes",
"charm-origin": "local",
"charm-name": "grafana-k8s",
"charm-rev": 0,
"scale": 1,
"provider-id": "49331306-9bd9-422a-9792-3c0203e449ba",
"address": "10.152.183.127",
"exposed": false,
"application-status": {
"current": "active",
"since": "23 Nov 2022 13:26:33-05:00"
},
"relations": {
"catalogue": [
"catalogue"
],
"grafana": [ # <---- HERE
"grafana"
],
"grafana-dashboard": [
"alertmanager",
"loki",
"prometheus"
],
"grafana-source": [
"alertmanager",
"loki",
"prometheus"
],
"ingress": [
"traefik"
],
"metrics-endpoint": [
"prometheus"
]
},
# ...
}
After:
"grafana": {
"charm": "local:focal/grafana-k8s-0",
"series": "kubernetes",
"os": "kubernetes",
"charm-origin": "local",
"charm-name": "grafana-k8s",
"charm-rev": 0,
"scale": 1,
"provider-id": "49331306-9bd9-422a-9792-3c0203e449ba",
"address": "10.152.183.127",
"exposed": false,
"life": "dying",
"application-status": {
"current": "error",
"message": "hook failed: \"grafana-dashboard-relation-broken\"",
"since": "23 Nov 2022 16:39:29-05:00"
},
"relations": {
"catalogue": [
"catalogue"
],
"grafana-dashboard": [
"alertmanager",
"loki",
"prometheus"
],
"grafana-source": [
"alertmanager",
"loki",
"prometheus"
],
"ingress": [
"traefik"
],
"metrics-endpoint": [
"prometheus"
]
},
...
There is, in fact, no peer relation at all. This should never happen, and it is not a Grafana bug. Let's take this to Launchpad/Juju, because either the contract changed or there is a bug there: "the charm departed the peer relation before everything else" did not happen for the 9 months prior to this, there have been no changelog entries about it, and it violates fundamental assumptions.
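Concretely, the assumption the library relies on is that the peer relation is still reachable while the other relations are departing. Without that guarantee, every handler would need a guard along the lines of the hypothetical sketch below (handler and method names are from the traceback; the guard itself is not existing code):

```python
def _on_grafana_source_relation_departed(self, event) -> None:
    # Hypothetical guard, not the library's current code: if Juju has already
    # removed the peer relation, there is no databag left to clean up, so bail
    # out instead of raising AttributeError.
    if self._charm.model.get_relation("grafana") is None:  # assumed endpoint
        return
    self._remove_source_from_datastore(event)
```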
Posted here: https://bugs.launchpad.net/juju/+bug/1998282
Bug Description
When I destroy the COS Lite model, grafana goes into error state:
To Reproduce
juju destroy-model --destroy-storage
Environment
N/A.
Relevant log output
Additional context
No response