canonical / loki-k8s-operator

https://charmhub.io/loki-k8s
Apache License 2.0

Loki unit fails with "logging-relation-departed" after integrating with grafana-agent #426

Closed anna-savchenko closed 2 months ago

anna-savchenko commented 5 months ago

Bug Description

After integrating grafana-agent in Charmed Kubernetes with Loki, the Loki unit goes into an error state with hook failed: "logging-relation-departed".

To Reproduce

  1. deploy charmed kubernetes with grafana-agent
  2. deploy COS bundle and add offers
  3. add these relations:
    
    juju relate grafana-agent etcd
    juju relate grafana-agent kubernetes-control-plane
    juju relate grafana-agent kubernetes-worker

    juju consume foundations-openstack:admin/cos.grafana cos-grafana
    juju consume foundations-openstack:admin/cos.loki cos-loki
    juju consume foundations-openstack:admin/cos.prometheus cos-prometheus

    juju relate cos-loki:logging grafana-agent:logging-consumer
    juju relate cos-prometheus:receive-remote-write grafana-agent:send-remote-write
    juju relate cos-grafana:grafana-dashboard grafana-agent:grafana-dashboards-provider


Environment

Charm revisions:
alertmanager-k8s_latest_stable_r113
grafana-agent_latest_stable_r95 
loki-k8s_latest_stable_r136   
traefik-k8s_latest_stable_r191
catalogue-k8s_latest_stable_r38
grafana-k8s_latest_stable_r113
prometheus-k8s_latest_stable_r189

Loki image: loki-2.9.5-22.04

Juju: 3.5.0

Charmed K8s - 1.29
COS Microk8s - 1.28

Relevant log output

```shell
2024-06-19T08:56:20.988Z [container-agent] 2024-06-19 08:56:20 ERROR juju.worker.uniter.operation runhook.go:180 hook "logging-relation-departed" (via hook dispatching script: dispatch) failed: exit status 1
2024-06-19T08:56:20.989Z [container-agent] 2024-06-19 08:56:20 INFO juju.worker.uniter resolver.go:180 awaiting error resolution for "relation-departed" hook
2024-06-19T09:01:00.710Z [container-agent] 2024-06-19 09:01:00 INFO juju.worker.uniter resolver.go:180 awaiting error resolution for "relation-departed" hook
2024-06-19T09:01:20.998Z [container-agent] 2024-06-19 09:01:20 INFO juju.worker.uniter resolver.go:180 awaiting error resolution for "relation-departed" hook
2024-06-19T09:01:23.764Z [container-agent] 2024-06-19 09:01:23 WARNING logging-relation-departed rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
2024-06-19T09:01:23.939Z [container-agent] 2024-06-19 09:01:23 ERROR juju-log logging:27: Uncaught exception while in charm code:
2024-06-19T09:01:23.939Z [container-agent] Traceback (most recent call last):
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/model.py", line 3022, in _run
2024-06-19T09:01:23.939Z [container-agent]     result = subprocess.run(args, **kwargs)  # type: ignore
2024-06-19T09:01:23.939Z [container-agent]   File "/usr/lib/python3.8/subprocess.py", line 516, in run
2024-06-19T09:01:23.939Z [container-agent]     raise CalledProcessError(retcode, process.args,
2024-06-19T09:01:23.939Z [container-agent] subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-loki-0/relation-get', '-r', '27', '-', 'remote-dc40984f276d4e2786aa1a1f23243697', '--app', '--format=json')' returned non-zero exit status 1.
2024-06-19T09:01:23.939Z [container-agent] 
2024-06-19T09:01:23.939Z [container-agent] The above exception was the direct cause of the following exception:
2024-06-19T09:01:23.939Z [container-agent] 
2024-06-19T09:01:23.939Z [container-agent] Traceback (most recent call last):
2024-06-19T09:01:23.939Z [container-agent]   File "./src/charm.py", line 671, in <module>
2024-06-19T09:01:23.939Z [container-agent]     main(LokiOperatorCharm, use_juju_for_storage=True)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 544, in main
2024-06-19T09:01:23.939Z [container-agent]     manager.run()
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 520, in run
2024-06-19T09:01:23.939Z [container-agent]     self._emit()
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 509, in _emit
2024-06-19T09:01:23.939Z [container-agent]     _emit_charm_event(self.charm, self.dispatcher.event_name)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
2024-06-19T09:01:23.939Z [container-agent]     event_to_emit.emit(*args, **kwargs)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 352, in emit
2024-06-19T09:01:23.939Z [container-agent]     framework._emit(event)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 851, in _emit
2024-06-19T09:01:23.939Z [container-agent]     self._reemit(event_path)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 941, in _reemit
2024-06-19T09:01:23.939Z [container-agent]     custom_handler(event)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 547, in wrapped_function
2024-06-19T09:01:23.939Z [container-agent]     return callable(*args, **kwargs)  # type: ignore
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/loki_k8s/v0/loki_push_api.py", line 1185, in _on_logging_relation_departed
2024-06-19T09:01:23.939Z [container-agent]     self.on.loki_push_api_alert_rules_changed.emit(
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 352, in emit
2024-06-19T09:01:23.939Z [container-agent]     framework._emit(event)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 851, in _emit
2024-06-19T09:01:23.939Z [container-agent]     self._reemit(event_path)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/framework.py", line 941, in _reemit
2024-06-19T09:01:23.939Z [container-agent]     custom_handler(event)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 547, in wrapped_function
2024-06-19T09:01:23.939Z [container-agent]     return callable(*args, **kwargs)  # type: ignore
2024-06-19T09:01:23.939Z [container-agent]   File "./src/charm.py", line 524, in _loki_push_api_alert_rules_changed
2024-06-19T09:01:23.939Z [container-agent]     self._regenerate_alert_rules()
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 547, in wrapped_function
2024-06-19T09:01:23.939Z [container-agent]     return callable(*args, **kwargs)  # type: ignore
2024-06-19T09:01:23.939Z [container-agent]   File "./src/charm.py", line 550, in _regenerate_alert_rules
2024-06-19T09:01:23.939Z [container-agent]     if self.loki_provider.alerts:
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/loki_k8s/v0/loki_push_api.py", line 1323, in alerts
2024-06-19T09:01:23.939Z [container-agent]     alert_rules = json.loads(relation.data[relation.app].get("alert_rules", "{}"))
2024-06-19T09:01:23.939Z [container-agent]   File "/usr/lib/python3.8/_collections_abc.py", line 660, in get
2024-06-19T09:01:23.939Z [container-agent]     return self[key]
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/model.py", line 1705, in __getitem__
2024-06-19T09:01:23.939Z [container-agent]     return super().__getitem__(key)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/model.py", line 806, in __getitem__
2024-06-19T09:01:23.939Z [container-agent]     return self._data[key]
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/model.py", line 790, in _data
2024-06-19T09:01:23.939Z [container-agent]     data = self._lazy_data = self._load()
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/model.py", line 1589, in _load
2024-06-19T09:01:23.939Z [container-agent]     return self._backend.relation_get(self.relation.id, self._entity.name, self._is_app)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/model.py", line 3095, in relation_get
2024-06-19T09:01:23.939Z [container-agent]     raw_data_content = self._run(*args, return_output=True, use_json=True)
2024-06-19T09:01:23.939Z [container-agent]   File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/model.py", line 3024, in _run
2024-06-19T09:01:23.939Z [container-agent]     raise ModelError(e.stderr) from e
2024-06-19T09:01:23.939Z [container-agent] ops.model.ModelError: ERROR permission denied
2024-06-19T09:01:23.939Z [container-agent] 
2024-06-19T09:01:24.202Z [container-agent] 2024-06-19 09:01:24 ERROR juju.worker.uniter.operation runhook.go:180 hook "logging-relation-departed" (via hook dispatching script: dispatch) failed: exit status 1
```

Additional context

No response

anna-savchenko commented 5 months ago

Update

I redeployed COS and tried to add the integrations in the following order:

  1. juju relate cos-grafana:grafana-dashboard grafana-agent:grafana-dashboards-provider
  2. wait for grafana status to change from executing to active
  3. juju relate cos-prometheus:receive-remote-write grafana-agent:send-remote-write
  4. wait for prometheus status to change from executing to active
  5. juju relate cos-loki:logging grafana-agent:logging-consumer

This approach worked, but I'm not sure whether the ordering is actually related to the issue.
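
For anyone who wants to repeat the ordered sequence above without watching the status by hand, it can be scripted. This is a minimal sketch, not a verified workaround. It assumes the COS applications are named `grafana` and `prometheus`, and that the COS model can be addressed as `foundations-openstack:admin/cos` (inferred from the offer URLs). The helper names, timeout and polling interval are made up for illustration. It only calls `juju relate` and `juju status --format=json`.

```python
import json
import subprocess
import time

# (provider endpoint, requirer endpoint, COS application to wait for afterwards).
# The application names "grafana" and "prometheus" are assumptions about the
# COS model; adjust them to match your deployment.
STEPS = [
    ("cos-grafana:grafana-dashboard", "grafana-agent:grafana-dashboards-provider", "grafana"),
    ("cos-prometheus:receive-remote-write", "grafana-agent:send-remote-write", "prometheus"),
    ("cos-loki:logging", "grafana-agent:logging-consumer", None),
]


def app_is_active(status: dict, app: str) -> bool:
    """Return True if `app` has application status "active" in parsed `juju status` JSON."""
    app_info = status.get("applications", {}).get(app, {})
    return app_info.get("application-status", {}).get("current") == "active"


def wait_until_active(app: str, model: str, timeout: float = 600.0, interval: float = 10.0) -> None:
    """Poll `juju status` for `model` until `app` is active or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        out = subprocess.run(
            ["juju", "status", "-m", model, "--format=json"],
            check=True, capture_output=True, text=True,
        ).stdout
        if app_is_active(json.loads(out), app):
            return
        time.sleep(interval)
    raise TimeoutError(f"{app} did not become active within {timeout} seconds")


def integrate_in_order(cos_model: str = "foundations-openstack:admin/cos") -> None:
    """Add each relation, then wait for the related COS application before the next one."""
    for provider, requirer, wait_for in STEPS:
        subprocess.run(["juju", "relate", provider, requirer], check=True)
        if wait_for is not None:
            wait_until_active(wait_for, cos_model)


if __name__ == "__main__":
    integrate_in_order()
```

The script only reproduces the ordering that happened to work. It does not show that the ordering is what avoided the error.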

lucabello commented 3 months ago

This was most likely a queued event from a removal executed with --force. Can you confirm?
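
If so, the traceback would fit: the handler reads `relation.data[relation.app]` for a remote application that can no longer be read, and `relation-get` answers `permission denied`. Below is a minimal sketch of tolerating that case while collecting alert rules. It is an illustration under assumptions, not the code in `loki_push_api.py` or a fix that was applied to it. `ModelError`, `_DeniedData`, `FakeRelation` and `collect_alert_rules` are stand-ins defined here so the example runs without the `ops` library; in a real charm the exception would be `ops.model.ModelError`. Relation id 26 is made up, while 27 and the remote name are taken from the log.

```python
import json
import logging

logger = logging.getLogger(__name__)


class ModelError(Exception):
    """Stand-in for ops.model.ModelError so the sketch runs without ops."""


class _DeniedData(dict):
    """Hypothetical databag whose reads fail, like the relation-get call in the traceback."""

    def get(self, key, default=None):
        raise ModelError("ERROR permission denied")


class FakeRelation:
    """Hypothetical, minimal imitation of a relation: an id, an app, and per-app data."""

    def __init__(self, relation_id, app, app_data):
        self.id = relation_id
        self.app = app
        self.data = {app: app_data}


def collect_alert_rules(relations):
    """Gather alert rules per relation id, skipping relations whose data cannot be read."""
    alerts = {}
    for relation in relations:
        if relation.app is None:
            # No remote application is known for this relation.
            continue
        try:
            raw = relation.data[relation.app].get("alert_rules", "{}")
        except ModelError as e:
            # The remote side is unreadable (for example, already removed): skip it.
            logger.warning("skipping relation %s: %s", relation.id, e)
            continue
        rules = json.loads(raw)
        if rules:
            alerts[relation.id] = rules
    return alerts


healthy = FakeRelation(26, "grafana-agent", {"alert_rules": '{"groups": []}'})
departed = FakeRelation(27, "remote-dc40984f276d4e2786aa1a1f23243697", _DeniedData())
print(collect_alert_rules([healthy, departed]))
```

Skipping an unreadable relation keeps the hook from failing, but it could also hide a real permissions problem, so the sketch logs a warning for each relation it skips.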

lucabello commented 2 months ago

Closing because I believe that was the issue, but please feel free to reopen if not! :)