canonical / grafana-k8s-operator

https://charmhub.io/grafana-k8s
Apache License 2.0
6 stars 22 forks source link

`ingress-relation-broken` hook fails #165

Closed barrettj12 closed 1 year ago

barrettj12 commented 1 year ago

Bug Description

When you deploy COS Lite and try to remove traefik, the ingress-relation-broken hook on grafana fails.

To Reproduce

juju add-model cos juju deploy cos-lite juju remove-application traefik

Environment

Juju 3.0 running on microk8s

$ juju version --all
version: 3.0.3-ubuntu-amd64
git-commit: 10fda27abdb2682beef8e47f87de77251cb16864
$ microk8s version
MicroK8s v1.25.4 revision 4221

Relevant log output

unit-grafana-0: 13:58:50 ERROR unit.grafana/0.juju-log ingress:5: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 1156, in <module>
    main(GrafanaCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-grafana-0/charm/venv/ops/main.py", line 426, in main
    charm = charm_class(framework)
  File "./src/charm.py", line 210, in __init__
    url=self.external_url,
  File "./src/charm.py", line 1068, in external_url
    if self.ingress.external_host:
  File "/var/lib/juju/agents/unit-grafana-0/charm/lib/charms/traefik_route_k8s/v0/traefik_route.py", line 220, in external_host
    self._update_stored_external_host()
  File "/var/lib/juju/agents/unit-grafana-0/charm/lib/charms/traefik_route_k8s/v0/traefik_route.py", line 235, in _update_stored_external_host
    external_host = relation.data[relation.app].get("external_host", "")
  File "/var/lib/juju/agents/unit-grafana-0/charm/venv/ops/model.py", line 914, in __getitem__
    raise KeyError(
KeyError: 'Cannot index relation data with "None". Are you trying to access remote app data during a relation-broken event? This is not allowed.'
unit-grafana-0: 13:58:50 ERROR juju.worker.uniter.operation hook "ingress-relation-broken" (via hook dispatching script: dispatch) failed: exit status 1

Additional context

No response

sed-i commented 1 year ago

Should be fixed in https://github.com/canonical/traefik-route-k8s-operator/pull/17. Are you using stable by any chance?

barrettj12 commented 1 year ago

@sed-i just tried with cos-lite edge. The hook is still failing but it looks like a different reason now.

$ juju debug-log
unit-loki-0: 14:53:15 ERROR unit.loki/0.juju-log ingress:4: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 524, in <module>
    main(LokiOperatorCharm, use_juju_for_storage=True)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 426, in main
    charm = charm_class(framework)
  File "./src/charm.py", line 100, in __init__
    source_url=self._external_url,
  File "./src/charm.py", line 216, in _external_url
    if ingress_url := self.ingress_per_unit.url:
  File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/traefik_k8s/v1/ingress_per_unit.py", line 843, in url
    urls = self.urls
  File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/traefik_k8s/v1/ingress_per_unit.py", line 834, in urls
    current_urls = self._urls_from_relation_data
  File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/traefik_k8s/v1/ingress_per_unit.py", line 804, in _urls_from_relation_data
    if not all((relation.app, relation.app.name)):  # type: ignore
AttributeError: 'NoneType' object has no attribute 'name'
unit-loki-0: 14:53:16 ERROR juju.worker.uniter.operation hook "ingress-relation-broken" (via hook dispatching script: dispatch) failed: exit status 1
sed-i commented 1 year ago

Thanks @barrettj12.

It's not a generator, it's a tuple... The guard should probably be changed to (the equivalent of):

-if not all((relation.app, relation.app.name)):
+if not relation.app or not relation.app.name:

Any other considerations @rbarry82 ?

rbarry82 commented 1 year ago

Yes, because this isn't from Grafana at all (it's IPU in Loki), and it's not really important whether it's a generator or not for this case, because it's the same "Relation.app is Optional now" issue which you actually wrote a patch for last summer @sed-i. What is important is that unless it's an iterator, Python will try to evaluate relation.app.name (and find an AttributeError) before any() can short-circuit on relation.app being None in the first place.

Considering that any permutation which even tries to create a generator from this will inevitably yield to Python actually trying to evaluate it (and hitting the same exception) without a snarly lambda being shoved into the middle, or nesting a comprehension inside it relying on the fact that any possible reader will know that any() short-circuits and that coercing it into a comprehension with a generator will actually lazy-evaluate it (so any change away from this for "clarify" will cause a regression of this exception) we're just down to verbosity.

The guard should now be:

# if not any([x for x in [relation, relation.app, relation.app.name]]): # -- works, but relies on knowing too much about Python runtime
if not relation or not relation.app or not relation.app.name: # works without risk of subtle breakage
    return {}

assert isinstance(relation.app, Application)  # type guard

To narrow it down to the appropriate place, if you deploy only Grafana and remove it, is this resolved @barrettj12? That will let us close this one, and we can look at "ingress per unit" in Loki/Prometheus.

barrettj12 commented 1 year ago

@rbarry82: if I do

juju deploy grafana-k8s --channel=edge --trust
juju remove-application grafana-k8s

I do not get the ingress-relation-broken hook failure. I get a separate issue (grafana-relation-broken hook failure, reported here).

barrettj12 commented 1 year ago

This doesn't seem to cause the issue:

$ juju deploy grafana-k8s --channel=edge --trust
Located charm "grafana-k8s" in charm-hub, revision 59
Deploying "grafana-k8s" from charm-hub charm "grafana-k8s", revision 59 in channel edge on ubuntu@20.04/stable
$ juju deploy traefik-k8s --channel=edge --trust
Located charm "traefik-k8s" in charm-hub, revision 100
Deploying "traefik-k8s" from charm-hub charm "traefik-k8s", revision 100 in channel edge on ubuntu@20.04/stable
$ juju relate grafana-k8s traefik-k8s
$ juju remove-application traefik-k8s
will remove application traefik-k8s
- will remove unit traefik-k8s/0
- will detach storage configurations/3
rbarry82 commented 1 year ago

Thanks @barrettj12. I'm testing Loki+Traefik, then :+1: