canonical / istio-operators

Charmed Istio

`KeyError` during `RelationBrokenEvent` puts `istio-pilot` in unrecoverable state #423

Closed DnPlas closed 1 month ago

DnPlas commented 1 month ago

Bug Description

The _get_ingress_data() method in istio-pilot retrieves data from the ingress relation data bag and returns a list of "routes" for the charm to create VirtualServices.

On RelationBroken events, this method tries to handle the case where the relation data still shows the departing application (see here). That block of code does not appear to consider the case where the relation data bag is empty, which can raise an error at this line:

  File "./src/charm.py", line 574, in _get_ingress_data
    routes.pop((event.relation, event.app))
KeyError: (<ops.model.Relation ingress:2>, <ops.model.Application kubeflow-volumes>)

The error occurs because routes is an empty dictionary, so the key being popped does not exist.
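For illustration, the failure can be reproduced with a plain dictionary. This is only a minimal sketch: the tuple of strings below is a hypothetical stand-in for the real `(event.relation, event.app)` key, and passing a default to `pop()` is one possible way to tolerate a missing key, not necessarily what the actual fix does.

```python
# Hypothetical stand-in for (event.relation, event.app); the real key
# is made of ops.model.Relation and ops.model.Application objects.
departing = ("ingress:2", "kubeflow-volumes")

routes = {}  # empty, as in the failing case

try:
    # Same call shape as the line in the traceback: no default given.
    routes.pop(departing)
    raised = False
except KeyError:
    raised = True

# Supplying a default turns the removal into a no-op when the key is absent.
removed = routes.pop(departing, None)
```

With a default supplied, `pop()` returns that default instead of raising, so the handler could carry on with whatever routes remain.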

To Reproduce

  1. Deploy istio-pilot from latest/edge or 1.17/stable
  2. Deploy kubeflow-volumes (or any other ingress requirer)
  3. Relate them through the ingress interface
  4. Wait for them to become stable and then remove the relation
  5. Observe the status and debug logs

Environment

  1. microk8s 1.25-strict/stable
  2. juju 3.5/candidate

Relevant Log Output

unit-istio-pilot-0: 13:58:32 ERROR unit.istio-pilot/0.juju-log ingress:2: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "./src/charm.py", line 1203, in <module>
    main(Operator)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 544, in main
    manager.run()
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 520, in run
    self._emit()
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 509, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 143, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 350, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 849, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/framework.py", line 939, in _reemit
    custom_handler(event)
  File "./src/charm.py", line 346, in reconcile
    ingress_data = self._get_ingress_data(event)
  File "./src/charm.py", line 574, in _get_ingress_data
    routes.pop((event.relation, event.app))
KeyError: (<ops.model.Relation ingress:2>, <ops.model.Application kubeflow-volumes>)

Additional Context

This error was first caught by the CI here

syncronize-issues-to-jira[bot] commented 1 month ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5751.

This message was autogenerated

DnPlas commented 1 month ago

This issue also occurs when a RelationBroken event is triggered by an application that no longer has application data in the relation data bag. The fix for this issue should cover this case as well.
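A rough sketch of how both cases might be handled together, assuming routes are keyed per relation and application. The `collect_routes` helper, the `RouteKey` alias, and the sample data are all hypothetical and are not taken from the charm's code.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: (relation label, application name).
RouteKey = Tuple[str, str]


def collect_routes(
    databags: Dict[RouteKey, dict],
    broken: Optional[RouteKey] = None,
) -> Dict[RouteKey, dict]:
    """Return routes for every app that published data, minus the broken one."""
    # Applications with an empty data bag never get an entry.
    routes = {key: dict(data) for key, data in databags.items() if data}
    if broken is not None:
        # The departing app may or may not be present; tolerate both.
        routes.pop(broken, None)
    return routes


# Illustrative data only: one app with no data left, one with a route.
bags = {
    ("ingress:2", "kubeflow-volumes"): {},
    ("ingress:3", "jupyter-ui"): {"prefix": "/jupyter", "service": "jupyter-ui"},
}
remaining = collect_routes(bags, broken=("ingress:2", "kubeflow-volumes"))
```

Here the departing application has no entry in `routes` because its data bag is empty, and the removal step simply does nothing rather than raising.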

DnPlas commented 1 month ago

Closed by #424