canonical / opensearch-operator

OpenSearch operator
Apache License 2.0

[LOW IMPORTANCE][STABILITY] KeyError: 'node.roles' #222

Closed juditnovak closed 3 weeks ago

juditnovak commented 5 months ago

I've seen this error a few times on local runs, so I'm opening this ticket to track it. If it occurs for others or on pipelines, please add more info to the ticket.

The following exception has been raised a couple of times:

unit-opensearch-1: 11:00:28 ERROR unit.opensearch/1.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-opensearch-1/charm/lib/charms/opensearch/v0/opensearch_distro.py", line 408, in current
    nodes = self.request("GET", f"/_nodes/{self.node_id}", alt_hosts=self._charm.alt_hosts)
  File "/usr/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/var/lib/juju/agents/unit-opensearch-1/charm/lib/charms/opensearch/v0/opensearch_distro.py", line 375, in node_id
    nodes = self.request("GET", "/_nodes").get("nodes")
  File "/var/lib/juju/agents/unit-opensearch-1/charm/lib/charms/opensearch/v0/opensearch_distro.py", line 297, in request
    resp = call(retries, resp_status_code)
  File "/var/lib/juju/agents/unit-opensearch-1/charm/lib/charms/opensearch/v0/opensearch_distro.py", line 252, in call
    raise OpenSearchHttpError()
charms.opensearch.v0.opensearch_exceptions.OpenSearchHttpError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-opensearch-1/charm/./src/charm.py", line 94, in <module>
    main(OpenSearchOperatorCharm)
  File "/var/lib/juju/agents/unit-opensearch-1/charm/venv/ops/main.py", line 456, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-opensearch-1/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-opensearch-1/charm/venv/ops/framework.py", line 351, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-opensearch-1/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-opensearch-1/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-opensearch-1/charm/lib/charms/opensearch/v0/opensearch_base_charm.py", line 434, in _on_opensearch_data_storage_detaching
    self._stop_opensearch()
  File "/var/lib/juju/agents/unit-opensearch-1/charm/lib/charms/opensearch/v0/opensearch_base_charm.py", line 825, in _stop_opensearch
    self.opensearch_exclusions.delete_current()
  File "/var/lib/juju/agents/unit-opensearch-1/charm/lib/charms/opensearch/v0/opensearch_nodes_exclusions.py", line 58, in delete_current
    self._node.is_cm_eligible() or self._node.is_voting_only()
  File "/usr/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/var/lib/juju/agents/unit-opensearch-1/charm/lib/charms/opensearch/v0/opensearch_nodes_exclusions.py", line 161, in _node
    return self._charm.opensearch.current()
  File "/var/lib/juju/agents/unit-opensearch-1/charm/lib/charms/opensearch/v0/opensearch_distro.py", line 423, in current
    roles=conf_on_disk["node.roles"],
  File "/var/lib/juju/agents/unit-opensearch-1/charm/venv/ruamel/yaml/comments.py", line 842, in __getitem__
    return ordereddict.__getitem__(self, key)
KeyError: 'node.roles'

It may be worth taking a look; it may just be a programming oversight.

github-actions[bot] commented 5 months ago

https://warthogs.atlassian.net/browse/DPE-4049

juditnovak commented 1 month ago

Context revealed while investigating the issue:

When a unit is online, unit-specific service information is taken from the OpenSearch API (/_nodes).

However, if the unit is offline, we fall back to the static configuration. This is where the issue occurred: node.roles was missing from that configuration, so the lookup raised the KeyError.
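For illustration, the online/offline fallback described above could be guarded roughly as follows. This is only a minimal sketch, not the charm's actual code: `resolve_node_roles`, `ApiUnavailableError`, and the `default_roles` parameter are hypothetical names, and returning an empty list when `node.roles` is absent is an assumption about what a sensible default would be.

```python
from typing import Any, Callable, List, Mapping, Optional


class ApiUnavailableError(Exception):
    """Hypothetical stand-in for the charm's OpenSearchHttpError."""


def resolve_node_roles(
    fetch_roles_from_api: Callable[[], List[str]],
    conf_on_disk: Mapping[str, Any],
    default_roles: Optional[List[str]] = None,
) -> List[str]:
    """Return the node roles, preferring the live API over the static config."""
    try:
        # Unit is online: ask the API (e.g. GET /_nodes/<node_id>).
        return list(fetch_roles_from_api())
    except ApiUnavailableError:
        # Unit is offline: fall back to the config on disk.
        # Using .get() instead of conf_on_disk["node.roles"] avoids the
        # KeyError when the key was never written to the config.
        roles = conf_on_disk.get("node.roles")
        if roles is None:
            # Assumption: an empty role list is an acceptable default.
            return list(default_roles or [])
        return list(roles)
```

With a fetcher that raises `ApiUnavailableError` and a config lacking `node.roles`, the function returns the default instead of propagating a `KeyError` from the storage-detaching handler.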