canonical / opensearch-operator

OpenSearch operator
Apache License 2.0

Move to DP Interfaces library for Relation Data handling (was: KeyError: 'started') #386

Open juditnovak opened 1 month ago

juditnovak commented 1 month ago

Steps to reproduce

  1. Had a full OpenSearch + OSD cluster running on multipass
  2. After a host restart, OpenSearch was blocked on kernel parameters (known issue)
  3. So I tried to run juju remove-application opensearch
  4. Units got blocked, as shown in the screenshots below. juju resolve --no-retry doesn't help

Actual behavior

[Screenshot from 2024-08-06 12-20-40]
[Screenshot from 2024-08-06 12-20-33]

Versions

Operating system:

Juju CLI:

Juju agent:

Charm revision:

LXD:


Log output

Juju debug log:

unit-opensearch-14: 12:44:58 DEBUG unit.opensearch/14.juju-log node-lock-fallback:52: Re-emitting deferred event <_StartOpenSearch via OpenSearchOperatorCharm/_start_opensearch_event[14659]>.
unit-opensearch-14: 12:44:58 ERROR unit.opensearch/14.juju-log node-lock-fallback:52: [Errno 111] Connection refused
unit-opensearch-14: 12:44:58 ERROR unit.opensearch/14.juju-log node-lock-fallback:52: Cannot connect to the OpenSearch server...
unit-opensearch-14: 12:44:58 ERROR unit.opensearch/14.juju-log node-lock-fallback:52: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-opensearch-14/charm/./src/charm.py", line 213, in <module>
    main(OpenSearchOperatorCharm)
  File "/var/lib/juju/agents/unit-opensearch-14/charm/venv/ops/main.py", line 548, in main
    manager.run()
  File "/var/lib/juju/agents/unit-opensearch-14/charm/venv/ops/main.py", line 527, in run
    self._emit()
  File "/var/lib/juju/agents/unit-opensearch-14/charm/venv/ops/main.py", line 513, in _emit
    self.framework.reemit()
  File "/var/lib/juju/agents/unit-opensearch-14/charm/venv/ops/framework.py", line 870, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-opensearch-14/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-opensearch-14/charm/lib/charms/opensearch/v0/opensearch_base_charm.py", line 820, in _start_opensearch
    self.peers_data.delete(Scope.UNIT, "started")
  File "/var/lib/juju/agents/unit-opensearch-14/charm/lib/charms/opensearch/v0/opensearch_internal_data.py", line 182, in delete
    self.put(scope, key, None)
  File "/var/lib/juju/agents/unit-opensearch-14/charm/lib/charms/opensearch/v0/opensearch_internal_data.py", line 115, in put
    self.put_or_delete(data, key, value)
  File "/var/lib/juju/agents/unit-opensearch-14/charm/lib/charms/opensearch/v0/opensearch_internal_data.py", line 94, in put_or_delete
    del data[key]
KeyError: 'started'
unit-opensearch-14: 12:44:58 ERROR juju.worker.uniter.operation hook "node-lock-fallback-relation-broken" (via hook dispatching script: dispatch) failed: exit status 1
unit-opensearch-14: 12:44:58 INFO juju.worker.uniter awaiting error resolution for "relation-broken" hook
unit-opensearch-13: 12:45:02 INFO juju.worker.uniter awaiting error resolution for "relation-departed" hook
unit-opensearch-14: 12:45:10 INFO juju.worker.uniter awaiting error resolution for "relation-broken" hook
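
The traceback ends in `del data[key]`, which raises when `started` was never written (or was already removed) for this unit. A minimal sketch of a tolerant delete, assuming the databag behaves like a plain mapping; the `put_or_delete` below is a hypothetical stand-in that only borrows the name from the traceback, not the actual code in `opensearch_internal_data.py`:

```python
from typing import Any, Dict, Optional


def put_or_delete(data: Dict[str, str], key: str, value: Optional[Any]) -> None:
    """Store `value` under `key`, or remove `key` when `value` is None.

    Hypothetical helper for illustration; a plain dict stands in for the
    relation databag.
    """
    if value is None:
        # pop() with a default is a no-op when the key is absent,
        # unlike `del data[key]`, which raises KeyError.
        data.pop(key, None)
        return
    data[key] = str(value)


# Deleting a key that was never set no longer raises.
unit_data: Dict[str, str] = {}
put_or_delete(unit_data, "started", None)
put_or_delete(unit_data, "started", True)
put_or_delete(unit_data, "started", None)
```

With this shape, a deferred start event that is re-emitted during teardown could call the delete repeatedly without failing the hook. Whether silently ignoring a missing key is the right behavior for the charm is a separate design question.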

github-actions[bot] commented 1 month ago

https://warthogs.atlassian.net/browse/DPE-5060

juditnovak commented 1 month ago

Ideal solution: use DP Interfaces for both peer and cross-charm relations in OpenSearch.

juditnovak commented 1 month ago

Related ticket: https://warthogs.atlassian.net/browse/DPE-4395