canonical / opensearch-operator

Initializing security index may happen too early in the starting process #444

Closed phvalguima closed 1 month ago

phvalguima commented 2 months ago

I've caught this failure in the CI:

unit-opensearch-0: 09:04:30 ERROR unit.opensearch/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-opensearch-0/charm/./src/charm.py", line 213, in <module>
    main(OpenSearchOperatorCharm)
  File "/var/lib/juju/agents/unit-opensearch-0/charm/venv/ops/main.py", line 551, in main
    manager.run()
  File "/var/lib/juju/agents/unit-opensearch-0/charm/venv/ops/main.py", line 530, in run
    self._emit()
  File "/var/lib/juju/agents/unit-opensearch-0/charm/venv/ops/main.py", line 516, in _emit
    self.framework.reemit()
  File "/var/lib/juju/agents/unit-opensearch-0/charm/venv/ops/framework.py", line 870, in reemit
    self._reemit()
  File "/var/lib/juju/agents/unit-opensearch-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-opensearch-0/charm/lib/charms/opensearch/v0/opensearch_base_charm.py", line 849, in _start_opensearch
    self._post_start_init(event)
  File "/var/lib/juju/agents/unit-opensearch-0/charm/lib/charms/opensearch/v0/opensearch_base_charm.py", line 931, in _post_start_init
    self._initialize_security_index(admin_secrets)
  File "/var/lib/juju/agents/unit-opensearch-0/charm/lib/charms/opensearch/v0/opensearch_base_charm.py", line 1319, in _initialize_security_index
    f"-cn {self.opensearch_peer_cm.deployment_desc().config.cluster_name}",
AttributeError: 'NoneType' object has no attribute 'config'

In this run: https://github.com/canonical/opensearch-operator/actions/runs/10880348684/job/30187291459?pr=443

syncronize-issues-to-jira[bot] commented 2 months ago

Thank you for reporting us your feedback!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-5468.

This message was autogenerated

phvalguima commented 2 months ago

This issue surfaces because test_charm.py now includes a check that deploys a single unit and then removes the application. That test was intended to detect any unchecked deployment_desc() calls, as discussed here: https://github.com/canonical/opensearch-operator/pull/361

That is what is effectively happening here.

The underlying cause is a bug in Juju, introduced in 3.4.4: https://bugs.launchpad.net/juju/+bug/2076599 I have asked @shayancanonical to follow up with the team.

Until that bug is fixed, I suggest we add a check here: if the deployment description is not available (deployment_desc() returns None), we defer the event.
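
A minimal sketch of what such a guard could look like. It uses stand-in classes instead of the real charm objects: FakeEvent, FakePeerClusterManager, Config, DeploymentDescription and post_start_init are hypothetical names for illustration, not the charm's actual code. Only the deployment_desc() call, the config.cluster_name attribute and the "-cn" argument are taken from the traceback above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Config:
    cluster_name: str


@dataclass
class DeploymentDescription:
    config: Config


class FakeEvent:
    """Stand-in for an event object; only records whether defer() was called."""

    def __init__(self) -> None:
        self.deferred = False

    def defer(self) -> None:
        self.deferred = True


class FakePeerClusterManager:
    """Stand-in for opensearch_peer_cm; the description may not exist yet."""

    def __init__(self, desc: Optional[DeploymentDescription] = None) -> None:
        self._desc = desc

    def deployment_desc(self) -> Optional[DeploymentDescription]:
        return self._desc


def post_start_init(peer_cm: FakePeerClusterManager, event: FakeEvent) -> Optional[str]:
    """Return the cluster-name argument, or defer if the description is missing."""
    deployment_desc = peer_cm.deployment_desc()
    if deployment_desc is None:
        # Too early in the start sequence: retry later instead of
        # failing with AttributeError on None.config.
        event.defer()
        return None
    return f"-cn {deployment_desc.config.cluster_name}"
```

The point of the sketch is to call deployment_desc() once, check the result for None, and only then read config.cluster_name. Where exactly the check belongs in the charm (in _post_start_init or in _initialize_security_index) is left open here.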