Handling an update_status event while the service is down triggers an exception in the src/workload.py -> healthy check. The 10s timeout raises and the hook does not finish normally.
INFO pytest_operator.plugin:plugin.py:784 Model status:
Model Controller Cloud/Region Version SLA Timestamp
test-ha-784h github-pr-538bf-microk8s microk8s/localhost 3.1.6 unsupported 11:09:27Z
App Version Status Scale Charm Channel Rev Address Exposed Message
zookeeper-k8s waiting 3 zookeeper-k8s 0 10.152.183.206 no waiting for units to settle down
Unit Workload Agent Address Ports Message
zookeeper-k8s/0* active idle 10.1.209.80
zookeeper-k8s/1 active idle 10.1.209.78
zookeeper-k8s/2 error idle 10.1.209.79 hook failed: "update-status"
INFO pytest_operator.plugin:plugin.py:790 Juju error logs:
unit-zookeeper-k8s-0: 10:58:35 ERROR unit.zookeeper-k8s/0.juju-log Cluster upgrade failed, ensure pre-upgrade checks are ran first.
unit-zookeeper-k8s-0: 10:58:53 ERROR unit.zookeeper-k8s/0.juju-log zookeeper service is unreachable or not serving requests
unit-zookeeper-k8s-0: 10:59:02 ERROR unit.zookeeper-k8s/0.juju-log zookeeper service is unreachable or not serving requests
unit-zookeeper-k8s-2: 11:08:02 ERROR unit.zookeeper-k8s/2.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/./src/charm.py", line 457, in <module>
main(ZooKeeperCharm)
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/main.py", line 436, in main
_emit_charm_event(charm, dispatcher.event_name)
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/main.py", line 144, in _emit_charm_event
event_to_emit.emit(*args, **kwargs)
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/framework.py", line 351, in emit
framework._emit(event)
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/framework.py", line 853, in _emit
self._reemit(event_path)
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/framework.py", line 942, in _reemit
custom_handler(event)
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/./src/charm.py", line 229, in _on_cluster_relation_changed
if self.state.unit_server.started and not self.workload.healthy:
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/tenacity/__init__.py", line 289, in wrapped_f
return self(f, *args, **kw)
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/tenacity/__init__.py", line 379, in __call__
do = self.iter(retry_state=retry_state)
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/tenacity/__init__.py", line 314, in iter
return fut.result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
return self.__get_result()
File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
raise self._exception
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/tenacity/__init__.py", line 382, in __call__
result = fn(*args, **kwargs)
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/src/workload.py", line 92, in healthy
ruok_response = self.exec(command=timeout + ruok)
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/src/workload.py", line 60, in exec
return str(self.container.exec(command, working_dir=working_dir).wait_output())
File "/var/lib/juju/agents/unit-zookeeper-k8s-2/charm/venv/ops/pebble.py", line 1441, in wait_output
raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
ops.pebble.ExecError: non-zero exit code 124 executing ['timeout', '10s', 'bash', '-c', "echo 'ruok' | (exec 3<>/dev/tcp/localhost/2181; cat >&3; cat <&3; exec 3<&-)"], stdout='', stderr=''
unit-zookeeper-k8s-2: 11:08:02 ERROR juju.worker.uniter.operation hook "update-status" (via hook dispatching script: dispatch) failed: exit status 1
This issue happened during a full cluster crash HA test on zookeeper-k8s.
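One possible hardening, sketched here outside the charm for illustration (the function name and structure are hypothetical; the probe command is the one shown in the traceback above), is to treat any failure of the ruok probe, including `timeout`'s exit code 124 when the service is unreachable, as an unhealthy result rather than letting the exception escape the hook:

```python
import subprocess

def healthy(host: str = "localhost", port: int = 2181, timeout_s: int = 10) -> bool:
    """Hypothetical sketch of a non-raising health probe.

    Sends ZooKeeper's 'ruok' four-letter word over a raw TCP connection
    (same command the charm runs via Pebble, per the traceback) and maps
    every failure mode to False instead of raising.
    """
    cmd = [
        "timeout", f"{timeout_s}s", "bash", "-c",
        f"echo 'ruok' | (exec 3<>/dev/tcp/{host}/{port}; cat >&3; cat <&3; exec 3<&-)",
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
    except OSError:
        # bash or timeout not available: report unhealthy, don't raise
        return False
    # Exit code 124 means `timeout` killed the probe (service unreachable);
    # any non-zero exit or a missing 'imok' reply also counts as unhealthy.
    return result.returncode == 0 and "imok" in result.stdout

```

With a shape like this, the update-status hook would see `healthy` return False and could set a waiting/blocked status instead of failing with an uncaught ops.pebble.ExecError.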