Abuelodelanada opened this issue 4 months ago
This is an example of why _can_connect is not reliable.
Although the _update_cert method has a _can_connect guard, a ConnectionError is raised a few lines further down:
Traceback (most recent call last):
  File "./src/charm.py", line 806, in <module>
    main(LokiOperatorCharm)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 549, in main
    manager = _Manager(charm_class, use_juju_for_storage=use_juju_for_storage)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 432, in __init__
    self.charm = self._make_charm(self.framework, self.dispatcher)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/main.py", line 435, in _make_charm
    charm = self._charm_class(framework)
  File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 402, in wrap_init
    original_init(self, framework, *args, **kwargs)
  File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/loki_k8s/v0/charm_logging.py", line 229, in wrap_init
    original_init(self, framework, *args, **kwargs)
  File "./src/charm.py", line 163, in __init__
    self._update_cert()
  File "/var/lib/juju/agents/unit-loki-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 697, in wrapped_function
    return callable(*args, **kwargs) # type: ignore
  File "./src/charm.py", line 603, in _update_cert
    self._loki_container.exec(["update-ca-certificates", "--fresh"]).wait()
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 1695, in wait
    exit_code = self._wait()
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 1705, in _wait
    change = self._client.wait_change(self._change_id, timeout=timeout)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2254, in wait_change
    return self._wait_change_using_wait(change_id, timeout)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2275, in _wait_change_using_wait
    return self._wait_change(change_id, this_timeout)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2291, in _wait_change
    resp = self._request('GET', f'/v1/changes/{change_id}/wait', query)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 1995, in _request
    response = self._request_raw(method, path, query, headers, data)
  File "/var/lib/juju/agents/unit-loki-0/charm/venv/ops/pebble.py", line 2048, in _request_raw
    raise ConnectionError(
ops.pebble.ConnectionError: Could not connect to Pebble: socket not found at '/charm/containers/loki/pebble.socket' (container restarted?)
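The failure is a time-of-check/time-of-use race: the guard observes the Pebble socket at one instant, while exec() and wait() talk to Pebble later. The simulation below is a minimal sketch of that race that runs without ops. It assumes hypothetical stand-ins: PebbleConnectionError plays the role of ops.pebble.ConnectionError, FakeContainer (with its socket_alive flag) imitates a container whose Pebble socket vanishes right after the check, and update_cert_guarded mirrors the guard-then-exec shape of _update_cert.

```python
class PebbleConnectionError(Exception):
    """Hypothetical stand-in for ops.pebble.ConnectionError."""


class FakeContainer:
    """Hypothetical container whose Pebble socket disappears after the first check."""

    def __init__(self) -> None:
        self.socket_alive = True
        self.checks = 0

    def can_connect(self) -> bool:
        # Point-in-time check: it only reports the state at this instant.
        self.checks += 1
        alive = self.socket_alive
        # Simulate the container restarting right after the check returns.
        self.socket_alive = False
        return alive

    def exec(self, command: list[str]) -> str:
        # Any later call to Pebble fails once the socket is gone.
        if not self.socket_alive:
            raise PebbleConnectionError("socket not found (container restarted?)")
        return "ok"


def update_cert_guarded(container: FakeContainer) -> str:
    """Hypothetical helper mirroring the guard-then-exec pattern."""
    if not container.can_connect():
        return "skipped"
    # The guard passed, but the socket may already be gone here.
    return container.exec(["update-ca-certificates", "--fresh"])


container = FakeContainer()
try:
    update_cert_guarded(container)
    outcome = "succeeded"
except PebbleConnectionError:
    outcome = "raised despite guard"
print(outcome)  # raised despite guard
```

The guard ran exactly once and returned True, yet the very next Pebble call still failed, which is the same shape as the traceback above.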
Enhancement Proposal
Revisit all our interactions with Pebble to decide, case by case, which of these patterns we should use:
- can_connect() guards. They are easy to use, but since can_connect() is a point-in-time check, the fact that it returns True doesn't mean it will still return True a few milliseconds later. (We use this pattern in several places, but it feels brittle.)
- try...except blocks. This feels more robust, but we need to work out what to do in each situation, for example:
  - retry with tenacity to give Pebble time to start;
  - defer() the event.
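A minimal sketch of the try...except option that combines both fallbacks, written so it runs without ops or tenacity installed. All names are hypothetical stand-ins: PebbleConnectionError plays ops.pebble.ConnectionError, FakeEvent models only the defer() method of an ops event, and run_with_retry is a hand-rolled substitute for tenacity. The handler retries a few times and defers the event only if Pebble is still unreachable.

```python
import time
from typing import Callable, Optional


class PebbleConnectionError(Exception):
    """Hypothetical stand-in for ops.pebble.ConnectionError."""


class FakeEvent:
    """Hypothetical stand-in for an ops event; only records defer() calls."""

    def __init__(self) -> None:
        self.deferred = False

    def defer(self) -> None:
        self.deferred = True


def run_with_retry(action: Callable[[], str], attempts: int = 3, delay: float = 0.0) -> str:
    """Retry action on connection errors (hand-rolled substitute for tenacity)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except PebbleConnectionError:
            if attempt == attempts:
                raise  # give up; let the caller decide what to do
            time.sleep(delay)  # give Pebble a moment to come back
    raise AssertionError("unreachable")  # the loop always returns or raises


def handle(event: FakeEvent, action: Callable[[], str]) -> Optional[str]:
    """Try the Pebble action; defer the event if Pebble stays unreachable."""
    try:
        return run_with_retry(action)
    except PebbleConnectionError:
        event.defer()  # ask Juju to re-run this handler on a later dispatch
        return None


def flaky(failures: int) -> Callable[[], str]:
    """Hypothetical action that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def action() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise PebbleConnectionError("socket not found")
        return "certs updated"

    return action


ev1 = FakeEvent()
print(handle(ev1, flaky(2)), ev1.deferred)  # certs updated False
ev2 = FakeEvent()
print(handle(ev2, flaky(5)), ev2.deferred)  # None True
```

In a real charm any blocking retry should stay short, since it holds up the hook; with tenacity available, run_with_retry could likely be replaced by its @retry decorator configured with stop_after_attempt and retry_if_exception_type. Note also that in the traceback above _update_cert runs from __init__, where there is no event to defer, so that call site could only use the retry path or skip the work and rely on a later event.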