canonical / mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0

Cannot recover from k8s endpoint temporarily unavailable situation #420

Open nobuto-m opened 4 months ago

nobuto-m commented 4 months ago

MySQL units can get stuck in the following statuses, even though all unit agents are idle.

Unit               Workload     Agent  Address      Ports  Message
keystone-mysql/0   maintenance  idle   10.1.58.103         joining the cluster
keystone-mysql/1*  active       idle   10.1.51.41          Primary
keystone-mysql/2   waiting      idle   10.1.48.106         waiting to get cluster primary from peers

Steps to reproduce

  1. Follow: https://microstack.run/docs/multi-node-maas

More details are in: https://bugs.launchpad.net/snap-openstack/+bug/2067451

Expected behavior

The charm can recover from such an event.

Actual behavior

Unhandled exceptions are recorded, and the charm cannot complete the cluster deployment.

unit-keystone-mysql-2: 06:07:58 DEBUG unit.keystone-mysql/2.juju-log ops 2.10.0 up and running.
unit-keystone-mysql-2: 06:07:58 DEBUG unit.keystone-mysql/2.juju-log Emitting Juju event mysql_pebble_ready.
unit-keystone-mysql-2: 06:08:01 ERROR unit.keystone-mysql/2.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/model.py", line 3019, in _run
    result = subprocess.run(args, **kwargs) # type: ignore
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '('/var/lib/juju/tools/unit-keystone-mysql-2/secret-get', '--label', 'database-peers.keystone-mysql.app', '--format=json')' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/./src/charm.py", line 788, in <module>
    main(MySQLOperatorCharm)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/main.py", line 456, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/main.py", line 144, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/framework.py", line 351, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/framework.py", line 853, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/framework.py", line 943, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/./src/charm.py", line 572, in _on_mysql_pebble_ready
    if self._mysql_pebble_ready_checks(event):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/./src/charm.py", line 555, in _mysql_pebble_ready_checks
    if not self._is_peer_data_set:
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/mysql/v0/mysql.py", line 632, in _is_peer_data_set
    and self.get_secret("app", ROOT_PASSWORD_KEY)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/mysql/v0/mysql.py", line 704, in get_secret
    if not (value := self.peer_relation_data(scope).fetch_my_relation_field(peers.id, key)):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1256, in fetch_my_relation_field
    if relation_data := self.fetch_my_relation_data([relation_id], [field], relation_name):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1245, in fetch_my_relation_data
    data[relation.id] = self._fetch_my_specific_relation_data(relation, fields)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 534, in wrapper
    return f(self, *args, **kwargs)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 2079, in _fetch_my_specific_relation_data
    self.component, self.secret_fields, relation, fields
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1813, in secret_fields
    self.static_secret_fields if self.static_secret_fields else self.current_secret_fields
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1832, in current_secret_fields
    if content := self._get_group_secret_contents(relation, group):
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 2066, in _get_group_secret_contents
    result = super()._get_group_secret_contents(relation, group, secret_fields)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 996, in _get_group_secret_contents
    if (secret := self._get_relation_secret(relation.id, group)) and (
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 506, in wrapper
    return f(self, *args, **kwargs)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 2055, in _get_relation_secret
    return self.secrets.get(label, secret_uri, legacy_labels=self._previous_labels())
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 702, in get
    if secret.meta:
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 605, in meta
    self._secret_meta = self._model.get_secret(label=label)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/model.py", line 281, in get_secret
    content = self._backend.secret_get(id=id, label=label)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/model.py", line 3375, in secret_get
    result = self._run('secret-get', *args, return_output=True, use_json=True)
  File "/var/lib/juju/agents/unit-keystone-mysql-2/charm/venv/ops/model.py", line 3021, in _run
    raise ModelError(e.stderr) from e
ops.model.ModelError: ERROR cannot ensure service account "unit-keystone-mysql-2": Post "https://192.168.151.102:16443/api/v1/namespaces/openstack/serviceaccounts": read tcp 192.168.151.101:33354->192.168.151.102:16443: read: connection reset by peer

unit-keystone-mysql-2: 06:08:02 ERROR juju.worker.uniter.operation hook "mysql-pebble-ready" (via hook dispatching script: dispatch) failed: exit status 1
unit-keystone-mysql-2: 06:08:02 ERROR juju.worker.uniter pebble poll failed for container "mysql": failed to send pebble-ready event: hook failed
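
For illustration, here is a minimal sketch of one way a charm could tolerate this kind of failure, assuming the `secret-get` error is transient (the k8s endpoint comes back shortly). It is not the charm's or the library's actual code. `call_with_retries` and `make_flaky_secret_get` are hypothetical names, and the local `ModelError` class is only a stand-in for `ops.model.ModelError` so the sketch runs without ops installed.

```python
import time
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


class ModelError(Exception):
    """Stand-in for ops.model.ModelError (assumption: lets the sketch run without ops)."""


def call_with_retries(
    func: Callable[[], T],
    attempts: int = 5,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on ModelError with exponential backoff.

    Re-raises the last ModelError once all attempts are used up, so the
    caller can still decide to defer the event instead of crashing the hook.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ModelError:
            if attempt == attempts:
                raise
            sleep(delay)  # wait before the next try
            delay *= 2    # double the wait each time
    raise AssertionError("unreachable")


def make_flaky_secret_get(failures: int) -> Tuple[Callable[[], Dict[str, str]], Dict[str, int]]:
    """Build a fake secret getter that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def secret_get() -> Dict[str, str]:
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ModelError("connection reset by peer")
        return {"root-password": "example"}  # placeholder content, not real data

    return secret_get, state
```

In a real handler, the wrapped call would be the secret lookup that raised in the traceback above. If the retries are exhausted, catching the error and deferring the event would be an alternative to letting the hook fail. Retries inside a hook should stay short, since the hook blocks the unit agent while it waits.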

Versions

Operating system: 22.04 LTS

Juju CLI: 3.4.2

Juju agent: 3.4.2

Charm revision: 8.0/edge: 138

microk8s: microk8s v1.28.7 6532 1.28-strict/stable

Log output

Juju debug log:

https://bugs.launchpad.net/snap-openstack/+bug/2067451/+attachment/5783832/+files/sunbeam-inspection-report-20240529_071507.tar.gz

Additional context

github-actions[bot] commented 4 months ago

https://warthogs.atlassian.net/browse/DPE-4473

taurus-forever commented 3 months ago

The funny part is that we even have a test for keystone, and it looks stable.

It requires investigation.

BTW, I suspect keystone is still using the legacy shared_db interface, which is under our radar. Is that true? If so, can we migrate to the modern interface?

Anyway, @nobuto-m, thanks for reporting!

gboutry commented 2 months ago

This report is using keystone-k8s, the charm used in sunbeam, which works very differently from the machine charm and uses data_interfaces 0.37.