canonical / mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0

Charm never recovers when one unit is offline with two units waiting to join cluster #530

Open: shayancanonical opened this issue 12 hours ago

shayancanonical commented 12 hours ago

Steps to reproduce

  1. juju deploy -n 3 mysql-k8s --channel 8.0/edge
  2. Wait until the first unit is online, then immediately run microk8s.kubectl -n model-name delete pod mysql-k8s-0 to delete its pod
  3. Wait until the unit comes back online
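The steps above can be scripted for repeated runs. The sketch below uses hypothetical helpers and is not part of the charm or its test suite. It assumes that the `juju` and `microk8s.kubectl` CLIs are on `PATH`, keeps the issue's placeholder model name `model-name`, and treats "online" as unit `mysql-k8s/0` reporting an `active` workload status in `juju status --format=json`. That mapping is an assumption, not something the report states.

```python
import json
import subprocess
import time

APP = "mysql-k8s"
MODEL = "model-name"  # placeholder from the issue; replace with the real model name
POD = "mysql-k8s-0"   # pod backing unit mysql-k8s/0


def deploy_cmd() -> list[str]:
    # Step 1 of the reproduction.
    return ["juju", "deploy", "-n", "3", APP, "--channel", "8.0/edge"]


def delete_pod_cmd() -> list[str]:
    # Step 2: delete the first unit's pod as soon as it is online.
    return ["microk8s.kubectl", "-n", MODEL, "delete", "pod", POD]


def first_unit_online(status: dict) -> bool:
    # Assumption: "online" means unit mysql-k8s/0 reports an active workload status.
    units = status.get("applications", {}).get(APP, {}).get("units", {})
    workload = units.get(f"{APP}/0", {}).get("workload-status", {})
    return workload.get("current") == "active"


def juju_status() -> dict:
    # Hypothetical helper: read the current model's status as JSON.
    result = subprocess.run(
        ["juju", "status", "--format=json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def reproduce(poll_seconds: float = 2.0) -> None:
    # Steps 1 and 2; step 3 (waiting for the unit) is observed manually.
    subprocess.run(deploy_cmd(), check=True)
    while not first_unit_online(juju_status()):
        time.sleep(poll_seconds)
    subprocess.run(delete_pod_cmd(), check=True)
```

Calling `reproduce()` performs steps 1 and 2. Polling `juju status` keeps the delay between the unit going online and the pod deletion short, which is the timing the report depends on.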

Expected behavior

The cluster should recover by performing a full-cluster crash recovery, even though the cluster has only one member. The two waiting units should not be taken into account, since they have not yet joined the cluster.
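The expected behavior amounts to one rule: decide on crash recovery from the cluster's members only. The predicate below is a hypothetical sketch of that rule, not the charm's implementation. `Unit`, `in_cluster`, `member_state` and `needs_full_cluster_crash_recovery` are illustrative names. In MySQL Shell, the recovery itself is performed with `dba.reboot_cluster_from_complete_outage()`.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Unit:
    name: str
    in_cluster: bool   # hypothetical: unit has already been added to the cluster
    member_state: str  # e.g. "ONLINE" or "OFFLINE"; "" if not yet a member


def needs_full_cluster_crash_recovery(units: Sequence[Unit]) -> bool:
    """True when the cluster has members and all of them are offline.

    Units still waiting to join are excluded, so a cluster whose single
    member is offline qualifies for recovery regardless of waiting units.
    """
    members = [u for u in units if u.in_cluster]
    return bool(members) and all(u.member_state == "OFFLINE" for u in members)
```

For the three units in this report (one offline member and two units that have not joined), the predicate returns True, which is the outcome the expected behavior describes.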

Actual behavior

The cluster is stuck with one unit offline and two units in waiting status:

nova-mysql/0*                maintenance  idle   10.1.28.217          offline
nova-mysql/1                 waiting      idle   10.1.180.21          waiting to get cluster primary from peers
nova-mysql/2                 waiting      idle   10.1.190.214         waiting to get cluster primary from peers
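A script or test can detect this state by parsing unit lines like the ones above. This is a sketch under stated assumptions. It splits on runs of two or more spaces and expects exactly the five columns shown here: unit, workload, agent, address and message, with no Ports column and a non-empty message. `UnitStatus`, `parse_unit_line` and `is_stuck` are hypothetical names.

```python
import re
from typing import NamedTuple, Sequence


class UnitStatus(NamedTuple):
    unit: str
    workload: str
    agent: str
    address: str
    message: str


# Assumption: columns are separated by two or more spaces, as in the lines above.
_COLUMN_GAP = re.compile(r"\s{2,}")


def parse_unit_line(line: str) -> UnitStatus:
    # maxsplit=4 keeps a multi-word (single-spaced) message in one field.
    unit, workload, agent, address, message = _COLUMN_GAP.split(line.strip(), maxsplit=4)
    # A trailing "*" marks the leader unit in juju status; drop it from the name.
    return UnitStatus(unit.rstrip("*"), workload, agent, address, message)


def is_stuck(units: Sequence[UnitStatus]) -> bool:
    # The state reported here: exactly one unit says "offline" and
    # every remaining unit is in "waiting" workload status.
    offline = [u for u in units if u.message == "offline"]
    others = [u for u in units if u.message != "offline"]
    return len(offline) == 1 and all(u.workload == "waiting" for u in others)
```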

Versions

Operating system: Ubuntu 22.04 LTS

Juju CLI: 3.5.4

Juju agent: 3.5.4

Charm revision: 180

Log output

Juju debug log:

unit-nova-mysql-0: 10:39:54 INFO unit.nova-mysql/0.juju-log Persisting configuration changes to file
unit-nova-mysql-0: 10:39:54 INFO unit.nova-mysql/0.juju-log Configuration change requires restart
unit-nova-mysql-0: 11:39:55 ERROR unit.nova-mysql/0.juju-log Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/./src/charm.py", line 888, in <module>
    main(MySQLOperatorCharm)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/main.py", line 551, in main
    manager.run()
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/main.py", line 530, in run
    self._emit()
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/main.py", line 519, in _emit
    _emit_charm_event(self.charm, self.dispatcher.event_name)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/main.py", line 147, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/./src/charm.py", line 536, in _on_config_changed
    self.on[f"{self.restart.name}"].acquire_lock.emit()
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/rolling_ops/v0/rollingops.py", line 399, in _on_acquire_lock
    self.charm.on[self.name].relation_changed.emit(relation, app=self.charm.app)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/rolling_ops/v0/rollingops.py", line 348, in _on_relation_changed
    self.charm.on[self.name].process_locks.emit()
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/rolling_ops/v0/rollingops.py", line 384, in _on_process_locks
    self.charm.on[self.name].run_with_lock.emit()
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 348, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 860, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/framework.py", line 950, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/rolling_ops/v0/rollingops.py", line 415, in _on_run_with_lock
    callback(event)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/./src/charm.py", line 449, in _restart
    container.pebble.restart_services([MYSQLD_SAFE_SERVICE], timeout=3600)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/pebble.py", line 2201, in restart_services
    return self._services_action('restart', services, timeout, delay)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/pebble.py", line 2224, in _services_action
    change = self.wait_change(change_id, timeout=timeout, delay=delay)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/pebble.py", line 2254, in wait_change
    return self._wait_change_using_wait(change_id, timeout)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/venv/ops/pebble.py", line 2282, in _wait_change_using_wait
    raise TimeoutError(f'timed out waiting for change {change_id} ({timeout} seconds)')
ops.pebble.TimeoutError: timed out waiting for change 471 (3600 seconds)
unit-nova-mysql-0: 11:39:55 ERROR juju.worker.uniter.operation hook "config-changed" (via hook dispatching script: dispatch) failed: exit status 1
unit-nova-mysql-0: 11:39:57 INFO juju.worker.uniter awaiting error resolution for "config-changed" hook
unit-nova-mysql-0: 11:40:02 INFO juju.worker.uniter awaiting error resolution for "config-changed" hook
unit-nova-mysql-0: 11:40:03 INFO juju.worker.uniter.operation ran "config-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:04 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:05 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:06 INFO juju.worker.uniter.operation ran "database-peers-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:07 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:08 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster status for cluster-b65b6fff3ec3a31de1d455381cc8497a
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster endpoints
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/src/mysql_k8s_helpers.py", line 786, in update_endpoints
    rw_endpoints, ro_endpoints, offline = self.get_cluster_endpoints(get_ips=False)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1872, in get_cluster_endpoints
    raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster status")
charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster status
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster status for cluster-b65b6fff3ec3a31de1d455381cc8497a
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster endpoints
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/src/mysql_k8s_helpers.py", line 786, in update_endpoints
    rw_endpoints, ro_endpoints, offline = self.get_cluster_endpoints(get_ips=False)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1872, in get_cluster_endpoints
    raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster status")
charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster status
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster status for cluster-b65b6fff3ec3a31de1d455381cc8497a
unit-nova-mysql-0: 11:40:10 ERROR unit.nova-mysql/0.juju-log database-peers:29: Failed to get cluster endpoints
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/src/mysql_k8s_helpers.py", line 786, in update_endpoints
    rw_endpoints, ro_endpoints, offline = self.get_cluster_endpoints(get_ips=False)
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 724, in wrapped_function
    return callable(*args, **kwargs)  # type: ignore
  File "/var/lib/juju/agents/unit-nova-mysql-0/charm/lib/charms/mysql/v0/mysql.py", line 1872, in get_cluster_endpoints
    raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster status")
charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster status
unit-nova-mysql-0: 11:40:11 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
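The timing in this log is consistent with a blocked restart. The charm logged that the configuration change requires a restart at 10:39:54 and raised the uncaught exception at 11:39:55, just past the `timeout=3600` passed to `container.pebble.restart_services()` in `_restart`. The helpers below are a hypothetical sketch for extracting that timing and the Pebble change ID from a debug log. The regular expressions are assumptions fitted to the lines in this report, not a specification of the Juju log format.

```python
import re
from datetime import datetime
from typing import Optional

# Assumption: fitted to the debug-log lines in this report, e.g.
# "unit-nova-mysql-0: 10:39:54 INFO unit.nova-mysql/0.juju-log <message>".
LOG_LINE = re.compile(
    r"^(?P<unit>unit-[\w-]+): (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) (?P<message>.*)$"
)
# Matches the message of the ops.pebble.TimeoutError seen in the traceback.
PEBBLE_TIMEOUT = re.compile(
    r"timed out waiting for change (?P<change>\d+) \((?P<timeout>\d+) seconds\)"
)


def parse_log_line(line: str) -> Optional[dict]:
    # Traceback lines and other continuation lines return None.
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def seconds_between(start: str, end: str) -> int:
    # The log carries no date, so both times are assumed to fall on the same day.
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())
```

Applied to the 10:39:54 restart message and the 11:39:55 error, `seconds_between` returns 3601, one second past the 3600-second timeout reported for change 471.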

Additional context

syncronize-issues-to-jira[bot] commented 12 hours ago

Thank you for reporting your feedback to us!

The internal ticket has been created: https://warthogs.atlassian.net/browse/DPE-6039.

This message was autogenerated