canonical / mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0
8 stars 15 forks source link

MySQL router removed from cluster set during scale up of mysql-k8s & mysql-router-k8s #432

Open carlcsaposs-canonical opened 3 weeks ago

carlcsaposs-canonical commented 3 weeks ago

Steps to reproduce

Unavailable https://bugs.launchpad.net/snap-openstack/+bug/2068924

Appears to be scale up of mysql-k8s & mysql-router-k8s

Expected behavior

All units active/idle

Actual behavior

nova-mysql-router/2 in BlockedStatus Router was manually removed from MySQL ClusterSet. Remove & re-deploy unit

Versions

Juju agent: 3.4.3

mysql-k8s charm revision: 127 mysql-router-k8s charm revision: 96

Log output

https://oil-jenkins.canonical.com/artifacts/b0040961-eb15-40f3-a89b-e49774c6f5d4/index.html

Juju debug log: debug_log_openstack.txt juju_status_openstack.txt

unit-nova-mysql-router-0: 22:27:31 ERROR unit.nova-mysql-router/0.juju-log upgrade-version-a:44: Failed to run MySQL Shell script:
import json
import mysqlsh
import traceback

try:
    shell.connect("relation-75:***@nova-mysql-primary.openstack.svc.cluster.local:3306")
    import json

    cluster_set = dba.get_cluster_set()
    routers = cluster_set.list_routers()
    # mysqlsh objects are weird—they quack (i.e. duck typing) like standard Python objects (e.g. list,
    # dict), but do not serialize to JSON correctly.
    # Cast to str & load from JSON str before serializing
    routers = json.loads(str(routers))
    with open("/tmp/mysqlsh_output.json", "w") as file:
        json.dump(routers, file)
except mysqlsh.DBError as exception:
    error = {
        "message": str(exception),
        "code": exception.code,
        "traceback_message": "".join(traceback.format_exception(exception)),
    }
else:
    error = None
with open("/tmp/mysqlsh_error.json", "w") as file:
    json.dump(error, file)

MySQL client error 2013
MySQL Shell traceback:
Traceback (most recent call last):
  File "<string>", line 6, in <module>
mysqlsh.DBError: MySQL Error (2013): Shell.connect: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 110

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/src/mysql_shell/__init__.py", line 108, in _run_code
    raise ShellDBError(**exception)
mysql_shell.ShellDBError: MySQL Error (2013): Shell.connect: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 110
unit-nova-mysql-router-0: 22:27:31 ERROR unit.nova-mysql-router/0.juju-log upgrade-version-a:44: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/./src/kubernetes_charm.py", line 185, in <module>
    ops.main.main(KubernetesRouterCharm)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/src/abstract_charm.py", line 261, in reconcile
    if not workload_.status or workload_.status == ops.WaitingStatus():
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/src/workload.py", line 297, in status
    if not self.shell.is_router_in_cluster_set(self._router_id):
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/src/mysql_shell/__init__.py", line 229, in is_router_in_cluster_set
    self._run_code(
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/src/mysql_shell/__init__.py", line 108, in _run_code
    raise ShellDBError(**exception)
mysql_shell.ShellDBError: MySQL Error (2013): Shell.connect: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 110
unit-nova-mysql-router-0: 22:27:31 ERROR juju.worker.uniter.operation hook "upgrade-version-a-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1
unit-nova-mysql-router-0: 22:27:31 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-nova-mysql-router-0: 22:27:36 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-nova-mysql-2: 22:27:48 ERROR unit.nova-mysql/2.juju-log database-peers:24: Failed to get cluster status for cluster-df07980f3ce9324ddf8ccee565b8eb7c
unit-nova-mysql-2: 22:27:48 ERROR unit.nova-mysql/2.juju-log database-peers:24: Failed to get cluster endpoints
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-2/charm/src/mysql_k8s_helpers.py", line 836, in update_endpoints
    rw_endpoints, ro_endpoints, offline = self.get_cluster_endpoints(get_ips=False)
  File "/var/lib/juju/agents/unit-nova-mysql-2/charm/lib/charms/mysql/v0/mysql.py", line 1469, in get_cluster_endpoints
    raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster status")
charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster status
unit-nova-mysql-router-2: 22:27:49 ERROR unit.nova-mysql-router/2.juju-log backend-database:75: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/./src/kubernetes_charm.py", line 185, in <module>
    ops.main.main(KubernetesRouterCharm)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1958, in _on_relation_changed_event
    getattr(self.on, "database_created").emit(
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/src/abstract_charm.py", line 256, in reconcile
    workload_.enable(tls=self._tls_certificate_saved, unit_name=self.unit.name)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/src/workload.py", line 260, in enable
    self._bootstrap_router(tls=tls)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/src/workload.py", line 203, in _bootstrap_router
    self._container.run_mysql_router(command, timeout=30)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/src/container.py", line 134, in run_mysql_router
    return self._run_command(args, timeout=timeout)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/src/rock.py", line 162, in _run_command
    output, _ = process.wait_output()
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/pebble.py", line 1354, in wait_output
    exit_code: int = self._wait()
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/pebble.py", line 1316, in _wait
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- exec command "mysqlrouter" (timed out after 30s: context deadline exceeded)
----- Logs from task 0 -----
2024-06-07T22:27:49Z ERROR timed out after 30s: context deadline exceeded
-----
unit-nova-mysql-2: 22:27:49 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:27:49 ERROR juju.worker.uniter.operation hook "backend-database-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1
unit-nova-mysql-router-2: 22:27:49 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-nova-mysql-2: 22:27:53 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:27:54 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-nova-mysql-2: 22:27:55 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:27:56 INFO unit.nova-mysql-router/2.juju-log backend-database:75: database created at 2024-06-07 22:27:56.775062
unit-nova-mysql-2: 22:27:57 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:01 INFO juju.worker.uniter.operation ran "upgrade-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:01 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:03 INFO unit.nova-mysql/1.juju-log Starting the log rotate manager
unit-nova-mysql-1: 22:28:03 INFO unit.nova-mysql/1.juju-log Started log rotate manager process with PID 2379
unit-nova-mysql-1: 22:28:03 INFO juju.worker.uniter.operation ran "database-storage-attached" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:06 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:06 INFO juju.worker.uniter.operation ran "upgrade-version-a-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:07 INFO juju.worker.uniter.operation ran "upgrade-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:08 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:08 INFO juju.worker.uniter.operation ran "upgrade-version-a-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:10 INFO juju.worker.uniter.operation ran "restart-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:10 INFO juju.worker.uniter.operation ran "backend-database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:12 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:12 INFO juju.worker.uniter.operation ran "mysql-router-peers-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:14 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:14 INFO juju.worker.uniter.operation ran "backend-database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-0: 22:28:14 INFO juju.worker.uniter.operation ran "upgrade-version-a-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:16 INFO juju.worker.uniter.operation ran "mysql-router-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:17 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:28:18 INFO juju.worker.uniter.operation ran "backend-database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:18 INFO juju.worker.uniter.operation ran "restart-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:28:20 INFO juju.worker.uniter.operation ran "backend-database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:21 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:28:22 INFO juju.worker.uniter.operation ran "upgrade-version-a-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:23 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:25 INFO juju.worker.uniter.operation ran "restart-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:27 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:28 INFO juju.worker.uniter.operation ran "upgrade-version-a-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:29 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:30 INFO unit.nova-mysql/2.juju-log upgrade:26: Unit workload member-state is online with member-role secondary
unit-nova-mysql-router-1: 22:28:31 INFO juju.worker.uniter.operation ran "upgrade-version-a-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:31 INFO juju.worker.uniter.operation ran "upgrade-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:31 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:32 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:33 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:33 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:28:34 INFO juju.worker.uniter.operation ran "upgrade-version-a-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:36 INFO juju.worker.uniter.operation ran "upgrade-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:36 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:28:36 INFO juju.worker.uniter.operation ran "backend-database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:28:39 INFO juju.worker.uniter.operation ran "backend-database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:39 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:40 INFO juju.worker.uniter.operation ran "upgrade-version-a-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:41 INFO unit.nova-mysql/2.juju-log upgrade:26: Unit workload member-state is online with member-role secondary
unit-nova-mysql-router-2: 22:28:41 INFO juju.worker.uniter.operation ran "backend-database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:41 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:41 INFO juju.worker.uniter.operation ran "upgrade-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:28:42 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:42 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:28:43 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:43 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:28:43 INFO juju.worker.uniter.operation ran "backend-database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:44 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-0: 22:28:44 WARNING unit.nova-mysql-router/0.juju-log No relation: certificates
unit-nova-mysql-router-1: 22:28:45 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:45 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-0: 22:28:45 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:28:46 INFO juju.worker.uniter.operation ran "backend-database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:46 INFO juju.worker.uniter.operation ran "restart-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:48 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:49 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:49 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:51 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:52 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:53 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:54 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:55 INFO juju.worker.uniter.operation ran "database-peers-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:56 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:57 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:28:57 INFO juju.worker.uniter.operation ran "restart-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:28:58 INFO juju.worker.uniter.operation ran "database-peers-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:28:59 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:29:00 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-2: 22:29:03 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:29:03 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-0: 22:29:06 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:29:07 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:29:08 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:29:11 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:29:13 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:29:22 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:29:30 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:29:31 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:29:34 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:29:36 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:29:54 INFO unit.nova-mysql/1.juju-log Unit workload member-state is online with member-role secondary
unit-nova-mysql-1: 22:29:56 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:30:11 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:30:21 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:30:24 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:30:27 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:30:30 INFO juju.worker.uniter.operation ran "restart-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:30:34 INFO juju.worker.uniter.operation ran "upgrade-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:30:37 INFO juju.worker.uniter.operation ran "restart-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:30:43 INFO unit.nova-mysql/0.juju-log upgrade:26: Unit workload member-state is online with member-role primary
unit-nova-mysql-0: 22:30:48 INFO juju.worker.uniter.operation ran "upgrade-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:30:57 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-1: 22:30:58 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-2: 22:31:00 INFO unit.nova-mysql/2.juju-log Unit workload member-state is online with member-role secondary
unit-nova-mysql-2: 22:31:04 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:31:08 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-router-1: 22:31:13 WARNING unit.nova-mysql-router/1.juju-log No relation: certificates
unit-nova-mysql-router-1: 22:31:13 INFO juju.worker.uniter.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:31:19 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:31:22 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:31:28 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:31:31 INFO juju.worker.uniter.operation ran "restart-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:31:35 INFO juju.worker.uniter.operation ran "upgrade-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:31:40 INFO unit.nova-mysql/0.juju-log upgrade:26: Unit workload member-state is online with member-role primary
unit-nova-mysql-0: 22:31:46 INFO juju.worker.uniter.operation ran "upgrade-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:31:50 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:31:53 INFO juju.worker.uniter.operation ran "database-relation-joined" hook (via hook dispatching script: dispatch)
unit-nova-mysql-0: 22:31:57 INFO juju.worker.uniter.operation ran "database-relation-changed" hook (via hook dispatching script: dispatch)

Additional context

Initially reported to Data Platform by @gboutry

github-actions[bot] commented 3 weeks ago

https://warthogs.atlassian.net/browse/DPE-4613

carlcsaposs-canonical commented 3 weeks ago

Not sure if the issue is in mysql-router-k8s charm, mysql-k8s charm, or elsewhere

The issue seems to have happened after mysql was offline based on these logs

unit-nova-mysql-router-0: 22:27:31 ERROR unit.nova-mysql-router/0.juju-log upgrade-version-a:44: Failed to run MySQL Shell script:
import json
import mysqlsh
import traceback

try:
    shell.connect("relation-75:***@nova-mysql-primary.openstack.svc.cluster.local:3306")
    import json

    cluster_set = dba.get_cluster_set()
    routers = cluster_set.list_routers()
    # mysqlsh objects are weird—they quack (i.e. duck typing) like standard Python objects (e.g. list,
    # dict), but do not serialize to JSON correctly.
    # Cast to str & load from JSON str before serializing
    routers = json.loads(str(routers))
    with open("/tmp/mysqlsh_output.json", "w") as file:
        json.dump(routers, file)
except mysqlsh.DBError as exception:
    error = {
        "message": str(exception),
        "code": exception.code,
        "traceback_message": "".join(traceback.format_exception(exception)),
    }
else:
    error = None
with open("/tmp/mysqlsh_error.json", "w") as file:
    json.dump(error, file)

MySQL client error 2013
MySQL Shell traceback:
Traceback (most recent call last):
  File "<string>", line 6, in <module>
mysqlsh.DBError: MySQL Error (2013): Shell.connect: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 110

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/src/mysql_shell/__init__.py", line 108, in _run_code
    raise ShellDBError(**exception)
mysql_shell.ShellDBError: MySQL Error (2013): Shell.connect: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 110
unit-nova-mysql-router-0: 22:27:31 ERROR unit.nova-mysql-router/0.juju-log upgrade-version-a:44: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/./src/kubernetes_charm.py", line 185, in <module>
    ops.main.main(KubernetesRouterCharm)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/src/abstract_charm.py", line 261, in reconcile
    if not workload_.status or workload_.status == ops.WaitingStatus():
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/src/workload.py", line 297, in status
    if not self.shell.is_router_in_cluster_set(self._router_id):
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/src/mysql_shell/__init__.py", line 229, in is_router_in_cluster_set
    self._run_code(
  File "/var/lib/juju/agents/unit-nova-mysql-router-0/charm/src/mysql_shell/__init__.py", line 108, in _run_code
    raise ShellDBError(**exception)
mysql_shell.ShellDBError: MySQL Error (2013): Shell.connect: Lost connection to MySQL server at 'waiting for initial communication packet', system error: 110
unit-nova-mysql-router-0: 22:27:31 ERROR juju.worker.uniter.operation hook "upgrade-version-a-relation-changed" (via hook dispatching script: dispatch) failed: exit status 1
unit-nova-mysql-router-0: 22:27:31 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-nova-mysql-router-0: 22:27:36 INFO juju.worker.uniter awaiting error resolution for "relation-changed" hook
unit-nova-mysql-2: 22:27:48 ERROR unit.nova-mysql/2.juju-log database-peers:24: Failed to get cluster status for cluster-df07980f3ce9324ddf8ccee565b8eb7c
unit-nova-mysql-2: 22:27:48 ERROR unit.nova-mysql/2.juju-log database-peers:24: Failed to get cluster endpoints
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-2/charm/src/mysql_k8s_helpers.py", line 836, in update_endpoints
    rw_endpoints, ro_endpoints, offline = self.get_cluster_endpoints(get_ips=False)
  File "/var/lib/juju/agents/unit-nova-mysql-2/charm/lib/charms/mysql/v0/mysql.py", line 1469, in get_cluster_endpoints
    raise MySQLGetClusterEndpointsError("Failed to get endpoints from cluster status")
charms.mysql.v0.mysql.MySQLGetClusterEndpointsError: Failed to get endpoints from cluster status
unit-nova-mysql-router-2: 22:27:49 ERROR unit.nova-mysql-router/2.juju-log backend-database:75: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/./src/kubernetes_charm.py", line 185, in <module>
    ops.main.main(KubernetesRouterCharm)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/lib/charms/data_platform_libs/v0/data_interfaces.py", line 1958, in _on_relation_changed_event
    getattr(self.on, "database_created").emit(
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/src/abstract_charm.py", line 256, in reconcile
    workload_.enable(tls=self._tls_certificate_saved, unit_name=self.unit.name)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/src/workload.py", line 260, in enable
    self._bootstrap_router(tls=tls)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/src/workload.py", line 203, in _bootstrap_router
    self._container.run_mysql_router(command, timeout=30)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/src/container.py", line 134, in run_mysql_router
    return self._run_command(args, timeout=timeout)
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/src/rock.py", line 162, in _run_command
    output, _ = process.wait_output()
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/pebble.py", line 1354, in wait_output
    exit_code: int = self._wait()
  File "/var/lib/juju/agents/unit-nova-mysql-router-2/charm/venv/ops/pebble.py", line 1316, in _wait
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- exec command "mysqlrouter" (timed out after 30s: context deadline exceeded)
----- Logs from task 0 -----
2024-06-07T22:27:49Z ERROR timed out after 30s: context deadline exceeded
-----
unit-nova-mysql-2: 22:27:49 INFO juju.worker.uniter.operation ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)

wondering if the cluster router metadata got mangled while cluster was recovering from this outage or during scale up

MySQL router shows no errors during bootstrap, so it should've been added to the cluster metadata at that point. Since we only have info-level logs, unclear how much time elapsed after bootstrap and before health check (of router in cluster metadata) failed

carlcsaposs-canonical commented 3 weeks ago

@gboutry are you able to get steps to reproduce or provide more insight on possible ways to reproduce this?

From the original bug report, it appears that a scale up happened—do you know what was scaled up & how we can perform the same scale up while trying to reproduce?

carlcsaposs-canonical commented 3 weeks ago

Potentially related to https://github.com/canonical/mysql-router-operator/issues/118, but seems unlikely

marosg42 commented 1 week ago

@carlcsaposs-canonical We hit it again in SQA while testing Sunbeam. Here are logs from all openstack units https://oil-jenkins.canonical.com/artifacts/858c5749-bd1a-425a-b3bb-6c35b32a3d30/generated/generated/sunbeam/pods_openstack_logs.tgz

barbican-mysql-router/0      active    idle   10.1.109.163         
barbican-mysql-router/1*     active    idle   10.1.210.98          
barbican-mysql-router/2      blocked   idle   10.1.172.226         Router was manually removed from MySQL ClusterSet. Remove & re-deploy unit
marosg42 commented 1 week ago

We hit it twice in tens of tests so I cannot offer steps to reproduce it.

carlcsaposs-canonical commented 1 week ago

logs from barbican-mysql-router/2 show

2024-06-26T01:15:48.751Z [container-agent] 2024-06-26 01:15:48 ERROR juju-log backend-database:151: Failed to bootstrap router
2024-06-26T01:15:48.751Z [container-agent] logged_command=['--bootstrap', 'relation-151:***@barbican-mysql-primary:3306', '--strict', '--force', '--conf-set-option', 'http_server.bind_address=127.0.0.1', '--conf-use-gr-notifications']
2024-06-26T01:15:48.751Z [container-agent] stderr:
2024-06-26T01:15:48.751Z [container-agent] Error: The provided server is currently not in a InnoDB cluster group with quorum and thus may contain inaccurate or outdated data.
2024-06-26T01:15:48.751Z [container-agent] 
2024-06-26T01:15:48.751Z [container-agent] 
2024-06-26T01:15:48.754Z [container-agent] 2024-06-26 01:15:48 ERROR juju-log backend-database:151: MySQL Server does not have quorum. Will retry next Juju event

cc @paulomach @shayancanonical

carlcsaposs-canonical commented 1 week ago

from the logs, it looks like it might be a different issue

carlcsaposs-canonical commented 1 week ago

We hit it twice in tens of tests so I cannot offer steps to reproduce it.

@marosg42 do you have the logs from the other instance it occurred? (or are you referring to original issue?) if so, I'd be curious to look to see if we see the same logs in both instances

marosg42 commented 1 week ago

@carlcsaposs-canonical yes, I meant the original issue reported on Launchpad here are logs https://oil-jenkins.canonical.com/artifacts/b0040961-eb15-40f3-a89b-e49774c6f5d4/generated/generated/sunbeam/pods_openstack_logs.tgz

marosg42 commented 1 week ago

In the first case it was

nova-mysql-router/0*         active    idle   10.1.239.98          
nova-mysql-router/1          active    idle   10.1.109.143         
nova-mysql-router/2          blocked   idle   10.1.210.80          Router was manually removed from MySQL ClusterSet. Remove & re-deploy unit
carlcsaposs-canonical commented 1 week ago

got it, thanks!

@paulomach @shayancanonical FYI, at first glance, these seem like two separate issues (seems likely to be with mysql-k8s charm, especially in 2nd case—but not sure)

paulomach commented 1 week ago

@carlcsaposs-canonical I'm investigating next pulse. From feeling, I thinkg there's some cases where the router can rebootstrap instead of entering blocked state.