canonical / mysql-router-k8s-operator

MySQL Router operator charm for Kubernetes
https://charmhub.io/mysql-router-k8s
Apache License 2.0

Enabling TLS in both mysql and mysql-router post-deployment may trigger a transient exception #217

Open phvalguima opened 4 months ago

phvalguima commented 4 months ago

Adding self-signed-certificates and relating it to mysql-k8s and mysql-router-k8s after both were deployed failed with the exception below. I call it "transient" because the issue clears after a short interval and the charm is then able to progress.

It seems the router was disconnected from mysql-k8s while the server was also setting up its own certificates. I think we should handle this exception in two places:

  1. "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/workload.py", line 297, in status: catch the subprocess exception and re-raise it as a more meaningful exception within the framework, depending on what the error is
  2. "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/relations/tls.py", line 136, in save_certificate: either defer the call or use tenacity.retry to try reconnecting to MySQL.
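
To make the second suggestion concrete, here is a minimal sketch of a bounded retry around a call that can fail transiently. It is not the charm's code: `TransientShellError` and `retry_transient` are hypothetical names, and a plain standard-library loop stands in for `tenacity.retry` so that the example runs without extra dependencies.

```python
import time


class TransientShellError(Exception):
    """Hypothetical: MySQL Shell failed for a reason expected to clear by itself."""


def retry_transient(func, attempts=5, delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Only TransientShellError is retried. Any other exception propagates
    immediately, and the last TransientShellError is re-raised once the
    attempts are exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except TransientShellError:
            if attempt == attempts:
                raise
            sleep(delay)  # wait before trying to reach MySQL again
```

In the charm, `func` would be whatever wraps the status check that currently raises. Whether to retry inside the hook or defer the event instead is a design choice this sketch does not settle, since retrying blocks the hook for up to `attempts * delay` seconds.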

ops.pebble.ExecError: non-zero exit code 1 executing ['mysqlsh', '--no-wizard', '--python', '--file', '/tmp/mysqlsh_script.py'], stdout='', stderr='Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\nNo PRIMARY member found for cluster \'cluster-208bb456a0b9cbfdb812b401b3cb4651\'\nTraceback (most recent call last):\n  File "<string>", line 10, in <module>\nmysqlsh.Error: Shell Error (51314): ClusterSet.list_routers: This function is not available through a session to an InnoDB Cluster that belongs to an InnoDB ClusterSet but is not ONLINE\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/./src/kubernetes_charm.py", line 267, in <module>
    ops.main.main(KubernetesRouterCharm)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/lib/charms/tls_certificates_interface/v1/tls_certificates.py", line 1309, in _on_relation_changed
    self.on.certificate_available.emit(
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/relations/tls.py", line 300, in _on_certificate_available
    self._relation.save_certificate(event)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/relations/tls.py", line 136, in save_certificate
    self._charm.get_workload(event=None).enable_tls(
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/workload.py", line 284, in enable_tls
    self._restart(tls=True)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/workload.py", line 278, in _restart
    self._charm.set_status(event=None)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/abstract_charm.py", line 161, in set_status
    self.unit.status = self._determine_unit_status(event=event)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/abstract_charm.py", line 149, in _determine_unit_status
    workload_status = self.get_workload(event=event).status
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/workload.py", line 297, in status
    if not self.shell.is_router_in_cluster_set(self._router_id):
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/mysql_shell/__init__.py", line 229, in is_router_in_cluster_set
    self._run_code(
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/mysql_shell/__init__.py", line 88, in _run_code
    self._container.run_mysql_shell(
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/container.py", line 144, in run_mysql_shell
    return self._run_command(args, timeout=timeout)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/rock.py", line 164, in _run_command
    raise container.CalledProcessError(
container.CalledProcessError: Command '['mysqlsh', '--no-wizard', '--python', '--file', '/tmp/mysqlsh_script.py']' returned non-zero exit status 1.

github-actions[bot] commented 4 months ago

https://warthogs.atlassian.net/browse/DPE-3886

carlcsaposs-canonical commented 4 months ago

This error appears unrelated to setting up TLS on the router. I think only the TLS setup on the server is relevant here.

I think there might be an issue on the server end.

On the router end, I'm not sure how to handle this better, given that we're getting a shell error, not a DBError.

We could catch and re-raise this specific exception, but we would miss other shell errors.
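
For illustration, catching only this specific exception could look roughly like the sketch below. It parses the `Shell Error (<code>)` line from the stderr shown in the report and flags only code 51314. The helper names are hypothetical, and treating 51314 as always transient is an assumption, not something confirmed in this thread.

```python
import re

# Matches the line MySQL Shell printed in the stderr quoted in this issue.
_SHELL_ERROR = re.compile(r"mysqlsh\.Error: Shell Error \((\d+)\): (.*)")

# Error code taken from the reported stderr; assumed here to be transient.
CLUSTER_NOT_ONLINE = 51314


def parse_shell_error(stderr):
    """Return (code, message) from a Shell Error line, or None if absent."""
    match = _SHELL_ERROR.search(stderr)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


def is_known_transient(stderr):
    """True only for the one error code this sketch recognizes."""
    parsed = parse_shell_error(stderr)
    return parsed is not None and parsed[0] == CLUSTER_NOT_ONLINE
```

This shows the trade-off mentioned above: any shell error with a different code, or one that is not formatted as `Shell Error (<code>)`, is not recognized and would still surface as the generic failure.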