canonical / mysql-router-k8s-operator

MySQL Router operator charm for Kubernetes
https://charmhub.io/mysql-router-k8s
Apache License 2.0

MySQL Router exporter intermittently throws bind-address already in use error #219

Open shayancanonical opened 8 months ago

shayancanonical commented 8 months ago

Steps to reproduce

  1. juju model-config update-status-hook-interval=5s
  2. juju deploy -n 1 mysql-k8s --channel 8.0/edge
  3. juju deploy -n 1 mysql-test-app
  4. juju deploy -n 1 mysql-router-k8s --channel 8.0/edge
  5. juju deploy -n 1 grafana-agent-k8s
  6. juju relate mysql-k8s mysql-router-k8s
  7. juju relate mysql-router-k8s mysql-test-app
  8. juju relate grafana-agent-k8s mysql-router-k8s:metrics-endpoint
  9. Remove the metrics-endpoint relation and re-relate as many times as necessary to reproduce the exception below
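
Step 9 is tedious to repeat by hand, so it could be scripted roughly as follows. This is only a sketch: the helper names (`relation_cycle_commands`, `run_cycles`), the number of cycles, and the 30-second pause between commands are assumptions, not values from this report. It defaults to a dry run that only prints the commands.

```python
import subprocess
import time

# Endpoints taken from steps 8 and 9 above.
AGENT = "grafana-agent-k8s"
ROUTER_ENDPOINT = "mysql-router-k8s:metrics-endpoint"


def relation_cycle_commands(cycles: int) -> list[list[str]]:
    """Build the remove/re-relate command pairs for the given number of cycles."""
    commands = []
    for _ in range(cycles):
        commands.append(["juju", "remove-relation", AGENT, ROUTER_ENDPOINT])
        commands.append(["juju", "relate", AGENT, ROUTER_ENDPOINT])
    return commands


def run_cycles(cycles: int, pause_seconds: float = 30.0, dry_run: bool = True) -> None:
    """Print each command; when dry_run is False, also execute it and pause.

    The pause is a guess at how long the model needs to settle between
    commands; adjust it for your environment.
    """
    for command in relation_cycle_commands(cycles):
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)
            time.sleep(pause_seconds)


if __name__ == "__main__":
    run_cycles(cycles=3)
```

Watching `juju debug-log` in a second terminal while this runs should show the exception when the race is hit.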

Expected behavior

Because the router uses a reconcile approach to manage services (starting services that need to be started and stopping services that need to be stopped), the mysql-router-exporter pebble service should already be stopped when we attempt to start it, and it should not be started again by any subsequent event handlers that call workload.reconcile()
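
To make the expectation concrete, the reconcile idea can be sketched as a comparison of desired and running services. This is not the charm's actual code: `plan_service_changes` is a hypothetical helper, and `mysql_router` is an assumed name for the main service (only `mysql_router_exporter` appears in the log below).

```python
def plan_service_changes(
    desired: set[str], running: set[str]
) -> tuple[set[str], set[str]]:
    """Return (to_start, to_stop) for one reconcile pass.

    A service is started only if it is desired and not already running,
    and stopped only if it is running and no longer desired.
    """
    to_start = desired - running
    to_stop = running - desired
    return to_start, to_stop


if __name__ == "__main__":
    # First pass: the exporter is desired but not yet running.
    print(plan_service_changes(
        desired={"mysql_router", "mysql_router_exporter"},
        running={"mysql_router"},
    ))
    # Second pass: nothing changed, so nothing should be started again.
    print(plan_service_changes(
        desired={"mysql_router", "mysql_router_exporter"},
        running={"mysql_router", "mysql_router_exporter"},
    ))
```

Under this model, a second handler that calls reconcile with unchanged state would plan no start at all, which is the behavior described above.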

Actual behavior

Occasionally, the error trace below is raised: the bind address for mysql-router-exporter is already in use, and the unit goes into error state. Since the default pebble on-failure action is restart, the service is restarted multiple times until the bind-address issue eventually resolves (the pebble service for mysql-router-exporter starts up, and the unit goes into active status).
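
One possible mitigation, offered only as a sketch and not as the charm's fix, would be to wait until the exporter port can be bound before starting the service. The helper names, timeout, and polling interval are assumptions; the check also assumes it runs in the same network namespace as the exporter, and it remains racy because another process can take the port between the probe and the real start.

```python
import socket
import time


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: something else still holds the port.
            return False
    return True


def wait_for_free_port(
    port: int, host: str = "0.0.0.0", timeout: float = 10.0, interval: float = 0.5
) -> bool:
    """Poll until the port is free or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if port_is_free(port, host):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # 49152 is the exporter port shown in the log output of this report.
    print(wait_for_free_port(49152, timeout=2.0))
```

A caller would start the exporter only when `wait_for_free_port` returns True, and otherwise defer to a later event instead of raising.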

Versions

Operating system: Ubuntu 22.04

Juju CLI: 2.9.45 and 3.1.6

Juju agent: 2.9.45 and 3.1.6

mysql-k8s charm revision: 132

mysql-router-k8s charm revision: 99

microk8s: 1.28-strict/stable

Log output

Juju debug log:

unit-mysql-router-k8s-0: 01:33:20 ERROR unit.mysql-router-k8s/0.juju-log metrics-endpoint:12: Uncaught exception while in charm code:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/./src/kubernetes_charm.py", line 197, in <module>
    ops.main.main(KubernetesRouterCharm)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/main.py", line 441, in main
    _emit_charm_event(charm, dispatcher.event_name)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
    event_to_emit.emit(*args, **kwargs)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/framework.py", line 344, in emit
    framework._emit(event)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/framework.py", line 841, in _emit
    self._reemit(event_path)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/framework.py", line 930, in _reemit
    custom_handler(event)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/abstract_charm.py", line 289, in reconcile
    workload_.reconcile(
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/workload.py", line 368, in reconcile
    self._container.update_mysql_router_exporter_service(
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/src/rock.py", line 198, in update_mysql_router_exporter_service
    self._container.restart(self._EXPORTER_SERVICE_NAME)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/model.py", line 1999, in restart
    self._pebble.restart_services(service_names)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/pebble.py", line 1746, in restart_services
    return self._services_action('restart', services, timeout, delay)
  File "/var/lib/juju/agents/unit-mysql-router-k8s-0/charm/venv/ops/pebble.py", line 1767, in _services_action
    raise ChangeError(change.err, change)
ops.pebble.ChangeError: cannot perform the following tasks:
- Start service "mysql_router_exporter" (cannot start service: exited quickly with code 1)
----- Logs from task 0 -----
2024-04-02T01:33:20Z INFO Service "mysql_router_exporter" has never been started.
----- Logs from task 1 -----
2024-04-02T01:33:20Z INFO Most recent service output:
    2024/04/02 01:33:20 Start exporter on 0.0.0.0:49152/metrics
    2024/04/02 01:33:20 listen tcp 0.0.0.0:49152: bind: address already in use
2024-04-02T01:33:20Z ERROR cannot start service: exited quickly with code 1

Additional context

github-actions[bot] commented 8 months ago

https://warthogs.atlassian.net/browse/DPE-3899