canonical / mysql-k8s-operator

A Charmed Operator for running MySQL on Kubernetes
https://charmhub.io/mysql-k8s
Apache License 2.0
8 stars 15 forks source link

Mysql stay in blocked state until status update: error juju trust #322

Open gboutry opened 11 months ago

gboutry commented 11 months ago

Steps to reproduce

  1. Deploy mysql k8s with juju trust.

I can't reproduce every time, but it has happened to multiple sunbeam users

Expected behavior

If Mysql charm was able to create endpoint, no error about Juju trust

Actual behavior

Mysql fails to wait on primary endpoint, interpreted as Juju trust missing error

Versions

Operating system: Ubuntu 22.04.3 LTS

Juju CLI: 3.1.6-genericlinux-amd64

Juju agent: 3.1.6

Charm revision: 99 (8.0/stable)

microk8s: MicroK8s v1.28.0 revision 5802

Log output

Juju debug log:

2023-10-12T06:50:14.623Z [pebble] HTTP API server listening on ":38812".
2023-10-12T06:50:14.623Z [pebble] Started daemon.
2023-10-12T06:50:14.629Z [pebble] POST /v1/services 5.156354ms 202
2023-10-12T06:50:14.629Z [pebble] Started default services with change 1.
2023-10-12T06:50:14.636Z [pebble] Service "container-agent" starting: /charm/bin/containeragent unit --data-dir /var/lib/juju --append-env "PATH=$PATH:/charm/bin" --show-log --charm-modified-version 0
2023-10-12T06:50:14.657Z [container-agent] 2023-10-12 06:50:14 INFO juju.cmd supercommand.go:56 running containerAgent [3.1.6 614f480c6ca83d33c7563bb7a3bfb00225b34e33 gc go1.20.8]
2023-10-12T06:50:14.657Z [container-agent] starting containeragent unit command
2023-10-12T06:50:14.657Z [container-agent] containeragent unit "unit-mysql-0" start (3.1.6 [gc])
2023-10-12T06:50:14.657Z [container-agent] 2023-10-12 06:50:14 INFO juju.cmd.containeragent.unit runner.go:578 start "unit"
2023-10-12T06:50:14.657Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.1.6 have already been run.
2023-10-12T06:50:14.658Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.probehttpserver server.go:157 starting http server on 127.0.0.1:65301
2023-10-12T06:50:14.692Z [container-agent] 2023-10-12 06:50:14 INFO juju.api apiclient.go:706 connection established to "wss://controller-service.controller-k8s.svc.cluster.local:17070/model/4dcd73af-bac1-4b77-84ee-00388d820dfc/api"
2023-10-12T06:50:14.696Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.apicaller connect.go:163 [4dcd73] "unit-mysql-0" successfully connected to "controller-service.controller-k8s.svc.cluster.local:17070"
2023-10-12T06:50:14.718Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE
2023-10-12T06:50:14.719Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.logger logger.go:120 logger worker started
2023-10-12T06:50:14.722Z [container-agent] 2023-10-12 06:50:14 WARNING juju.worker.proxyupdater proxyupdater.go:241 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
2023-10-12T06:50:14.724Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.leadership tracker.go:194 mysql/0 promoted to leadership of mysql
2023-10-12T06:50:14.735Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received
2023-10-12T06:50:14.735Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check
2023-10-12T06:50:14.745Z [container-agent] 2023-10-12 06:50:14 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-mysql-0
2023-10-12T06:50:14.759Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.uniter uniter.go:363 unit "mysql/0" started
2023-10-12T06:50:14.762Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.uniter uniter.go:689 resuming charm install
2023-10-12T06:50:14.763Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/mysql-k8s-99 from API server
2023-10-12T06:50:14.764Z [container-agent] 2023-10-12 06:50:14 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/mysql-k8s-99
2023-10-12T06:50:14.893Z [container-agent] 2023-10-12 06:50:14 INFO juju.downloader download.go:92 download complete ("ch:amd64/jammy/mysql-k8s-99")
2023-10-12T06:50:14.969Z [container-agent] 2023-10-12 06:50:14 INFO juju.downloader download.go:172 download verified ("ch:amd64/jammy/mysql-k8s-99")
2023-10-12T06:50:24.627Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 418
2023-10-12T06:50:26.588Z [container-agent] 2023-10-12 06:50:26 INFO juju.worker.uniter uniter.go:389 hooks are retried true
2023-10-12T06:50:26.671Z [container-agent] 2023-10-12 06:50:26 INFO juju.worker.uniter resolver.go:165 found queued "install" hook
2023-10-12T06:50:27.573Z [container-agent] 2023-10-12 06:50:27 INFO juju-log Running legacy hooks/install.
2023-10-12T06:50:28.301Z [container-agent] 2023-10-12 06:50:28 INFO juju.worker.uniter.operation runhook.go:186 ran "install" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:29.074Z [container-agent] 2023-10-12 06:50:29 INFO juju.worker.uniter.operation runhook.go:186 ran "database-peers-relation-created" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:29.730Z [container-agent] 2023-10-12 06:50:29 INFO juju.worker.uniter.operation runhook.go:186 ran "upgrade-relation-created" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:30.394Z [container-agent] 2023-10-12 06:50:30 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-created" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:31.028Z [container-agent] 2023-10-12 06:50:31 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-created" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:31.040Z [container-agent] 2023-10-12 06:50:31 INFO juju.worker.uniter resolver.go:165 found queued "leader-elected" hook
2023-10-12T06:50:31.824Z [container-agent] 2023-10-12 06:50:31 INFO juju.worker.uniter.operation runhook.go:186 ran "leader-elected" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:32.467Z [container-agent] 2023-10-12 06:50:32 INFO juju.worker.uniter.operation runhook.go:186 ran "mysql-pebble-ready" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:33.132Z [container-agent] 2023-10-12 06:50:33 INFO juju.worker.uniter.operation runhook.go:186 ran "database-storage-attached" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:33.853Z [container-agent] 2023-10-12 06:50:33 INFO juju.worker.uniter.operation runhook.go:186 ran "config-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:33.862Z [container-agent] 2023-10-12 06:50:33 INFO juju.worker.uniter resolver.go:165 found queued "start" hook
2023-10-12T06:50:34.210Z [container-agent] 2023-10-12 06:50:34 INFO juju-log Running legacy hooks/start.
2023-10-12T06:50:34.625Z [pebble] Check "readiness" failure 2 (threshold 3): received non-20x status code 418
2023-10-12T06:50:44.630Z [pebble] Check "readiness" failure 3 (threshold 3): received non-20x status code 418
2023-10-12T06:50:44.630Z [pebble] Check "readiness" failure threshold 3 hit, triggering action
2023-10-12T06:50:48.120Z [container-agent] 2023-10-12 06:50:48 INFO juju-log Configuring instance
2023-10-12T06:50:53.776Z [container-agent] 2023-10-12 06:50:53 INFO juju-log Creating cluster on the leader unit
2023-10-12T06:50:54.626Z [pebble] Check "readiness" failure 4 (threshold 3): received non-20x status code 418
2023-10-12T06:50:56.509Z [container-agent] 2023-10-12 06:50:56 INFO juju.worker.uniter.operation runhook.go:186 ran "start" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:56.941Z [container-agent] 2023-10-12 06:50:56 INFO juju-log upgrade:3: Unit workload member-state is online with member-role primary
2023-10-12T06:50:57.545Z [container-agent] 2023-10-12 06:50:57 INFO juju.worker.uniter.operation runhook.go:186 ran "upgrade-relation-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:50:58.214Z [container-agent] 2023-10-12 06:50:58 INFO juju-log database:10: Kubernetes pod label primary created
2023-10-12T06:50:58.229Z [container-agent] 2023-10-12 06:50:58 INFO juju-log database:10: Kubernetes service mysql-primary created
2023-10-12T06:50:58.245Z [container-agent] 2023-10-12 06:50:58 INFO juju-log database:10: Kubernetes service mysql-replicas created
2023-10-12T06:51:07.470Z [container-agent] 2023-10-12 06:51:07 ERROR juju-log database:10: Failed to create k8s services for endpoints
2023-10-12T06:51:07.470Z [container-agent] Traceback (most recent call last):
2023-10-12T06:51:07.470Z [container-agent]   File "/var/lib/juju/agents/unit-mysql-0/charm/src/relations/mysql_provider.py", line 181, in _on_database_requested
2023-10-12T06:51:07.470Z [container-agent]     self.charm.k8s_helpers.wait_service_ready((primary_endpoint, 3306))
2023-10-12T06:51:07.470Z [container-agent]   File "/var/lib/juju/agents/unit-mysql-0/charm/venv/tenacity/__init__.py", line 289, in wrapped_f
2023-10-12T06:51:07.470Z [container-agent]     return self(f, *args, **kw)
2023-10-12T06:51:07.470Z [container-agent]   File "/var/lib/juju/agents/unit-mysql-0/charm/venv/tenacity/__init__.py", line 379, in __call__
2023-10-12T06:51:07.470Z [container-agent]     do = self.iter(retry_state=retry_state)
2023-10-12T06:51:07.470Z [container-agent]   File "/var/lib/juju/agents/unit-mysql-0/charm/venv/tenacity/__init__.py", line 325, in iter
2023-10-12T06:51:07.470Z [container-agent]     raise retry_exc.reraise()
2023-10-12T06:51:07.470Z [container-agent]   File "/var/lib/juju/agents/unit-mysql-0/charm/venv/tenacity/__init__.py", line 158, in reraise
2023-10-12T06:51:07.470Z [container-agent]     raise self.last_attempt.result()
2023-10-12T06:51:07.470Z [container-agent]   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
2023-10-12T06:51:07.470Z [container-agent]     return self.__get_result()
2023-10-12T06:51:07.470Z [container-agent]   File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2023-10-12T06:51:07.470Z [container-agent]     raise self._exception
2023-10-12T06:51:07.470Z [container-agent]   File "/var/lib/juju/agents/unit-mysql-0/charm/venv/tenacity/__init__.py", line 382, in __call__
2023-10-12T06:51:07.470Z [container-agent]     result = fn(*args, **kwargs)
2023-10-12T06:51:07.470Z [container-agent]   File "/var/lib/juju/agents/unit-mysql-0/charm/src/k8s_helpers.py", line 192, in wait_service_ready
2023-10-12T06:51:07.470Z [container-agent]     raise KubernetesClientError
2023-10-12T06:51:07.470Z [container-agent] k8s_helpers.KubernetesClientError
2023-10-12T06:51:07.747Z [container-agent] 2023-10-12 06:51:07 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:51:08.502Z [container-agent] 2023-10-12 06:51:08 WARNING juju-log database:10: Kubernetes service already exists
2023-10-12T06:51:08.896Z [container-agent] 2023-10-12 06:51:08 INFO juju-log database:10: Created user for app barbican
2023-10-12T06:51:09.159Z [container-agent] 2023-10-12 06:51:09 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-joined" hook (via hook dispatching script: dispatch)
2023-10-12T06:51:09.873Z [container-agent] 2023-10-12 06:51:09 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:51:10.642Z [container-agent] 2023-10-12 06:51:10 WARNING juju-log database:9: Kubernetes service already exists
2023-10-12T06:51:11.012Z [container-agent] 2023-10-12 06:51:11 INFO juju-log database:9: Created user for app keystone
2023-10-12T06:51:11.269Z [container-agent] 2023-10-12 06:51:11 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:51:11.976Z [container-agent] 2023-10-12 06:51:11 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:51:12.579Z [container-agent] 2023-10-12 06:51:12 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-joined" hook (via hook dispatching script: dispatch)
2023-10-12T06:51:13.192Z [container-agent] 2023-10-12 06:51:13 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:51:14.488Z [container-agent] 2023-10-12 06:51:14 INFO juju.worker.uniter.operation runhook.go:186 ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:51:38.771Z [container-agent] 2023-10-12 06:51:38 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:55:33.701Z [container-agent] 2023-10-12 06:55:33 INFO juju-log Unit workload member-state is online with member-role primary
2023-10-12T06:55:34.358Z [container-agent] 2023-10-12 06:55:34 WARNING juju-log No relation: certificates
2023-10-12T06:55:34.631Z [container-agent] 2023-10-12 06:55:34 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)
2023-10-12T06:56:18.569Z [container-agent] 2023-10-12 06:56:18 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-joined" hook (via hook dispatching script: dispatch)
2023-10-12T06:56:19.398Z [container-agent] 2023-10-12 06:56:19 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:56:42.679Z [container-agent] 2023-10-12 06:56:42 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch)
2023-10-12T06:56:48.579Z [container-agent] 2023-10-12 06:56:48 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch)

Additional context

Seems like Mysql Charm was able to create endpoints fine. But failed on wait_service_ready.

When wait_service_ready fails, it raises a KubernetesClientError, which is interpreted as trust issues here

github-actions[bot] commented 11 months ago

https://warthogs.atlassian.net/browse/DPE-2739

paulomach commented 7 months ago

@gboutry , is this still being reported? We set 10 retries, one second each - wonder if is too little. It cab easily be increased though