Open gboutry opened 1 year ago
I can't reproduce every time, but it has happened to multiple sunbeam users
If Mysql charm was able to create endpoint, no error about Juju trust
Mysql fails to wait on primary endpoint, interpreted as Juju trust missing error
Operating system: Ubuntu 22.04.3 LTS
Juju CLI: 3.1.6-genericlinux-amd64
Juju agent: 3.1.6
Charm revision: 99 (8.0/stable)
microk8s: MicroK8s v1.28.0 revision 5802
Juju debug log:
2023-10-12T06:50:14.623Z [pebble] HTTP API server listening on ":38812". 2023-10-12T06:50:14.623Z [pebble] Started daemon. 2023-10-12T06:50:14.629Z [pebble] POST /v1/services 5.156354ms 202 2023-10-12T06:50:14.629Z [pebble] Started default services with change 1. 2023-10-12T06:50:14.636Z [pebble] Service "container-agent" starting: /charm/bin/containeragent unit --data-dir /var/lib/juju --append-env "PATH=$PATH:/charm/bin" --show-log --charm-modified-version 0 2023-10-12T06:50:14.657Z [container-agent] 2023-10-12 06:50:14 INFO juju.cmd supercommand.go:56 running containerAgent [3.1.6 614f480c6ca83d33c7563bb7a3bfb00225b34e33 gc go1.20.8] 2023-10-12T06:50:14.657Z [container-agent] starting containeragent unit command 2023-10-12T06:50:14.657Z [container-agent] containeragent unit "unit-mysql-0" start (3.1.6 [gc]) 2023-10-12T06:50:14.657Z [container-agent] 2023-10-12 06:50:14 INFO juju.cmd.containeragent.unit runner.go:578 start "unit" 2023-10-12T06:50:14.657Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.1.6 have already been run. 2023-10-12T06:50:14.658Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.probehttpserver server.go:157 starting http server on 127.0.0.1:65301 2023-10-12T06:50:14.692Z [container-agent] 2023-10-12 06:50:14 INFO juju.api apiclient.go:706 connection established to "wss://controller-service.controller-k8s.svc.cluster.local:17070/model/4dcd73af-bac1-4b77-84ee-00388d820dfc/api" 2023-10-12T06:50:14.696Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.apicaller connect.go:163 [4dcd73] "unit-mysql-0" successfully connected to "controller-service.controller-k8s.svc.cluster.local:17070" 2023-10-12T06:50:14.718Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE 2023-10-12T06:50:14.719Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.logger logger.go:120 logger worker started 2023-10-12T06:50:14.722Z [container-agent] 2023-10-12 06:50:14 WARNING juju.worker.proxyupdater proxyupdater.go:241 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: "" 2023-10-12T06:50:14.724Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.leadership tracker.go:194 mysql/0 promoted to leadership of mysql 2023-10-12T06:50:14.735Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received 2023-10-12T06:50:14.735Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check 2023-10-12T06:50:14.745Z [container-agent] 2023-10-12 06:50:14 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-mysql-0 2023-10-12T06:50:14.759Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.uniter uniter.go:363 unit "mysql/0" started 2023-10-12T06:50:14.762Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.uniter uniter.go:689 resuming charm install 2023-10-12T06:50:14.763Z [container-agent] 2023-10-12 06:50:14 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/jammy/mysql-k8s-99 from API server 2023-10-12T06:50:14.764Z [container-agent] 2023-10-12 06:50:14 INFO juju.downloader download.go:109 downloading from ch:amd64/jammy/mysql-k8s-99 2023-10-12T06:50:14.893Z [container-agent] 2023-10-12 06:50:14 INFO juju.downloader download.go:92 download complete ("ch:amd64/jammy/mysql-k8s-99") 2023-10-12T06:50:14.969Z [container-agent] 2023-10-12 06:50:14 INFO juju.downloader download.go:172 download verified ("ch:amd64/jammy/mysql-k8s-99") 2023-10-12T06:50:24.627Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 418 2023-10-12T06:50:26.588Z [container-agent] 2023-10-12 06:50:26 INFO juju.worker.uniter uniter.go:389 hooks are retried true 2023-10-12T06:50:26.671Z [container-agent] 2023-10-12 06:50:26 INFO juju.worker.uniter resolver.go:165 found queued "install" hook 2023-10-12T06:50:27.573Z [container-agent] 2023-10-12 06:50:27 INFO juju-log Running legacy hooks/install. 2023-10-12T06:50:28.301Z [container-agent] 2023-10-12 06:50:28 INFO juju.worker.uniter.operation runhook.go:186 ran "install" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:29.074Z [container-agent] 2023-10-12 06:50:29 INFO juju.worker.uniter.operation runhook.go:186 ran "database-peers-relation-created" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:29.730Z [container-agent] 2023-10-12 06:50:29 INFO juju.worker.uniter.operation runhook.go:186 ran "upgrade-relation-created" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:30.394Z [container-agent] 2023-10-12 06:50:30 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-created" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:31.028Z [container-agent] 2023-10-12 06:50:31 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-created" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:31.040Z [container-agent] 2023-10-12 06:50:31 INFO juju.worker.uniter resolver.go:165 found queued "leader-elected" hook 2023-10-12T06:50:31.824Z [container-agent] 2023-10-12 06:50:31 INFO juju.worker.uniter.operation runhook.go:186 ran "leader-elected" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:32.467Z [container-agent] 2023-10-12 06:50:32 INFO juju.worker.uniter.operation runhook.go:186 ran "mysql-pebble-ready" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:33.132Z [container-agent] 2023-10-12 06:50:33 INFO juju.worker.uniter.operation runhook.go:186 ran "database-storage-attached" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:33.853Z [container-agent] 2023-10-12 06:50:33 INFO juju.worker.uniter.operation runhook.go:186 ran "config-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:33.862Z [container-agent] 2023-10-12 06:50:33 INFO juju.worker.uniter resolver.go:165 found queued "start" hook 2023-10-12T06:50:34.210Z [container-agent] 2023-10-12 06:50:34 INFO juju-log Running legacy hooks/start. 2023-10-12T06:50:34.625Z [pebble] Check "readiness" failure 2 (threshold 3): received non-20x status code 418 2023-10-12T06:50:44.630Z [pebble] Check "readiness" failure 3 (threshold 3): received non-20x status code 418 2023-10-12T06:50:44.630Z [pebble] Check "readiness" failure threshold 3 hit, triggering action 2023-10-12T06:50:48.120Z [container-agent] 2023-10-12 06:50:48 INFO juju-log Configuring instance 2023-10-12T06:50:53.776Z [container-agent] 2023-10-12 06:50:53 INFO juju-log Creating cluster on the leader unit 2023-10-12T06:50:54.626Z [pebble] Check "readiness" failure 4 (threshold 3): received non-20x status code 418 2023-10-12T06:50:56.509Z [container-agent] 2023-10-12 06:50:56 INFO juju.worker.uniter.operation runhook.go:186 ran "start" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:56.941Z [container-agent] 2023-10-12 06:50:56 INFO juju-log upgrade:3: Unit workload member-state is online with member-role primary 2023-10-12T06:50:57.545Z [container-agent] 2023-10-12 06:50:57 INFO juju.worker.uniter.operation runhook.go:186 ran "upgrade-relation-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:50:58.214Z [container-agent] 2023-10-12 06:50:58 INFO juju-log database:10: Kubernetes pod label primary created 2023-10-12T06:50:58.229Z [container-agent] 2023-10-12 06:50:58 INFO juju-log database:10: Kubernetes service mysql-primary created 2023-10-12T06:50:58.245Z [container-agent] 2023-10-12 06:50:58 INFO juju-log database:10: Kubernetes service mysql-replicas created 2023-10-12T06:51:07.470Z [container-agent] 2023-10-12 06:51:07 ERROR juju-log database:10: Failed to create k8s services for endpoints 2023-10-12T06:51:07.470Z [container-agent] Traceback (most recent call last): 2023-10-12T06:51:07.470Z [container-agent] File "/var/lib/juju/agents/unit-mysql-0/charm/src/relations/mysql_provider.py", line 181, in _on_database_requested 2023-10-12T06:51:07.470Z [container-agent] self.charm.k8s_helpers.wait_service_ready((primary_endpoint, 3306)) 2023-10-12T06:51:07.470Z [container-agent] File "/var/lib/juju/agents/unit-mysql-0/charm/venv/tenacity/__init__.py", line 289, in wrapped_f 2023-10-12T06:51:07.470Z [container-agent] return self(f, *args, **kw) 2023-10-12T06:51:07.470Z [container-agent] File "/var/lib/juju/agents/unit-mysql-0/charm/venv/tenacity/__init__.py", line 379, in __call__ 2023-10-12T06:51:07.470Z [container-agent] do = self.iter(retry_state=retry_state) 2023-10-12T06:51:07.470Z [container-agent] File "/var/lib/juju/agents/unit-mysql-0/charm/venv/tenacity/__init__.py", line 325, in iter 2023-10-12T06:51:07.470Z [container-agent] raise retry_exc.reraise() 2023-10-12T06:51:07.470Z [container-agent] File "/var/lib/juju/agents/unit-mysql-0/charm/venv/tenacity/__init__.py", line 158, in reraise 2023-10-12T06:51:07.470Z [container-agent] raise self.last_attempt.result() 2023-10-12T06:51:07.470Z [container-agent] File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result 2023-10-12T06:51:07.470Z [container-agent] return self.__get_result() 2023-10-12T06:51:07.470Z [container-agent] File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result 2023-10-12T06:51:07.470Z [container-agent] raise self._exception 2023-10-12T06:51:07.470Z [container-agent] File "/var/lib/juju/agents/unit-mysql-0/charm/venv/tenacity/__init__.py", line 382, in __call__ 2023-10-12T06:51:07.470Z [container-agent] result = fn(*args, **kwargs) 2023-10-12T06:51:07.470Z [container-agent] File "/var/lib/juju/agents/unit-mysql-0/charm/src/k8s_helpers.py", line 192, in wait_service_ready 2023-10-12T06:51:07.470Z [container-agent] raise KubernetesClientError 2023-10-12T06:51:07.470Z [container-agent] k8s_helpers.KubernetesClientError 2023-10-12T06:51:07.747Z [container-agent] 2023-10-12 06:51:07 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:51:08.502Z [container-agent] 2023-10-12 06:51:08 WARNING juju-log database:10: Kubernetes service already exists 2023-10-12T06:51:08.896Z [container-agent] 2023-10-12 06:51:08 INFO juju-log database:10: Created user for app barbican 2023-10-12T06:51:09.159Z [container-agent] 2023-10-12 06:51:09 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-joined" hook (via hook dispatching script: dispatch) 2023-10-12T06:51:09.873Z [container-agent] 2023-10-12 06:51:09 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:51:10.642Z [container-agent] 2023-10-12 06:51:10 WARNING juju-log database:9: Kubernetes service already exists 2023-10-12T06:51:11.012Z [container-agent] 2023-10-12 06:51:11 INFO juju-log database:9: Created user for app keystone 2023-10-12T06:51:11.269Z [container-agent] 2023-10-12 06:51:11 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:51:11.976Z [container-agent] 2023-10-12 06:51:11 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:51:12.579Z [container-agent] 2023-10-12 06:51:12 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-joined" hook (via hook dispatching script: dispatch) 2023-10-12T06:51:13.192Z [container-agent] 2023-10-12 06:51:13 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:51:14.488Z [container-agent] 2023-10-12 06:51:14 INFO juju.worker.uniter.operation runhook.go:186 ran "database-peers-relation-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:51:38.771Z [container-agent] 2023-10-12 06:51:38 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:55:33.701Z [container-agent] 2023-10-12 06:55:33 INFO juju-log Unit workload member-state is online with member-role primary 2023-10-12T06:55:34.358Z [container-agent] 2023-10-12 06:55:34 WARNING juju-log No relation: certificates 2023-10-12T06:55:34.631Z [container-agent] 2023-10-12 06:55:34 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch) 2023-10-12T06:56:18.569Z [container-agent] 2023-10-12 06:56:18 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-joined" hook (via hook dispatching script: dispatch) 2023-10-12T06:56:19.398Z [container-agent] 2023-10-12 06:56:19 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:56:42.679Z [container-agent] 2023-10-12 06:56:42 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch) 2023-10-12T06:56:48.579Z [container-agent] 2023-10-12 06:56:48 INFO juju.worker.uniter.operation runhook.go:186 ran "database-relation-changed" hook (via hook dispatching script: dispatch)
Seems like Mysql Charm was able to create endpoints fine. But failed on wait_service_ready.
When wait_service_ready fails, it raises a KubernetesClientError, which is interpreted as trust issues here
wait_service_ready
https://warthogs.atlassian.net/browse/DPE-2739
@gboutry , is this still being reported? We set 10 retries, one second each - wonder if is too little. It cab easily be increased though
Steps to reproduce
I can't reproduce every time, but it has happened to multiple sunbeam users
Expected behavior
If Mysql charm was able to create endpoint, no error about Juju trust
Actual behavior
Mysql fails to wait on primary endpoint, interpreted as Juju trust missing error
Versions
Operating system: Ubuntu 22.04.3 LTS
Juju CLI: 3.1.6-genericlinux-amd64
Juju agent: 3.1.6
Charm revision: 99 (8.0/stable)
microk8s: MicroK8s v1.28.0 revision 5802
Log output
Juju debug log:
Additional context
Seems like Mysql Charm was able to create endpoints fine. But failed on wait_service_ready.
When
wait_service_ready
fails, it raises a KubernetesClientError, which is interpreted as trust issues here