Open ACodingfreak opened 2 days ago
Thank you for reporting us your feedback!
The internal ticket has been created: https://warthogs.atlassian.net/browse/KF-5950.
This message was autogenerated
Hi @ACodingfreak thanks for reporting this!
Please note there is a katib-db-manager charm and a katib-db charm (which underneath is just the mysql-k8s charm). The katib-db-manager charm depends on the katib-db charm being active, idle and serving; otherwise it will just go into a waiting status.
From the juju status output you have provided I can see:
katib-db waiting 1 mysql-k8s 8.0/stable 153 10.152.183.57 no installing agent
...
katib-db/0* unknown idle 10.1.69.130
As you mentioned, katib-db is stuck in waiting status with the "installing agent" message. Unfortunately, the logs you have provided come from the katib-db-manager charm, which doesn't seem to be the one causing issues.
Pinging @paulomach, @shayancanonical - have you folks run into this? What other logs could be useful for debugging this issue?
My guess is that there is some sort of pebble issue causing mysqld to not start up, followed by errors connecting to the mysqld service. Do you still have the katib-db container running? If so, would you be able to provide the output of pebble services, as well as the content of /var/log/mysql/error.log and any logs from /var/log/mysql/archive_error/*.log?
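The diagnostics requested above could be gathered along these lines. This is only a sketch under assumptions: the model is named kubeflow, the unit is katib-db/0, the workload container is called mysql (inferred from the _on_mysql_pebble_ready handler name in this unit's logs), and the Pebble client sits at /charm/bin/pebble, as is usual for sidecar charms. The helper names are illustrative, not part of any tool.

```shell
#!/bin/sh
# Sketch only: the names below are assumptions, adjust them to your deployment.
MODEL="${MODEL:-kubeflow}"        # assumed model name
UNIT="${UNIT:-katib-db/0}"        # unit shown in the juju status output
CONTAINER="${CONTAINER:-mysql}"   # assumed workload container name

# Run a command inside the workload container of the unit.
in_workload() {
    juju ssh -m "$MODEL" --container "$CONTAINER" "$UNIT" "$@"
}

# Hypothetical helper: print the Pebble service state and the MySQL error logs.
collect_katib_db_diagnostics() {
    echo "== pebble services =="
    in_workload /charm/bin/pebble services
    echo "== /var/log/mysql/error.log =="
    in_workload cat /var/log/mysql/error.log
    echo "== /var/log/mysql/archive_error =="
    # Only list the archived logs; cat the ones of interest individually.
    in_workload ls -l /var/log/mysql/archive_error
}
```

Calling collect_katib_db_diagnostics with its output redirected to a file (for example katib-db-diag.txt, a hypothetical name) would give a single attachment for the issue.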
Relevant error traces from the mysql katib-db container logs.
The following error occurs numerous times first:
2024-07-01T16:44:14.625Z [container-agent] 2024-07-01 16:44:14 ERROR juju-log Uncaught exception while in charm code:
2024-07-01T16:44:14.625Z [container-agent] Traceback (most recent call last):
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 839, in <module>
2024-07-01T16:44:14.625Z [container-agent] main(MySQLOperatorCharm)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 548, in main
2024-07-01T16:44:14.625Z [container-agent] manager.run()
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 527, in run
2024-07-01T16:44:14.625Z [container-agent] self._emit()
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 516, in _emit
2024-07-01T16:44:14.625Z [container-agent] _emit_charm_event(self.charm, self.dispatcher.event_name)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 147, in _emit_charm_event
2024-07-01T16:44:14.625Z [container-agent] event_to_emit.emit(*args, **kwargs)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 348, in emit
2024-07-01T16:44:14.625Z [container-agent] framework._emit(event)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 860, in _emit
2024-07-01T16:44:14.625Z [container-agent] self._reemit(event_path)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 950, in _reemit
2024-07-01T16:44:14.625Z [container-agent] custom_handler(event)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:44:14.625Z [container-agent] return callable(*args, **kwargs) # type: ignore
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 648, in _on_mysql_pebble_ready
2024-07-01T16:44:14.625Z [container-agent] self._configure_instance(container)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:44:14.625Z [container-agent] return callable(*args, **kwargs) # type: ignore
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 574, in _configure_instance
2024-07-01T16:44:14.625Z [container-agent] container.restart(MYSQLD_SAFE_SERVICE)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/model.py", line 2226, in restart
2024-07-01T16:44:14.625Z [container-agent] self._pebble.restart_services(service_names)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 2065, in restart_services
2024-07-01T16:44:14.625Z [container-agent] return self._services_action('restart', services, timeout, delay)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 2085, in _services_action
2024-07-01T16:44:14.625Z [container-agent] resp = self._request('POST', '/v1/services', body=body)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 1859, in _request
2024-07-01T16:44:14.625Z [container-agent] response = self._request_raw(method, path, query, headers, data)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 1898, in _request_raw
2024-07-01T16:44:14.625Z [container-agent] response = self.opener.open(request, timeout=self.timeout)
2024-07-01T16:44:14.625Z [container-agent] File "/usr/lib/python3.10/urllib/request.py", line 519, in open
2024-07-01T16:44:14.625Z [container-agent] response = self._open(req, data)
2024-07-01T16:44:14.625Z [container-agent] File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
2024-07-01T16:44:14.625Z [container-agent] result = self._call_chain(self.handle_open, protocol, protocol +
2024-07-01T16:44:14.625Z [container-agent] File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
2024-07-01T16:44:14.625Z [container-agent] result = func(*args)
2024-07-01T16:44:14.625Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 373, in http_open
2024-07-01T16:44:14.625Z [container-agent] return self.do_open(
2024-07-01T16:44:14.625Z [container-agent] File "/usr/lib/python3.10/urllib/request.py", line 1352, in do_open
2024-07-01T16:44:14.625Z [container-agent] r = h.getresponse()
2024-07-01T16:44:14.625Z [container-agent] File "/usr/lib/python3.10/http/client.py", line 1375, in getresponse
2024-07-01T16:44:14.625Z [container-agent] response.begin()
2024-07-01T16:44:14.625Z [container-agent] File "/usr/lib/python3.10/http/client.py", line 318, in begin
2024-07-01T16:44:14.625Z [container-agent] version, status, reason = self._read_status()
2024-07-01T16:44:14.625Z [container-agent] File "/usr/lib/python3.10/http/client.py", line 279, in _read_status
2024-07-01T16:44:14.625Z [container-agent] line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
2024-07-01T16:44:14.625Z [container-agent] File "/usr/lib/python3.10/socket.py", line 705, in readinto
2024-07-01T16:44:14.625Z [container-agent] return self._sock.recv_into(b)
2024-07-01T16:44:14.625Z [container-agent] TimeoutError: timed out
After a while, the following error repeats in the logs:
2024-07-01T16:51:12.515Z [container-agent] 2024-07-01 16:51:12 INFO juju-log Adding pebble layer
2024-07-01T16:51:13.639Z [container-agent] 2024-07-01 16:51:13 ERROR juju-log Failed to connect to MySQL with mysqlsh
2024-07-01T16:51:13.639Z [container-agent] Traceback (most recent call last):
2024-07-01T16:51:13.639Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 602, in _run_mysqlsh_script
2024-07-01T16:51:13.639Z [container-agent] stdout, _ = process.wait_output()
2024-07-01T16:51:13.639Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/pebble.py", line 1635, in wait_output
2024-07-01T16:51:13.639Z [container-agent] raise ExecError[AnyStr](self._command, exit_code, out_value, err_value)
2024-07-01T16:51:13.639Z [container-agent] ops.pebble.ExecError: non-zero exit code 1 executing ['/usr/bin/mysqlsh', '--no-wizard', '--python', '--verbose=1', '-f', '/tmp/script.py', ';', 'rm', '/tmp/script.py'], stdout='', stderr='Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory\nverbose: 2024-07-01T16:51:13Z: Loading startup files...\nverbose: 2024-07-01T16:51:13Z: Loading plugins...\nverbose: 2024-07-01T16:51:13Z: Connecting to MySQL at: serverconfig@katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local\nTraceback (most recent call last):\n File "<string>", line 1, in <module>\nmysqlsh.DBError: MySQL Error (1045): Shell.connect: Access denied for user \'serverconfig\'@\'katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local\' (using password: YES)\n'
2024-07-01T16:51:13.639Z [container-agent]
2024-07-01T16:51:13.639Z [container-agent] During handling of the above exception, another exception occurred:
2024-07-01T16:51:13.639Z [container-agent]
2024-07-01T16:51:13.639Z [container-agent] Traceback (most recent call last):
2024-07-01T16:51:13.639Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/mysql/v0/mysql.py", line 3108, in check_mysqlsh_connection
2024-07-01T16:51:13.639Z [container-agent] self._run_mysqlsh_script("\n".join(connect_commands))
2024-07-01T16:51:13.639Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:51:13.639Z [container-agent] return callable(*args, **kwargs) # type: ignore
2024-07-01T16:51:13.639Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 605, in _run_mysqlsh_script
2024-07-01T16:51:13.639Z [container-agent] raise MySQLClientError(e.stderr)
2024-07-01T16:51:13.639Z [container-agent] charms.mysql.v0.mysql.MySQLClientError: Cannot set LC_ALL to locale en_US.UTF-8: No such file or directory
2024-07-01T16:51:13.639Z [container-agent] verbose: 2024-07-01T16:51:13Z: Loading startup files...
2024-07-01T16:51:13.639Z [container-agent] verbose: 2024-07-01T16:51:13Z: Loading plugins...
2024-07-01T16:51:13.639Z [container-agent] verbose: 2024-07-01T16:51:13Z: Connecting to MySQL at: serverconfig@katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local
2024-07-01T16:51:13.639Z [container-agent] Traceback (most recent call last):
2024-07-01T16:51:13.639Z [container-agent] File "<string>", line 1, in <module>
2024-07-01T16:51:13.639Z [container-agent] mysqlsh.DBError: MySQL Error (1045): Shell.connect: Access denied for user 'serverconfig'@'katib-db-0.katib-db-endpoints.kubeflow.svc.cluster.local' (using password: YES)
2024-07-01T16:51:46.858Z [container-agent]
2024-07-01T16:51:46.916Z [container-agent] 2024-07-01 16:51:46 ERROR juju-log Uncaught exception while in charm code:
2024-07-01T16:51:46.916Z [container-agent] Traceback (most recent call last):
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 839, in <module>
2024-07-01T16:51:46.916Z [container-agent] main(MySQLOperatorCharm)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 548, in main
2024-07-01T16:51:46.916Z [container-agent] manager.run()
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 527, in run
2024-07-01T16:51:46.916Z [container-agent] self._emit()
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 516, in _emit
2024-07-01T16:51:46.916Z [container-agent] _emit_charm_event(self.charm, self.dispatcher.event_name)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/main.py", line 147, in _emit_charm_event
2024-07-01T16:51:46.916Z [container-agent] event_to_emit.emit(*args, **kwargs)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 348, in emit
2024-07-01T16:51:46.916Z [container-agent] framework._emit(event)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 860, in _emit
2024-07-01T16:51:46.916Z [container-agent] self._reemit(event_path)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/ops/framework.py", line 950, in _reemit
2024-07-01T16:51:46.916Z [container-agent] custom_handler(event)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:51:46.916Z [container-agent] return callable(*args, **kwargs) # type: ignore
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 642, in _on_mysql_pebble_ready
2024-07-01T16:51:46.916Z [container-agent] self._reconcile_pebble_layer(container)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:51:46.916Z [container-agent] return callable(*args, **kwargs) # type: ignore
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/./src/charm.py", line 395, in _reconcile_pebble_layer
2024-07-01T16:51:46.916Z [container-agent] self._mysql.wait_until_mysql_connection()
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/lib/charms/tempo_k8s/v1/charm_tracing.py", line 544, in wrapped_function
2024-07-01T16:51:46.916Z [container-agent] return callable(*args, **kwargs) # type: ignore
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 330, in wrapped_f
2024-07-01T16:51:46.916Z [container-agent] return self(f, *args, **kw)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 467, in __call__
2024-07-01T16:51:46.916Z [container-agent] do = self.iter(retry_state=retry_state)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 368, in iter
2024-07-01T16:51:46.916Z [container-agent] result = action(retry_state)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 410, in exc_check
2024-07-01T16:51:46.916Z [container-agent] raise retry_exc.reraise()
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 183, in reraise
2024-07-01T16:51:46.916Z [container-agent] raise self.last_attempt.result()
2024-07-01T16:51:46.916Z [container-agent] File "/usr/lib/python3.10/concurrent/futures/_base.py", line 451, in result
2024-07-01T16:51:46.916Z [container-agent] return self.__get_result()
2024-07-01T16:51:46.916Z [container-agent] File "/usr/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
2024-07-01T16:51:46.916Z [container-agent] raise self._exception
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/venv/tenacity/__init__.py", line 470, in __call__
2024-07-01T16:51:46.916Z [container-agent] result = fn(*args, **kwargs)
2024-07-01T16:51:46.916Z [container-agent] File "/var/lib/juju/agents/unit-katib-db-0/charm/src/mysql_k8s_helpers.py", line 232, in wait_until_mysql_connection
2024-07-01T16:51:46.916Z [container-agent] raise MySQLServiceNotRunningError("Connection with mysqlsh not possible")
2024-07-01T16:51:46.916Z [container-agent] charms.mysql.v0.mysql.MySQLServiceNotRunningError: Connection with mysqlsh not possible
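To see at a glance which failures dominate a log like the one above, the final line of each traceback can be tallied. This is a minimal sketch. It assumes the container log has been saved to a file and that each traceback ends with a line of the form "[container-agent] SomeError: message". The function name is hypothetical.

```shell
#!/bin/sh
# Count how often each exception type terminates a traceback.
# Reads log lines on stdin and prints "count exception-type" pairs,
# most frequent first. Lines that do not name an exception are ignored.
summarize_exceptions() {
    sed -n -E 's/^.*\[container-agent\] ([A-Za-z_][A-Za-z0-9_.]*(Error|Exception)): .*$/\1/p' \
        | sort | uniq -c | sort -rn
}
```

A saved log (say katib-db.log, a hypothetical file name) would be fed in as: summarize_exceptions < katib-db.log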
Hi All,
I attached logs from the katib-db container to the bug as mysql.zip. Is this good enough?
https://github.com/user-attachments/files/16057534/mysql.zip
Sorry to say, but as I needed to bring up CKF quickly, I ended up downgrading the setup to microk8s 1.24, juju 2.9 and kubeflow 1.7. Even there I am facing a different katib-manager issue, described in the issue linked below.
https://github.com/canonical/bundle-kubeflow/issues/963
Like microk8s inspect, is there a command for juju to dump all the logs needed for troubleshooting?
Unfortunately, we don't yet have a tool similar to microk8s inspect that dumps all the logs required for troubleshooting the mysql charm, but I believe we have something similar in our backlog.
Were you able to get CKF running? If not, would you be able to provide us with the environment details where you're deploying CKF so we can reproduce the issue?
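Until such a tool exists, a rough manual equivalent could look like the sketch below. It assumes a model (and namespace) named kubeflow on MicroK8s and a pod named katib-db-0, matching the hostname in the logs above. The function name and output directory are hypothetical.

```shell
#!/bin/sh
# Sketch: gather Juju and Kubernetes logs for one unit into a directory.
MODEL="${MODEL:-kubeflow}"      # assumed model (and namespace) name
POD="${POD:-katib-db-0}"        # assumed pod name for unit katib-db/0

collect_juju_logs() {
    outdir="${1:-juju-logs}"    # hypothetical output directory
    mkdir -p "$outdir" || return 1
    # Model-wide status in a machine-readable form.
    juju status -m "$MODEL" --format=yaml > "$outdir/status.yaml" 2>&1
    # Full debug log of the model, replayed from the start without following.
    juju debug-log -m "$MODEL" --replay --no-tail > "$outdir/debug-log.txt" 2>&1
    # Logs of every container in the pod (charm and workload).
    microk8s kubectl -n "$MODEL" logs "$POD" --all-containers \
        > "$outdir/$POD.log" 2>&1
    # Pod events often explain scheduling or storage problems.
    microk8s kubectl -n "$MODEL" describe pod "$POD" \
        > "$outdir/$POD.describe.txt" 2>&1
    echo "$outdir"
}
```

Running collect_juju_logs with a directory name writes four files there, which can then be zipped and attached. It is not a replacement for a proper inspection tool, only a way to capture the usual first-look data in one step.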
@shayancanonical I think they did:
sudo snap install microk8s --channel=1.29/stable --classic
sudo snap install juju --classic --channel=3.4/stable
microk8s config | juju add-k8s my-k8s --client
juju bootstrap my-k8s uk8sx
juju add-model kubeflow
juju deploy kubeflow --trust --channel=1.8/stable
Bug Description
As shown in the logs below, katib-db-manager is continuously waiting for the relation-db data from mysql, which is still busy installing the agent.
To Reproduce
Environment
Relevant Log Output
Logs from mysql katib-db container
mysql.zip
Additional Context
No response