canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

katib-db-manager in error state with the message: hook failed: "update-status" #631

Closed kaskavel closed 11 months ago

kaskavel commented 1 year ago

The Solutions QA team has a failed Kubeflow run in which katib-db-manager remains in an error state because the "update-status" hook failed.

From the logs (./3/baremetal/var/log/pods/kubeflow_katib-db-manager-0_c6b7e297-3909-4f53-84da-160f1e6aaef6/charm/0.log):

2023-07-05T01:08:20.516661754Z stdout F 2023-07-05T01:08:20.516Z [container-agent] 2023-07-05 01:08:20 ERROR juju-log Failed to handle <UpdateStatusEvent via KatibDBManagerOperator/on/update_status[16]> with error: Please add required database relation: eg. relational-db
2023-07-05T01:08:20.543796055Z stdout F 2023-07-05T01:08:20.543Z [container-agent] 2023-07-05 01:08:20 WARNING update-status Error in sys.excepthook:
2023-07-05T01:08:20.543842197Z stdout F 2023-07-05T01:08:20.543Z [container-agent] 2023-07-05 01:08:20 WARNING update-status Traceback (most recent call last):
2023-07-05T01:08:20.543853828Z stdout F 2023-07-05T01:08:20.543Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/usr/lib/python3.8/logging/__init__.py", line 954, in handle
2023-07-05T01:08:20.543868187Z stdout F 2023-07-05T01:08:20.543Z [container-agent] 2023-07-05 01:08:20 WARNING update-status self.emit(record)
2023-07-05T01:08:20.543913825Z stdout F 2023-07-05T01:08:20.543Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/log.py", line 41, in emit
2023-07-05T01:08:20.5439306Z stdout F 2023-07-05T01:08:20.543Z [container-agent] 2023-07-05 01:08:20 WARNING update-status self.model_backend.juju_log(record.levelname, self.format(record))
2023-07-05T01:08:20.543940024Z stdout F 2023-07-05T01:08:20.543Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/usr/lib/python3.8/logging/__init__.py", line 929, in format
2023-07-05T01:08:20.543947657Z stdout F 2023-07-05T01:08:20.543Z [container-agent] 2023-07-05 01:08:20 WARNING update-status return fmt.format(record)
2023-07-05T01:08:20.544377617Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/usr/lib/python3.8/logging/__init__.py", line 676, in format
2023-07-05T01:08:20.544385937Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status record.exc_text = self.formatException(record.exc_info)
2023-07-05T01:08:20.544391148Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/usr/lib/python3.8/logging/__init__.py", line 626, in formatException
2023-07-05T01:08:20.544396119Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status traceback.print_exception(ei[0], ei[1], tb, None, sio)
2023-07-05T01:08:20.544401205Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/usr/lib/python3.8/traceback.py", line 103, in print_exception
2023-07-05T01:08:20.544406153Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status for line in TracebackException(
2023-07-05T01:08:20.544412571Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/usr/lib/python3.8/traceback.py", line 617, in format
2023-07-05T01:08:20.54445088Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status yield from self.format_exception_only()
2023-07-05T01:08:20.544455454Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/usr/lib/python3.8/traceback.py", line 566, in format_exception_only
2023-07-05T01:08:20.544458663Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status stype = smod + '.' + stype
2023-07-05T01:08:20.54446185Z stdout F 2023-07-05T01:08:20.544Z [container-agent] 2023-07-05 01:08:20 WARNING update-status TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
2023-07-05T01:08:20.546452865Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status
2023-07-05T01:08:20.546471113Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status Original exception was:
2023-07-05T01:08:20.546522019Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status Traceback (most recent call last):
2023-07-05T01:08:20.546626311Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "./src/charm.py", line 366, in _refresh_status
2023-07-05T01:08:20.546649842Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status check = self._get_check_status()
2023-07-05T01:08:20.546895561Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "./src/charm.py", line 360, in _get_check_status
2023-07-05T01:08:20.546905462Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status return self.container.get_check("katib-db-manager-up").status
2023-07-05T01:08:20.546931279Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/model.py", line 1980, in get_check
2023-07-05T01:08:20.5469379Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status raise ModelError(f'check {check_name!r} not found')
2023-07-05T01:08:20.546942999Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status ops.model.ModelError: check 'katib-db-manager-up' not found
2023-07-05T01:08:20.546949741Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status
2023-07-05T01:08:20.547075721Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status The above exception was the direct cause of the following exception:
2023-07-05T01:08:20.547091294Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status
2023-07-05T01:08:20.547097935Z stdout F 2023-07-05T01:08:20.546Z [container-agent] 2023-07-05 01:08:20 WARNING update-status Traceback (most recent call last):
2023-07-05T01:08:20.547113749Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "./src/charm.py", line 430, in <module>
2023-07-05T01:08:20.547120111Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status main(KatibDBManagerOperator)
2023-07-05T01:08:20.5471259Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/main.py", line 441, in main
2023-07-05T01:08:20.54726603Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status _emit_charm_event(charm, dispatcher.event_name)
2023-07-05T01:08:20.547286262Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
2023-07-05T01:08:20.547457377Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status event_to_emit.emit(*args, **kwargs)
2023-07-05T01:08:20.547964247Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/framework.py", line 354, in emit
2023-07-05T01:08:20.547981318Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status framework._emit(event)
2023-07-05T01:08:20.547988421Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/framework.py", line 830, in _emit
2023-07-05T01:08:20.547994586Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status self._reemit(event_path)
2023-07-05T01:08:20.548001398Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/framework.py", line 919, in _reemit
2023-07-05T01:08:20.548008094Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status custom_handler(event)
2023-07-05T01:08:20.548014913Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "./src/charm.py", line 381, in _on_update_status
2023-07-05T01:08:20.548021303Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status self._refresh_status()
2023-07-05T01:08:20.548027211Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status File "./src/charm.py", line 368, in _refresh_status
2023-07-05T01:08:20.548041279Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status raise GenericCharmRuntimeError(
2023-07-05T01:08:20.548053735Z stdout F 2023-07-05T01:08:20.547Z [container-agent] 2023-07-05 01:08:20 WARNING update-status GenericCharmRuntimeError: Failed to run health check on workload container
2023-07-05T01:08:20.816308817Z stdout F 2023-07-05T01:08:20.816Z [container-agent] 2023-07-05 01:08:20 ERROR juju.worker.uniter.operation runhook.go:153 hook "update-status" (via hook dispatching script: dispatch) failed: exit status 1
2023-07-05T01:08:20.816334817Z stdout F 2023-07-05T01:08:20.816Z [container-agent] 2023-07-05 01:08:20 DEBUG juju.machinelock machinelock.go:202 created rotating log file "/var/log/machine-lock.log" with max size 10 MB and max backups 5
2023-07-05T01:08:20.817070116Z stdout F 2023-07-05T01:08:20.816Z [container-agent] 2023-07-05 01:08:20 DEBUG juju.machinelock machinelock.go:186 machine lock released for katib-db-manager/0 uniter (run update-status hook)
2023-07-05T01:08:20.817101393Z stdout F 2023-07-05T01:08:20.816Z [container-agent] 2023-07-05 01:08:20 DEBUG juju.worker.uniter.operation executor.go:115 lock released for katib-db-manager/0
2023-07-05T01:08:20.818309113Z stdout F 2023-07-05T01:08:20.818Z [container-agent] 2023-07-05 01:08:20 INFO juju.worker.uniter resolver.go:155 awaiting error resolution for "update-status" hook
2023-07-05T01:08:20.818326717Z stdout F 2023-07-05T01:08:20.818Z [container-agent] 2023-07-05 01:08:20 DEBUG juju.worker.uniter agent.go:20 [AGENT-STATUS] error: hook failed: "update-status"
2023-07-05T01:08:25.934895748Z stdout F 2023-07-05T01:08:25.934Z [container-agent] 2023-07-05 01:08:25 DEBUG juju.worker.uniter.remotestate watcher.go:688 retry hook timer triggered for katib-db-manager/0
2023-07-05T01:08:25.934940454Z stdout F 2023-07-05T01:08:25.934Z [container-agent] 2023-07-05 01:08:25 INFO juju.worker.uniter resolver.go:155 awaiting error resolution for "update-status" hook
2023-07-05T01:08:25.934955638Z stdout F 2023-07-05T01:08:25.934Z [container-agent] 2023-07-05 01:08:25 DEBUG juju.worker.uniter.operation executor.go:85 running operation run update-status hook for katib-db-manager/0
2023-07-05T01:08:25.936060805Z stdout F 2023-07-05T01:08:25.935Z [container-agent] 2023-07-05 01:08:25 DEBUG juju.machinelock machinelock.go:162 acquire machine lock for katib-db-manager/0 uniter (run update-status hook)
2023-07-05T01:08:25.936084084Z stdout F 2023-07-05T01:08:25.935Z [container-agent] 2023-07-05 01:08:25 DEBUG juju.machinelock machinelock.go:172 machine lock acquired for katib-db-manager/0 uniter (run update-status hook)
2023-07-05T01:08:25.936105944Z stdout F 2023-07-05T01:08:25.935Z [container-agent] 2023-07-05 01:08:25 DEBUG juju.worker.uniter.operation executor.go:132 preparing operation "run update-status hook" for katib-db-manager/0

Logs: (try juju-crashdump-kubernetes-maas-2023-07-05-05.13.39.tar.gz)

ca-scribner commented 1 year ago

I pulled some relevant logs from juju-crashdump-kubeflow-2023-07-05.12.39.tar.gz/debug_log.txt. The first block is what I think is most relevant; a larger excerpt is at the bottom. All of this comes from around line 89850:

unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status Error in sys.excepthook:
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status Traceback (most recent call last):
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/logging/__init__.py", line 954, in handle
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     self.emit(record)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/log.py", line 41, in emit
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     self.model_backend.juju_log(record.levelname, self.format(record))
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/logging/__init__.py", line 929, in format
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     return fmt.format(record)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/logging/__init__.py", line 676, in format
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     record.exc_text = self.formatException(record.exc_info)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/logging/__init__.py", line 626, in formatException
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     traceback.print_exception(ei[0], ei[1], tb, None, sio)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/traceback.py", line 103, in print_exception
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     for line in TracebackException(
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/traceback.py", line 617, in format
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     yield from self.format_exception_only()
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/traceback.py", line 566, in format_exception_only
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     stype = smod + '.' + stype
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status 
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status Original exception was:
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status Traceback (most recent call last):
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "./src/charm.py", line 366, in _refresh_status
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     check = self._get_check_status()
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "./src/charm.py", line 360, in _get_check_status
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     return self.container.get_check("katib-db-manager-up").status
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/model.py", line 1980, in get_check
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     raise ModelError(f'check {check_name!r} not found')
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status ops.model.ModelError: check 'katib-db-manager-up' not found
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status 
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status The above exception was the direct cause of the following exception:
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status 
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status Traceback (most recent call last):
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "./src/charm.py", line 430, in <module>
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     main(KatibDBManagerOperator)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/main.py", line 441, in main
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     _emit_charm_event(charm, dispatcher.event_name)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     event_to_emit.emit(*args, **kwargs)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/framework.py", line 354, in emit
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     framework._emit(event)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/framework.py", line 830, in _emit
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     self._reemit(event_path)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/framework.py", line 919, in _reemit
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     custom_handler(event)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "./src/charm.py", line 381, in _on_update_status
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     self._refresh_status()
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "./src/charm.py", line 368, in _refresh_status
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     raise GenericCharmRuntimeError(
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status <unknown>GenericCharmRuntimeError: Failed to run health check on workload container

I think what this is saying is that the health check katib-db-manager-up wasn't found, resulting in the error. The pods.txt file shows katib-db-manager-0 with 2/2 containers up, so things should be running. We will need to look into this more.
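To make the failure mode concrete, here is a minimal sketch of how a missing check could be turned into a status instead of an unhandled exception. It is not the charm's actual code and not necessarily the fix that was adopted. `ModelError`, `CheckInfo`, and `FakeContainer` are local stand-ins written for this example (they only mimic the `get_check` behaviour visible in the traceback), and `get_check_status` / `describe_unit` and the status strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class ModelError(Exception):
    """Local stand-in for ops.model.ModelError (assumption for this sketch)."""


@dataclass
class CheckInfo:
    """Stand-in for the object returned by get_check; only `status` is modelled."""
    status: str


class FakeContainer:
    """Hypothetical container: get_check raises if the check is not registered."""

    def __init__(self, checks: Dict[str, CheckInfo]):
        self._checks = checks

    def get_check(self, check_name: str) -> CheckInfo:
        if check_name not in self._checks:
            # Same message shape as the one in the traceback above.
            raise ModelError(f"check {check_name!r} not found")
        return self._checks[check_name]


def get_check_status(container, check_name: str) -> Optional[str]:
    """Return the check's status, or None when the check does not exist yet."""
    try:
        return container.get_check(check_name).status
    except ModelError:
        return None


def describe_unit(container, check_name: str = "katib-db-manager-up") -> str:
    """Map the check result to a status string instead of failing the hook."""
    status = get_check_status(container, check_name)
    if status is None:
        return "waiting: health check not registered yet"
    if status != "up":
        return "maintenance: workload failed health check"
    return "active"


if __name__ == "__main__":
    print(describe_unit(FakeContainer({})))
    print(describe_unit(FakeContainer({"katib-db-manager-up": CheckInfo("up")})))
```

The point of the sketch is only that "check not found" is distinguishable from "check failed", so an update-status handler does not have to exit non-zero when the check has not been defined yet.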

@kaskavel do you know whether the Kubernetes container logs (e.g. kubectl logs katib-db-manager-0 -c ____) are anywhere in the log files?

(more logs, for completeness, so it's easy to look back)

unit-katib-db-manager-0: 2023-07-05 05:10:39 DEBUG juju.worker.uniter.remotestate retry hook timer triggered for katib-db-manager/0
unit-katib-db-manager-0: 2023-07-05 05:10:39 INFO juju.worker.uniter awaiting error resolution for "update-status" hook
unit-katib-db-manager-0: 2023-07-05 05:10:39 DEBUG juju.worker.uniter.operation running operation run update-status hook for katib-db-manager/0
unit-katib-db-manager-0: 2023-07-05 05:10:39 DEBUG juju.machinelock acquire machine lock for katib-db-manager/0 uniter (run update-status hook)
unit-katib-db-manager-0: 2023-07-05 05:10:39 DEBUG juju.machinelock machine lock acquired for katib-db-manager/0 uniter (run update-status hook)
unit-katib-db-manager-0: 2023-07-05 05:10:39 DEBUG juju.worker.uniter.operation preparing operation "run update-status hook" for katib-db-manager/0
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG jujuc running hook tool "juju-log" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG unit.jupyter-controller/0.juju-log Reading alert rule from /var/lib/juju/agents/unit-jupyter-controller-0/charm/src/prometheus_alert_rules/unit_unavailable.rule
unit-katib-db-manager-0: 2023-07-05 05:10:39 DEBUG juju.worker.uniter.operation executing operation "run update-status hook" for katib-db-manager/0
unit-katib-db-manager-0: 2023-07-05 05:10:39 DEBUG juju.worker.uniter.runner starting jujuc server  {unix @/var/lib/juju/agents/unit-katib-db-manager-0/agent.socket <nil>}
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG jujuc running hook tool "juju-log" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG unit.jupyter-controller/0.juju-log `cos-tool` unavailable. Leaving expression unchanged: rate(workqueue_unfinished_work_seconds[5m]) >= 100
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG jujuc running hook tool "juju-log" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG unit.jupyter-controller/0.juju-log Reading alert rule from /var/lib/juju/agents/unit-jupyter-controller-0/charm/src/prometheus_alert_rules/controller.rule
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG jujuc running hook tool "state-get" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG jujuc running hook tool "state-set" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG jujuc running hook tool "state-set" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG jujuc running hook tool "state-delete" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG jujuc running hook tool "state-set" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG jujuc running hook tool "state-set" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:39 DEBUG jujuc running hook tool "state-get" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "state-set" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "state-set" for jupyter-controller/0-update-status-2306377129954784490
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log Operator Framework 2.3.0 up and running.
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "config-get" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log Emitting Juju event update_status.
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "is-leader" for katib-db-manager/0-update-status-4813325974872136703
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "state-get" for jupyter-controller/0-update-status-2306377129954784490
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "status-set" for katib-db-manager/0-update-status-4813325974872136703
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "state-get" for jupyter-controller/0-update-status-2306377129954784490
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log load_ssl_context verify='/var/run/secrets/kubernetes.io/serviceaccount/ca.crt' cert=None trust_env=True http2=False
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log load_verify_locations cafile='/var/run/secrets/kubernetes.io/serviceaccount/ca.crt'
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log connect_tcp.started host='10.152.183.1' port=443 local_address=None timeout=None socket_options=None
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "state-get" for jupyter-controller/0-update-status-2306377129954784490
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log connect_tcp.complete return_value=<httpcore.backends.sync.SyncStream object at 0x7f37cc7c1940>
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log start_tls.started ssl_context=<ssl.SSLContext object at 0x7f37cc7fa940> server_hostname='10.152.183.1' timeout=None
model-42a090b3-9416-4848-82a4-76760cc4901c: 2023-07-05 05:10:40 DEBUG juju.worker.caasadmission received admission request for a33bd623.machinelearning.seldon.io of /v1, Kind=ConfigMap in namespace kubeflow
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log start_tls.complete return_value=<httpcore.backends.sync.SyncStream object at 0x7f37cc7c1850>
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log send_request_headers.started request=<Request [b'GET']>
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "state-set" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "state-set" for jupyter-controller/0-update-status-2306377129954784490
model-42a090b3-9416-4848-82a4-76760cc4901c: 2023-07-05 05:10:40 DEBUG juju.worker.caasadmission received admission request for a33bd623.machinelearning.seldon.io of coordination.k8s.io/v1, Kind=Lease in namespace kubeflow
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log send_request_headers.complete
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log send_request_body.started request=<Request [b'GET']>
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log send_request_body.complete
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log receive_response_headers.started request=<Request [b'GET']>
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "state-delete" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "state-set" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "state-set" for jupyter-controller/0-update-status-2306377129954784490
application-jupyter-controller: 2023-07-05 05:10:40 INFO juju.worker.caasoperator.uniter.jupyter-controller/0.operation ran "update-status" hook (via hook dispatching script: dispatch)
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Audit-Id', b'e9ea3bc9-8cfc-4b79-a8e7-e001f9ef8dcd'), (b'Cache-Control', b'no-cache, private'), (b'Content-Encoding', b'gzip'), (b'Content-Type', b'application/json'), (b'Vary', b'Accept-Encoding'), (b'X-Kubernetes-Pf-Flowschema-Uid', b'f4790712-73c7-4a08-8d25-7c2012cec096'), (b'X-Kubernetes-Pf-Prioritylevel-Uid', b'51c0ae49-a063-4f2e-aa4e-866eafd3abc9'), (b'Date', b'Wed, 05 Jul 2023 05:10:40 GMT'), (b'Transfer-Encoding', b'chunked')])
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 INFO unit.katib-db-manager/0.juju-log HTTP Request: GET https://10.152.183.1/apis/apiextensions.k8s.io/v1/customresourcedefinitions "HTTP/1.1 200 OK"
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log receive_response_body.started request=<Request [b'GET']>
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG juju.worker.caasoperator.uniter.jupyter-controller/0.operation committing operation "run update-status hook" for jupyter-controller/0
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log receive_response_body.complete
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG juju.machinelock created rotating log file "/var/log/juju/machine-lock.log" with max size 10 MB and max backups 5
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG juju.machinelock machine lock released for jupyter-controller/0 uniter (run update-status hook)
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG juju.worker.caasoperator.uniter.jupyter-controller/0.operation lock released for jupyter-controller/0
application-jupyter-controller: 2023-07-05 05:10:40 DEBUG juju.worker.caasoperator.uniter.jupyter-controller/0 no operations in progress; waiting for changes
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log response_closed.started
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:40 DEBUG unit.katib-db-manager/0.juju-log response_closed.complete
model-42a090b3-9416-4848-82a4-76760cc4901c: 2023-07-05 05:10:41 DEBUG juju.worker.caasadmission received admission request for workflow-controller of coordination.k8s.io/v1, Kind=Lease in namespace kubeflow
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 INFO unit.katib-db-manager/0.juju-log Rendering manifests
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log Rendering with context: {'app_name': 'katib-db-manager', 'namespace': 'kubeflow'}
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log Rendering manifest for src/templates/auth_manifests.yaml.j2
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log Rendered manifest:
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: katib-db-manager
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  - namespaces
  verbs:
  - "*"
- apiGroups:
  - kubeflow.org
  resources:
  - experiments
  - trials
  - suggestions
  verbs:
  - "*"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: katib-db-manager
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: katib-db-manager
subjects:
- kind: ServiceAccount
  name: katib-db-manager
  namespace: kubeflow
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log Applying 2 resources
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log send_request_headers.started request=<Request [b'PATCH']>
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log send_request_headers.complete
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log send_request_body.started request=<Request [b'PATCH']>
model-42a090b3-9416-4848-82a4-76760cc4901c: 2023-07-05 05:10:41 DEBUG juju.worker.caasadmission received admission request for katib-db-manager of rbac.authorization.k8s.io/v1, Kind=ClusterRole in namespace 
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log send_request_body.complete
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log receive_response_headers.started request=<Request [b'PATCH']>
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Audit-Id', b'60caee22-15ce-4d58-9a84-1e3ad8f923f0'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'X-Kubernetes-Pf-Flowschema-Uid', b'f4790712-73c7-4a08-8d25-7c2012cec096'), (b'X-Kubernetes-Pf-Prioritylevel-Uid', b'51c0ae49-a063-4f2e-aa4e-866eafd3abc9'), (b'Date', b'Wed, 05 Jul 2023 05:10:41 GMT'), (b'Content-Length', b'640')])
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 INFO unit.katib-db-manager/0.juju-log HTTP Request: PATCH https://10.152.183.1/apis/rbac.authorization.k8s.io/v1/clusterroles/katib-db-manager?fieldManager=lightkube "HTTP/1.1 200 OK"
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log receive_response_body.started request=<Request [b'PATCH']>
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log receive_response_body.complete
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log response_closed.started
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log response_closed.complete
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log send_request_headers.started request=<Request [b'PATCH']>
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log send_request_headers.complete
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log send_request_body.started request=<Request [b'PATCH']>
model-42a090b3-9416-4848-82a4-76760cc4901c: 2023-07-05 05:10:41 DEBUG juju.worker.caasadmission received admission request for katib-db-manager of rbac.authorization.k8s.io/v1, Kind=ClusterRoleBinding in namespace 
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log send_request_body.complete
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log receive_response_headers.started request=<Request [b'PATCH']>
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Audit-Id', b'0736e2d3-58be-4e60-810d-f69cbbaab3be'), (b'Cache-Control', b'no-cache, private'), (b'Content-Type', b'application/json'), (b'X-Kubernetes-Pf-Flowschema-Uid', b'f4790712-73c7-4a08-8d25-7c2012cec096'), (b'X-Kubernetes-Pf-Prioritylevel-Uid', b'51c0ae49-a063-4f2e-aa4e-866eafd3abc9'), (b'Date', b'Wed, 05 Jul 2023 05:10:41 GMT'), (b'Content-Length', b'672')])
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 INFO unit.katib-db-manager/0.juju-log HTTP Request: PATCH https://10.152.183.1/apis/rbac.authorization.k8s.io/v1/clusterrolebindings/katib-db-manager?fieldManager=lightkube "HTTP/1.1 200 OK"
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log receive_response_body.started request=<Request [b'PATCH']>
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log receive_response_body.complete
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log response_closed.started
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG unit.katib-db-manager/0.juju-log response_closed.complete
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 INFO unit.katib-db-manager/0.juju-log Reconcile completed successfully
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "status-set" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "relation-ids" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "relation-ids" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "status-set" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 DEBUG jujuc running hook tool "juju-log" for katib-db-manager/0-update-status-4813325974872136703
unit-katib-db-manager-0: 2023-07-05 05:10:41 ERROR unit.katib-db-manager/0.juju-log Failed to handle <UpdateStatusEvent via KatibDBManagerOperator/on/update_status[16]> with error: Please add required database relation: eg. relational-db
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status Error in sys.excepthook:
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status Traceback (most recent call last):
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/logging/__init__.py", line 954, in handle
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     self.emit(record)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/log.py", line 41, in emit
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     self.model_backend.juju_log(record.levelname, self.format(record))
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/logging/__init__.py", line 929, in format
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     return fmt.format(record)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/logging/__init__.py", line 676, in format
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     record.exc_text = self.formatException(record.exc_info)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/logging/__init__.py", line 626, in formatException
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     traceback.print_exception(ei[0], ei[1], tb, None, sio)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/traceback.py", line 103, in print_exception
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     for line in TracebackException(
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/traceback.py", line 617, in format
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     yield from self.format_exception_only()
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/usr/lib/python3.8/traceback.py", line 566, in format_exception_only
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     stype = smod + '.' + stype
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status 
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status Original exception was:
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status Traceback (most recent call last):
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "./src/charm.py", line 366, in _refresh_status
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     check = self._get_check_status()
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "./src/charm.py", line 360, in _get_check_status
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     return self.container.get_check("katib-db-manager-up").status
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/model.py", line 1980, in get_check
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     raise ModelError(f'check {check_name!r} not found')
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status ops.model.ModelError: check 'katib-db-manager-up' not found
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status 
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status The above exception was the direct cause of the following exception:
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status 
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status Traceback (most recent call last):
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "./src/charm.py", line 430, in <module>
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     main(KatibDBManagerOperator)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/main.py", line 441, in main
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     _emit_charm_event(charm, dispatcher.event_name)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/main.py", line 149, in _emit_charm_event
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     event_to_emit.emit(*args, **kwargs)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/framework.py", line 354, in emit
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     framework._emit(event)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/framework.py", line 830, in _emit
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     self._reemit(event_path)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "/var/lib/juju/agents/unit-katib-db-manager-0/charm/venv/ops/framework.py", line 919, in _reemit
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     custom_handler(event)
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "./src/charm.py", line 381, in _on_update_status
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     self._refresh_status()
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status   File "./src/charm.py", line 368, in _refresh_status
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status     raise GenericCharmRuntimeError(
unit-katib-db-manager-0: 2023-07-05 05:10:41 WARNING unit.katib-db-manager/0.update-status <unknown>GenericCharmRuntimeError: Failed to run health check on workload container
unit-katib-db-manager-0: 2023-07-05 05:10:41 ERROR juju.worker.uniter.operation hook "update-status" (via hook dispatching script: dispatch) failed: exit status 1
ca-scribner commented 1 year ago

I'm not sure the attached debug-logs.txt is complete. I don't see where katib-db-manager is deployed; the first thing I see is it saying it's stuck in an error state on the update-status hook.

A wild guess: maybe deployment took a while and update-status fired before pebble-ready, so the checks don't exist yet?
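
To make that guess concrete, here is a minimal sketch of how a status refresh could tolerate a missing check instead of failing the hook. It is not the charm's actual code: `FakeContainer`, `Check`, `refresh_status`, and the status messages are hypothetical stand-ins so the example runs without the ops library, and `ModelError` here only imitates the `ops.model.ModelError` seen in the traceback.

```python
from dataclasses import dataclass


class ModelError(Exception):
    """Stand-in for ops.model.ModelError (assumption: used only for this sketch)."""


@dataclass
class Check:
    """Hypothetical, simplified view of a Pebble check result."""
    status: str


class FakeContainer:
    """Hypothetical stand-in for the workload container, not the real ops Container."""

    def __init__(self, checks=None):
        self._checks = checks or {}

    def get_check(self, check_name):
        # Imitates the behaviour in the traceback: a missing check raises ModelError.
        if check_name not in self._checks:
            raise ModelError(f"check {check_name!r} not found")
        return self._checks[check_name]


def refresh_status(container, check_name="katib-db-manager-up"):
    """Return a (status, message) pair rather than raising when the check is absent."""
    try:
        check = container.get_check(check_name)
    except ModelError:
        # The Pebble layer, and therefore its checks, may not have been added yet.
        return ("waiting", "Waiting for workload container checks")
    if check.status != "up":
        return ("maintenance", "Workload failed health check")
    return ("active", "")


if __name__ == "__main__":
    # update-status firing before the checks exist no longer errors out.
    print(refresh_status(FakeContainer()))
    # Once the check is registered and passing, the unit can go active.
    print(refresh_status(FakeContainer({"katib-db-manager-up": Check("up")})))
```

Whether "waiting" is the right status for that window is a design choice; the point is only that a missing check is treated as "not ready yet" rather than as a hook failure.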

kaskavel commented 1 year ago

@ca-scribner thanks for the response. Unfortunately we are not collecting container logs. If that's a must for debugging, we can start doing this though.

ca-scribner commented 1 year ago

If it's not a problem, that would be great to add!

orfeas-k commented 1 year ago

Taking this over. Where was this deployed, @kaskavel? If it was on Charmed Kubernetes or EKS, there is this known issue with mysql-k8s-operator, which has been fixed but is still not published to the 8.0/stable channel (you can view the published revisions here). Could you please confirm that deploying 1.7/edge (which uses the mysql-k8s edge channel) actually solves this issue for you?

kaskavel commented 1 year ago

Taking this over. Where was this deployed, @kaskavel? If it was on Charmed Kubernetes or EKS, there is this known issue with mysql-k8s-operator, which has been fixed but is still not published to the 8.0/stable channel (you can view the published revisions here). Could you please confirm that deploying 1.7/edge (which uses the mysql-k8s edge channel) actually solves this issue for you?

We have a new occurrence with charmed-k8s on EC2. mysql-k8s is indeed on 8.0/stable. We will switch to 1.7/edge and let you know, thanks.

orfeas-k commented 1 year ago

Yes, please do, @kaskavel. We're also pushing for this change to be released to 8.0/stable so that using edge won't be needed.

i-chvets commented 1 year ago

@kaskavel Was this issue resolved for you with the workaround?

NohaIhab commented 11 months ago

The fix in mysql-k8s-operator was released to 8.0/stable, so this can be closed now.
