canonical / bundle-kubeflow

Charmed Kubeflow
Apache License 2.0

Pods istio-ingressgateway-0 and istio-pilot-0 stopped working #532

Closed gpotti closed 1 year ago

gpotti commented 1 year ago

Hello,

A very Happy New Year to you all in advance!

I'm very new to the MLOps world and started learning Kubeflow a few weeks back. I'm on Charmed Kubeflow (the kubeflow-lite bundle) on MicroK8s 1.22, installed on an Ubuntu 20.04 laptop following these instructions: https://charmed-kubeflow.io/docs/quickstart

I'm facing an issue similar to #509 and #529. The istio-pilot-0 and istio-ingressgateway-0 pods stopped working; they show 0/1 in the READY column.

```
$ microk8s.kubectl get pods --all-namespaces | grep istio
kubeflow   istiod-588f95c45f-28xns                         1/1   Running   3 (70m ago)   179m
kubeflow   istio-ingressgateway-workload-89ff56576-kz9xf   1/1   Running   2 (70m ago)   168m
kubeflow   istio-pilot-0                                   0/1   Running   3 (70m ago)   3h2m
kubeflow   istio-ingressgateway-0                          0/1   Running   3 (70m ago)   3h2m
```

Pasting the pod logs and the juju status output below:

```
$ microk8s.kubectl logs istio-pilot-0 -n kubeflow | more
2022-12-29T08:58:05.858Z [pebble] HTTP API server listening on ":38812".
2022-12-29T08:58:05.858Z [pebble] Started daemon.
2022-12-29T08:58:05.887Z [pebble] POST /v1/services 28.906882ms 202
2022-12-29T08:58:05.887Z [pebble] Started default services with change 4.
2022-12-29T08:58:05.915Z [pebble] Service "container-agent" starting: /charm/bin/containeragent unit --data-dir /var/lib/juju --append-env "PATH=$PATH:/charm/bin" --show-log --charm-modified-version 0
2022-12-29T08:58:05.957Z [container-agent] 2022-12-29 08:58:05 INFO juju.cmd supercommand.go:56 running containerAgent [2.9.37 fd867c0a267591313571dee9c60f3f9e71120581 gc go1.19.3]
2022-12-29T08:58:05.957Z [container-agent] starting containeragent unit command
2022-12-29T08:58:05.957Z [container-agent] containeragent unit "unit-istio-pilot-0" start (2.9.37 [gc])
2022-12-29T08:58:05.957Z [container-agent] 2022-12-29 08:58:05 INFO juju.cmd.containeragent.unit runner.go:556 start "unit"
2022-12-29T08:58:05.957Z [container-agent] 2022-12-29 08:58:05 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 2.9.37 have already been run.
2022-12-29T08:58:05.958Z [container-agent] 2022-12-29 08:58:05 INFO juju.worker.probehttpserver server.go:157 starting http server on [::]:65301
2022-12-29T08:58:05.966Z [container-agent] 2022-12-29 08:58:05 INFO juju.api apiclient.go:688 connection established to "wss://controller-service.controller-microk8s-localhost.svc.cluster.local:17070/model/2be3747a-7a6d-482a-85b4-eac8e14330f7/api"
2022-12-29T08:58:05.968Z [container-agent] 2022-12-29 08:58:05 INFO juju.worker.apicaller connect.go:163 [2be374] "unit-istio-pilot-0" successfully connected to "controller-service.controller-microk8s-localhost.svc.cluster.local:17070"
2022-12-29T08:58:05.986Z [container-agent] 2022-12-29 08:58:05 INFO juju.api apiclient.go:1055 cannot resolve "controller-service.controller-microk8s-localhost.svc.cluster.local": lookup controller-service.controller-microk8s-localhost.svc.cluster.local: operation was canceled
2022-12-29T08:58:05.986Z [container-agent] 2022-12-29 08:58:05 INFO juju.api apiclient.go:688 connection established to "wss://10.152.183.172:17070/model/2be3747a-7a6d-482a-85b4-eac8e14330f7/api"
2022-12-29T08:58:05.989Z [container-agent] 2022-12-29 08:58:05 INFO juju.worker.apicaller connect.go:163 [2be374] "unit-istio-pilot-0" successfully connected to "10.152.183.172:17070"
2022-12-29T08:58:06.009Z [container-agent] 2022-12-29 08:58:06 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE
2022-12-29T08:58:06.013Z [container-agent] 2022-12-29 08:58:06 INFO juju.worker.logger logger.go:120 logger worker started
2022-12-29T08:58:06.015Z [container-agent] 2022-12-29 08:58:06 WARNING juju.worker.proxyupdater proxyupdater.go:282 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
2022-12-29T08:58:06.026Z [container-agent] 2022-12-29 08:58:06 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-istio-pilot-0
2022-12-29T08:58:06.031Z [container-agent] 2022-12-29 08:58:06 INFO juju.worker.leadership tracker.go:194 istio-pilot/0 promoted to leadership of istio-pilot
2022-12-29T08:58:06.102Z [container-agent] 2022-12-29 08:58:06 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received
2022-12-29T08:58:06.103Z [container-agent] 2022-12-29 08:58:06 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check
2022-12-29T08:58:06.301Z [container-agent] 2022-12-29 08:58:06 INFO juju.worker.uniter uniter.go:326 unit "istio-pilot/0" started
2022-12-29T08:58:06.304Z [container-agent] 2022-12-29 08:58:06 INFO juju.worker.uniter uniter.go:344 hooks are retried true
2022-12-29T08:58:06.389Z [container-agent] 2022-12-29 08:58:06 INFO juju.worker.uniter resolver.go:76 reboot detected; triggering implicit start hook to notify charm
2022-12-29T08:58:07.657Z [container-agent] 2022-12-29 08:58:07 INFO juju-log Running legacy hooks/start.
2022-12-29T08:58:15.863Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 418

2022-12-29T09:09:05.863Z [pebble] Check "readiness" failure 66 (threshold 3): received non-20x status code 418
2022-12-29T09:09:12.048Z [container-agent] 2022-12-29 09:09:12 ERROR juju-log Uncaught exception while in charm code:
2022-12-29T09:09:12.048Z [container-agent] Traceback (most recent call last):
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/urllib3/connectionpool.py", line 703, in urlopen
2022-12-29T09:09:12.048Z [container-agent]     httplib_response = self._make_request(
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/urllib3/connectionpool.py", line 386, in _make_request
2022-12-29T09:09:12.048Z [container-agent]     self._validate_conn(conn)
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/urllib3/connectionpool.py", line 1042, in _validate_conn
2022-12-29T09:09:12.048Z [container-agent]     conn.connect()
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/urllib3/connection.py", line 414, in connect
2022-12-29T09:09:12.048Z [container-agent]     self.sock = ssl_wrap_socket(
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
2022-12-29T09:09:12.048Z [container-agent]     ssl_sock = _ssl_wrap_socket_impl(
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
2022-12-29T09:09:12.048Z [container-agent]     return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
2022-12-29T09:09:12.048Z [container-agent]   File "/usr/lib/python3.8/ssl.py", line 500, in wrap_socket
2022-12-29T09:09:12.048Z [container-agent]     return self.sslsocket_class._create(
2022-12-29T09:09:12.048Z [container-agent]   File "/usr/lib/python3.8/ssl.py", line 1040, in _create
2022-12-29T09:09:12.048Z [container-agent]     self.do_handshake()
2022-12-29T09:09:12.048Z [container-agent]   File "/usr/lib/python3.8/ssl.py", line 1309, in do_handshake
2022-12-29T09:09:12.048Z [container-agent]     self._sslobj.do_handshake()
2022-12-29T09:09:12.048Z [container-agent] ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1131)
2022-12-29T09:09:12.048Z [container-agent]
2022-12-29T09:09:12.048Z [container-agent] During handling of the above exception, another exception occurred:
2022-12-29T09:09:12.048Z [container-agent]
2022-12-29T09:09:12.048Z [container-agent] Traceback (most recent call last):
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/requests/adapters.py", line 439, in send
2022-12-29T09:09:12.048Z [container-agent]     resp = conn.urlopen(
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/urllib3/connectionpool.py", line 787, in urlopen
2022-12-29T09:09:12.048Z [container-agent]     retries = retries.increment(
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/urllib3/util/retry.py", line 592, in increment
2022-12-29T09:09:12.048Z [container-agent]     raise MaxRetryError(_pool, url, error or ResponseError(cause))
2022-12-29T09:09:12.048Z [container-agent] urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /canonical/operator-schemas/master/k8s-service.yaml (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))
2022-12-29T09:09:12.048Z [container-agent]
2022-12-29T09:09:12.048Z [container-agent] During handling of the above exception, another exception occurred:
2022-12-29T09:09:12.048Z [container-agent]
2022-12-29T09:09:12.048Z [container-agent] Traceback (most recent call last):
2022-12-29T09:09:12.048Z [container-agent]   File "./src/charm.py", line 333, in <module>
2022-12-29T09:09:12.048Z [container-agent]     main(Operator)
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/ops/main.py", line 414, in main
2022-12-29T09:09:12.048Z [container-agent]     charm = charm_class(framework)
2022-12-29T09:09:12.048Z [container-agent]   File "./src/charm.py", line 28, in __init__
2022-12-29T09:09:12.048Z [container-agent]     self.interfaces = get_interfaces(self)
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/serialized_data_interface/__init__.py", line 249, in get_interfaces
2022-12-29T09:09:12.048Z [container-agent]     provides = {
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/serialized_data_interface/__init__.py", line 253, in <dictcomp>
2022-12-29T09:09:12.048Z [container-agent]     get_schema(interface["schema"]),
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/serialized_data_interface/utils.py", line 57, in get_schema
2022-12-29T09:09:12.048Z [container-agent]     response = _get_schema_response_from_remote(schema)
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/serialized_data_interface/utils.py", line 83, in _get_schema_response_from_remote
2022-12-29T09:09:12.048Z [container-agent]     response = requests.get(url=url, proxies=proxies)
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/requests/api.py", line 76, in get
2022-12-29T09:09:12.048Z [container-agent]     return request('get', url, params=params, **kwargs)
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/requests/api.py", line 61, in request
2022-12-29T09:09:12.048Z [container-agent]     return session.request(method=method, url=url, **kwargs)
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/requests/sessions.py", line 542, in request
2022-12-29T09:09:12.048Z [container-agent]     resp = self.send(prep, **send_kwargs)
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/requests/sessions.py", line 655, in send
2022-12-29T09:09:12.048Z [container-agent]     r = adapter.send(request, **kwargs)
2022-12-29T09:09:12.048Z [container-agent]   File "/var/lib/juju/agents/unit-istio-pilot-0/charm/venv/requests/adapters.py", line 514, in send
2022-12-29T09:09:12.048Z [container-agent]     raise SSLError(e, request=request)
2022-12-29T09:09:12.048Z [container-agent] requests.exceptions.SSLError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /canonical/operator-schemas/master/k8s-service.yaml (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1131)')))
2022-12-29T09:09:12.289Z [container-agent] 2022-12-29 09:09:12 ERROR juju.worker.uniter.operation runhook.go:140 hook "start" (via hook dispatching script: dispatch) failed: exit status 1
2022-12-29T09:09:12.291Z [container-agent] 2022-12-29 09:09:12 INFO juju.worker.uniter resolver.go:145 awaiting error resolution for "start" hook
2022-12-29T09:09:15.862Z [pebble] Check "readiness" failure 67 (threshold 3): received non-20x status code 418
2022-12-29T09:09:17.310Z [container-agent] 2022-12-29 09:09:17 INFO juju.worker.uniter resolver.go:145 awaiting error resolution for "start" hook
2022-12-29T09:09:17.589Z [container-agent] 2022-12-29 09:09:17 INFO juju-log Running legacy hooks/start.
2022-12-29T09:09:25.862Z [pebble] Check "readiness" failure 68 (threshold 3): received non-20x status code 418
```



```
$ juju status
Model     Controller          Cloud/Region        Version  SLA          Timestamp
kubeflow  microk8s-localhost  microk8s/localhost  2.9.37   unsupported  16:16:42+05:30

App                        Version                    Status   Scale  Charm                    Channel         Rev  Address         Exposed  Message
admission-webhook          res:oci-image@129fe92      active   1      admission-webhook        1.6/stable      60   10.152.183.254  no
argo-controller            res:oci-image@669ebd5      error    1      argo-controller          3.3/stable      99                   no       hook failed: "upgrade-charm"
dex-auth                                              waiting  1      dex-auth                 2.31/stable     129  10.152.183.201  no       installing agent
istio-ingressgateway                                  active   1      istio-gateway            1.11/stable     114  10.152.183.137  no
istio-pilot                                           active   1      istio-pilot              1.11/stable     131  10.152.183.33   no
jupyter-controller         res:oci-image@e05857e      active   1      jupyter-controller       1.6/stable      163                  no
jupyter-ui                 res:oci-image@d55c600      error    1      jupyter-ui               1.6/stable      124  10.152.183.34   no       hook failed: "upgrade-charm"
kfp-api                    res:oci-image@bf747d5      active   1      kfp-api                  2.0/stable      144  10.152.183.71   no
kfp-db                     mariadb/server:10.3        active   1      charmed-osm-mariadb-k8s  latest/stable   35   10.152.183.41   no       ready
kfp-persistence            res:oci-image@abcf971      active   1      kfp-persistence          2.0/stable      141                  no
kfp-profile-controller     res:oci-image@b4de878      active   1      kfp-profile-controller   2.0/stable      125  10.152.183.167  no
kfp-schedwf                res:oci-image@9c9f710      active   1      kfp-schedwf              2.0/stable      155                  no
kfp-ui                     res:oci-image@47864af      error    1      kfp-ui                   2.0/stable      144  10.152.183.109  no       hook failed: "upgrade-charm"
kfp-viewer                 res:oci-image@94754c0      active   1      kfp-viewer               2.0/stable      152                  no
kfp-viz                    res:oci-image@23ab9b9      error    1      kfp-viz                  2.0/stable      134  10.152.183.53   no       hook failed: "upgrade-charm"
kubeflow-dashboard         res:oci-image@6fe6eec      active   1      kubeflow-dashboard       1.6/stable      183  10.152.183.227  no
kubeflow-profiles          res:profile-image@cfd6935  active   1      kubeflow-profiles        1.6/stable      94   10.152.183.161  no
kubeflow-roles                                        active   1      kubeflow-roles           1.6/stable      49   10.152.183.203  no
kubeflow-volumes           res:oci-image@fdb4a5d      active   1      kubeflow-volumes         1.6/stable      84   10.152.183.177  no
metacontroller-operator                               active   1      metacontroller-operator  2.0/stable      48   10.152.183.14   no
minio                      res:oci-image@1755999      active   1      minio                    ckf-1.6/stable  99   10.152.183.173  no
oidc-gatekeeper            res:oci-image@32de216      active   1      oidc-gatekeeper          ckf-1.6/stable  76   10.152.183.216  no
seldon-controller-manager  res:oci-image@eb811b6      active   1      seldon-core              1.14/stable     92   10.152.183.80   no
training-operator                                     active   1      training-operator        1.5/stable      65   10.152.183.102  no

Unit                         Workload     Agent      Address      Ports              Message
admission-webhook/0          active       idle       10.1.105.4   4443/TCP
argo-controller/0            error        idle       10.1.105.39                     hook failed: "upgrade-charm"
dex-auth/0                   maintenance  idle       10.1.105.40                     Configuring dex charm
istio-ingressgateway/0       active       executing  10.1.105.15                     (start)
istio-pilot/0                active       executing  10.1.105.48                     (start)
jupyter-controller/0         active       idle       10.1.105.11
jupyter-ui/0                 error        idle       10.1.105.29  5000/TCP           hook failed: "upgrade-charm"
kfp-api/0                    active       executing  10.1.105.21  8888/TCP,8887/TCP  (upgrade-charm)
kfp-db/0                     active       idle       10.1.105.37  3306/TCP           ready
kfp-persistence/0            active       executing  10.1.105.45                     (upgrade-charm)
kfp-profile-controller/0     active       executing  10.1.105.60  80/TCP             (upgrade-charm)
kfp-schedwf/0                active       idle       10.1.105.63
kfp-ui/0                     error        idle       10.1.105.17  3000/TCP           hook failed: "upgrade-charm"
kfp-viewer/0                 active       idle       10.1.105.22
kfp-viz/0                    error        idle       10.1.105.31  8888/TCP           hook failed: "upgrade-charm"
kubeflow-dashboard/0         active       executing  10.1.105.43  8082/TCP           (upgrade-charm)
kubeflow-profiles/0          active       executing  10.1.105.36  8080/TCP,8081/TCP  (upgrade-charm)
kubeflow-roles/0             active       idle       10.1.105.1
kubeflow-volumes/0           active       executing  10.1.105.20  5000/TCP           (upgrade-charm)
metacontroller-operator/0    active       idle       10.1.105.52
minio/0                      active       executing  10.1.105.10  9000/TCP,9001/TCP  (upgrade-charm)
oidc-gatekeeper/1            active       executing  10.1.105.25  8080/TCP           (upgrade-charm)
seldon-controller-manager/0  active       idle       10.1.105.35  8080/TCP,4443/TCP
training-operator/0          active       idle       10.1.105.42
```

Please let me know if any other specific details are needed.

Thanks, Govindan

gpotti commented 1 year ago

Looks like this was caused by a bad network/router that was breaking the internet connection intermittently, which in turn caused microk8s to malfunction. I was able to install and configure Kubeflow again on another network. I'm replacing my router; once that is confirmed to fix things, I'll go ahead and close this issue.
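
In case it helps anyone else, a crude way to confirm intermittent drops like this (the connection looked fine for browsing, but failed occasionally) is to poll the same endpoint the charm needs and log the failures. A sketch, assuming curl and GNU date are available:

```bash
#!/usr/bin/env bash
# Poll the schema URL every 10 seconds and log any failed attempts.
# Purely a diagnostic sketch; stop it with Ctrl-C.
URL="https://raw.githubusercontent.com/canonical/operator-schemas/master/k8s-service.yaml"

while true; do
    if ! curl -fsS --max-time 5 "$URL" -o /dev/null; then
        echo "$(date -Is)  request failed"
    fi
    sleep 10
done
```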

gpotti commented 1 year ago

Update:

My diagnosis was correct: it was indeed a problem with the internet disconnects, but that was not evident in any of the log messages. Is it possible to add an informational message that istio-ingressgateway and istio-pilot container issues may be related to an unstable internet connection? This would help because the unstable connection was not causing issues with any other applications or with accessing websites, email, etc. Please let me know if there is something I can do to help.

i-chvets commented 1 year ago

@gpotti Internet connectivity is more of an infrastructure issue than a Kubeflow issue. Maybe we can add this note to the troubleshooting section of one of our blog posts.
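
One point that might be worth including in that note: once connectivity is stable again, units stuck in a hook error (like the upgrade-charm failures in the status output above) can usually be asked to re-run the failed hook with `juju resolved`. A sketch; the unit names are examples taken from the status output earlier in this issue:

```bash
# See which units are still in error.
juju status | grep 'hook failed'

# Re-run the failed hook on each affected unit (retries the hook by default).
juju resolved argo-controller/0
juju resolved jupyter-ui/0
juju resolved kfp-ui/0
juju resolved kfp-viz/0
```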

i-chvets commented 1 year ago

Added a section: https://discourse.charmhub.io/t/troubleshooting-kubeflow/4268#heading--other-issues

gpotti commented 1 year ago

Thank you!
