canonical / oidc-gatekeeper-operator

Charmed OIDC Gatekeeper
Apache License 2.0
0 stars 7 forks source link

oidc-gatekeeper is stuck on self-hosted runners at "Waiting for pod startup to complete" #112

Closed orfeas-k closed 7 months ago

orfeas-k commented 9 months ago

During deployment of latest/edge bundle (from bundle.yaml file, not channel) on self-hosted runners, charm is stuck in the following state

  oidc-gatekeeper/0 [idle] maintenance: Waiting for pod startup to complete

At the moment, no additional logs are provided from the CI run and we need to debug more. We should note that deployment on an EC2 VM of the same bundle is successful.

Here's another run that failed also with the same error.

misohu commented 7 months ago

Same thing happen to me today on ec2 instance (t2.2xlarge)

microk8s 1.26/stable juju 3.1

The pods looks good

ubuntu@ip-172-31-95-106:~$ microk8s kubectl get po -n kubeflow | grep oidc
oidc-gatekeeper-0                                2/2     Running   0          22m

Here are the logs

ubuntu@ip-172-31-95-106:~$ microk8s kubectl logs -n kubeflow oidc-gatekeeper-0
Defaulted container "charm" out of: charm, oidc-authservice, charm-init (init)
2023-11-21T09:57:36.314Z [pebble] HTTP API server listening on ":38812".
2023-11-21T09:57:36.314Z [pebble] Started daemon.
2023-11-21T09:57:36.337Z [pebble] POST /v1/services 21.151097ms 202
2023-11-21T09:57:36.337Z [pebble] Started default services with change 1.
2023-11-21T09:57:36.360Z [pebble] Service "container-agent" starting: /charm/bin/containeragent unit --data-dir /var/lib/juju --append-env "PATH=$PATH:/charm/bin" --show-log --charm-modified-version 0
2023-11-21T09:57:36.424Z [container-agent] 2023-11-21 09:57:36 INFO juju.cmd supercommand.go:56 running containerAgent [3.1.6 f6a66aa91eec620f5ac04a19d8c06bef03ae6228 gc go1.20.8]
2023-11-21T09:57:36.424Z [container-agent] starting containeragent unit command
2023-11-21T09:57:36.424Z [container-agent] containeragent unit "unit-oidc-gatekeeper-0" start (3.1.6 [gc])
2023-11-21T09:57:36.445Z [container-agent] 2023-11-21 09:57:36 INFO juju.cmd.containeragent.unit runner.go:578 start "unit"
2023-11-21T09:57:36.445Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.1.6 have already been run.
2023-11-21T09:57:36.445Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.probehttpserver server.go:157 starting http server on 127.0.0.1:65301
2023-11-21T09:57:36.450Z [container-agent] 2023-11-21 09:57:36 INFO juju.api apiclient.go:1073 cannot resolve "controller-service.controller-uk8s-controller.svc.cluster.local": lookup controller-service.controller-uk8s-controller.svc.cluster.local: operation was canceled
2023-11-21T09:57:36.450Z [container-agent] 2023-11-21 09:57:36 INFO juju.api apiclient.go:706 connection established to "wss://10.152.183.100:17070/model/496cbece-6e8a-442a-899b-2bbe3efb85dd/api"
2023-11-21T09:57:36.471Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.apicaller connect.go:163 [496cbe] "unit-oidc-gatekeeper-0" successfully connected to "10.152.183.100:17070"
2023-11-21T09:57:36.495Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE
2023-11-21T09:57:36.495Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.logger logger.go:120 logger worker started
2023-11-21T09:57:36.505Z [container-agent] 2023-11-21 09:57:36 WARNING juju.worker.proxyupdater proxyupdater.go:241 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
2023-11-21T09:57:36.506Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.leadership tracker.go:194 oidc-gatekeeper/0 promoted to leadership of oidc-gatekeeper
2023-11-21T09:57:36.542Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received
2023-11-21T09:57:36.542Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check
2023-11-21T09:57:36.693Z [container-agent] 2023-11-21 09:57:36 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-oidc-gatekeeper-0
2023-11-21T09:57:36.726Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.uniter uniter.go:363 unit "oidc-gatekeeper/0" started
2023-11-21T09:57:36.730Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.uniter uniter.go:689 resuming charm install
2023-11-21T09:57:36.730Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/focal/oidc-gatekeeper-291 from API server
2023-11-21T09:57:36.739Z [container-agent] 2023-11-21 09:57:36 INFO juju.downloader download.go:109 downloading from ch:amd64/focal/oidc-gatekeeper-291
2023-11-21T09:57:36.949Z [container-agent] 2023-11-21 09:57:36 INFO juju.downloader download.go:92 download complete ("ch:amd64/focal/oidc-gatekeeper-291")
2023-11-21T09:57:37.011Z [container-agent] 2023-11-21 09:57:37 INFO juju.downloader download.go:172 download verified ("ch:amd64/focal/oidc-gatekeeper-291")
2023-11-21T09:57:46.396Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 418
2023-11-21T09:57:56.321Z [pebble] Check "readiness" failure 2 (threshold 3): received non-20x status code 418
2023-11-21T09:58:06.333Z [pebble] Check "readiness" failure 3 (threshold 3): received non-20x status code 418
2023-11-21T09:58:06.333Z [pebble] Check "readiness" failure threshold 3 hit, triggering action
2023-11-21T09:58:16.327Z [pebble] Check "readiness" failure 4 (threshold 3): received non-20x status code 418
2023-11-21T09:58:20.091Z [container-agent] 2023-11-21 09:58:20 INFO juju.worker.uniter uniter.go:389 hooks are retried true
2023-11-21T09:58:20.239Z [container-agent] 2023-11-21 09:58:20 INFO juju.worker.uniter resolver.go:165 found queued "install" hook
2023-11-21T09:58:23.092Z [container-agent] 2023-11-21 09:58:23 INFO juju-log Running legacy hooks/install.
2023-11-21T09:58:24.556Z [container-agent] 2023-11-21 09:58:24 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T09:58:24.721Z [container-agent] 2023-11-21 09:58:24 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T09:58:24.847Z [container-agent] 2023-11-21 09:58:24 INFO juju-log Kubernetes service 'oidc-gatekeeper' patched successfully
2023-11-21T09:58:25.277Z [container-agent] 2023-11-21 09:58:25 INFO juju.worker.uniter.operation runhook.go:186 ran "install" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:26.325Z [pebble] Check "readiness" failure 5 (threshold 3): received non-20x status code 418
2023-11-21T09:58:27.347Z [container-agent] 2023-11-21 09:58:27 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-created" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:28.767Z [container-agent] 2023-11-21 09:58:28 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-created" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:30.035Z [container-agent] 2023-11-21 09:58:30 INFO juju.worker.uniter.operation runhook.go:186 ran "client-secret-relation-created" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:31.309Z [container-agent] 2023-11-21 09:58:31 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-created" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:31.387Z [container-agent] 2023-11-21 09:58:31 INFO juju.worker.uniter resolver.go:165 found queued "leader-elected" hook
2023-11-21T09:58:33.224Z [container-agent] 2023-11-21 09:58:33 ERROR juju-log Failed to handle <LeaderElectedEvent via OIDCGatekeeperOperator/on/leader_elected[26]> with error: Waiting for pod startup to complete
2023-11-21T09:58:33.545Z [container-agent] 2023-11-21 09:58:33 INFO juju.worker.uniter.operation runhook.go:186 ran "leader-elected" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:35.048Z [container-agent] 2023-11-21 09:58:35 ERROR juju-log Failed to handle <ConfigChangedEvent via OIDCGatekeeperOperator/on/config_changed[31]> with error: Waiting for pod startup to complete
2023-11-21T09:58:35.418Z [container-agent] 2023-11-21 09:58:35 INFO juju.worker.uniter.operation runhook.go:186 ran "config-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:35.448Z [container-agent] 2023-11-21 09:58:35 INFO juju.worker.uniter resolver.go:165 found queued "start" hook
2023-11-21T09:58:36.321Z [pebble] Check "readiness" failure 6 (threshold 3): received non-20x status code 418
2023-11-21T09:58:36.372Z [container-agent] 2023-11-21 09:58:36 INFO juju-log Running legacy hooks/start.
2023-11-21T09:58:38.391Z [container-agent] 2023-11-21 09:58:38 ERROR juju-log Failed to handle <StartEvent via OIDCGatekeeperOperator/on/start[36]> with error: Waiting for pod startup to complete
2023-11-21T09:58:38.988Z [container-agent] 2023-11-21 09:58:38 INFO juju.worker.uniter.operation runhook.go:186 ran "start" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:41.056Z [container-agent] 2023-11-21 09:58:41 ERROR juju-log ingress-auth:18: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_auth_relation_changed[41]> with error: Waiting for pod startup to complete
2023-11-21T09:58:41.512Z [container-agent] 2023-11-21 09:58:41 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:43.147Z [container-agent] 2023-11-21 09:58:43 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-joined" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:45.122Z [container-agent] 2023-11-21 09:58:45 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[51]> with error: Waiting for pod startup to complete
2023-11-21T09:58:45.634Z [container-agent] 2023-11-21 09:58:45 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:47.630Z [container-agent] 2023-11-21 09:58:47 ERROR juju-log ingress-auth:18: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_auth_relation_changed[56]> with error: Waiting for pod startup to complete
2023-11-21T09:58:48.144Z [container-agent] 2023-11-21 09:58:48 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:51.081Z [container-agent] 2023-11-21 09:58:51 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[61]> with error: Waiting for pod startup to complete
2023-11-21T09:58:52.031Z [container-agent] 2023-11-21 09:58:52 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:54.310Z [container-agent] 2023-11-21 09:58:54 ERROR juju-log client-secret:7: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/client_secret_relation_changed[66]> with error: Waiting for pod startup to complete
2023-11-21T09:58:54.922Z [container-agent] 2023-11-21 09:58:54 INFO juju.worker.uniter.operation runhook.go:186 ran "client-secret-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:57.443Z [container-agent] 2023-11-21 09:58:57 ERROR juju-log oidc-client:9: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/oidc_client_relation_changed[71]> with error: Waiting for pod startup to complete
2023-11-21T09:58:57.836Z [container-agent] 2023-11-21 09:58:57 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:59.367Z [container-agent] 2023-11-21 09:58:59 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-joined" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:01.143Z [container-agent] 2023-11-21 09:59:01 ERROR juju-log oidc-client:9: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/oidc_client_relation_changed[81]> with error: Waiting for pod startup to complete
2023-11-21T09:59:01.456Z [container-agent] 2023-11-21 09:59:01 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:02.927Z [container-agent] 2023-11-21 09:59:02 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-joined" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:04.617Z [container-agent] 2023-11-21 09:59:04 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[91]> with error: Waiting for pod startup to complete
2023-11-21T09:59:04.979Z [container-agent] 2023-11-21 09:59:04 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:10.816Z [container-agent] 2023-11-21 09:59:10 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[96]> with error: Waiting for pod startup to complete
2023-11-21T09:59:11.194Z [container-agent] 2023-11-21 09:59:11 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:30.124Z [container-agent] 2023-11-21 09:59:30 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[101]> with error: Waiting for pod startup to complete
2023-11-21T09:59:30.584Z [container-agent] 2023-11-21 09:59:30 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:35.814Z [container-agent] 2023-11-21 09:59:35 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[106]> with error: Waiting for pod startup to complete
2023-11-21T09:59:36.141Z [container-agent] 2023-11-21 09:59:36 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T10:00:24.901Z [container-agent] 2023-11-21 10:00:24 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[111]> with error: Waiting for pod startup to complete
2023-11-21T10:00:25.768Z [container-agent] 2023-11-21 10:00:25 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T10:00:34.895Z [container-agent] 2023-11-21 10:00:34 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[116]> with error: Waiting for pod startup to complete
2023-11-21T10:00:35.201Z [container-agent] 2023-11-21 10:00:35 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T10:01:05.837Z [container-agent] 2023-11-21 10:01:05 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[121]> with error: Waiting for pod startup to complete
2023-11-21T10:01:06.263Z [container-agent] 2023-11-21 10:01:06 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T10:03:08.451Z [container-agent] 2023-11-21 10:03:08 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-authservice-pebble-ready" hook (via hook dispatching script: dispatch)
2023-11-21T10:03:47.842Z [container-agent] 2023-11-21 10:03:47 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T10:03:48.020Z [container-agent] 2023-11-21 10:03:48 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T10:03:48.056Z [container-agent] 2023-11-21 10:03:48 INFO juju-log Kubernetes service 'oidc-gatekeeper' patched successfully
2023-11-21T10:03:48.690Z [container-agent] 2023-11-21 10:03:48 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)
2023-11-21T10:08:31.670Z [container-agent] 2023-11-21 10:08:31 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T10:08:31.744Z [container-agent] 2023-11-21 10:08:31 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T10:08:31.775Z [container-agent] 2023-11-21 10:08:31 INFO juju-log Kubernetes service 'oidc-gatekeeper' patched successfully
2023-11-21T10:08:32.053Z [container-agent] 2023-11-21 10:08:32 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)
2023-11-21T10:14:21.835Z [container-agent] 2023-11-21 10:14:21 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T10:14:21.906Z [container-agent] 2023-11-21 10:14:21 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T10:14:21.935Z [container-agent] 2023-11-21 10:14:21 INFO juju-log Kubernetes service 'oidc-gatekeeper' patched successfully
2023-11-21T10:14:22.203Z [container-agent] 2023-11-21 10:14:22 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)
2023-11-21T10:18:22.297Z [container-agent] 2023-11-21 10:18:22 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T10:18:22.370Z [container-agent] 2023-11-21 10:18:22 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T10:18:22.400Z [container-agent] 2023-11-21 10:18:22 INFO juju-log Kubernetes service 'oidc-gatekeeper' patched successfully
2023-11-21T10:18:22.677Z [container-agent] 2023-11-21 10:18:22 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)
misohu commented 7 months ago

I checked the logs on charm pod in case oidc was failing

ubuntu@ip-172-31-95-106:~$ microk8s kubectl logs -n kubeflow oidc-gatekeeper-0
Defaulted container "charm" out of: charm, oidc-authservice, charm-init (init)
2023-11-21T09:57:36.314Z [pebble] HTTP API server listening on ":38812".
2023-11-21T09:57:36.314Z [pebble] Started daemon.
2023-11-21T09:57:36.337Z [pebble] POST /v1/services 21.151097ms 202
2023-11-21T09:57:36.337Z [pebble] Started default services with change 1.
2023-11-21T09:57:36.360Z [pebble] Service "container-agent" starting: /charm/bin/containeragent unit --data-dir /var/lib/juju --append-env "PATH=$PATH:/charm/bin" --show-log --charm-modified-version 0
2023-11-21T09:57:36.424Z [container-agent] 2023-11-21 09:57:36 INFO juju.cmd supercommand.go:56 running containerAgent [3.1.6 f6a66aa91eec620f5ac04a19d8c06bef03ae6228 gc go1.20.8]
2023-11-21T09:57:36.424Z [container-agent] starting containeragent unit command
2023-11-21T09:57:36.424Z [container-agent] containeragent unit "unit-oidc-gatekeeper-0" start (3.1.6 [gc])
2023-11-21T09:57:36.445Z [container-agent] 2023-11-21 09:57:36 INFO juju.cmd.containeragent.unit runner.go:578 start "unit"
2023-11-21T09:57:36.445Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.1.6 have already been run.
2023-11-21T09:57:36.445Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.probehttpserver server.go:157 starting http server on 127.0.0.1:65301
2023-11-21T09:57:36.450Z [container-agent] 2023-11-21 09:57:36 INFO juju.api apiclient.go:1073 cannot resolve "controller-service.controller-uk8s-controller.svc.cluster.local": lookup controller-service.controller-uk8s-controller.svc.cluster.local: operation was canceled
2023-11-21T09:57:36.450Z [container-agent] 2023-11-21 09:57:36 INFO juju.api apiclient.go:706 connection established to "wss://10.152.183.100:17070/model/496cbece-6e8a-442a-899b-2bbe3efb85dd/api"
2023-11-21T09:57:36.471Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.apicaller connect.go:163 [496cbe] "unit-oidc-gatekeeper-0" successfully connected to "10.152.183.100:17070"
2023-11-21T09:57:36.495Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE
2023-11-21T09:57:36.495Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.logger logger.go:120 logger worker started
2023-11-21T09:57:36.505Z [container-agent] 2023-11-21 09:57:36 WARNING juju.worker.proxyupdater proxyupdater.go:241 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
2023-11-21T09:57:36.506Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.leadership tracker.go:194 oidc-gatekeeper/0 promoted to leadership of oidc-gatekeeper
2023-11-21T09:57:36.542Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received
2023-11-21T09:57:36.542Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check
2023-11-21T09:57:36.693Z [container-agent] 2023-11-21 09:57:36 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-oidc-gatekeeper-0
2023-11-21T09:57:36.726Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.uniter uniter.go:363 unit "oidc-gatekeeper/0" started
2023-11-21T09:57:36.730Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.uniter uniter.go:689 resuming charm install
2023-11-21T09:57:36.730Z [container-agent] 2023-11-21 09:57:36 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/focal/oidc-gatekeeper-291 from API server
2023-11-21T09:57:36.739Z [container-agent] 2023-11-21 09:57:36 INFO juju.downloader download.go:109 downloading from ch:amd64/focal/oidc-gatekeeper-291
2023-11-21T09:57:36.949Z [container-agent] 2023-11-21 09:57:36 INFO juju.downloader download.go:92 download complete ("ch:amd64/focal/oidc-gatekeeper-291")
2023-11-21T09:57:37.011Z [container-agent] 2023-11-21 09:57:37 INFO juju.downloader download.go:172 download verified ("ch:amd64/focal/oidc-gatekeeper-291")
2023-11-21T09:57:46.396Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 418
2023-11-21T09:57:56.321Z [pebble] Check "readiness" failure 2 (threshold 3): received non-20x status code 418
2023-11-21T09:58:06.333Z [pebble] Check "readiness" failure 3 (threshold 3): received non-20x status code 418
2023-11-21T09:58:06.333Z [pebble] Check "readiness" failure threshold 3 hit, triggering action
2023-11-21T09:58:16.327Z [pebble] Check "readiness" failure 4 (threshold 3): received non-20x status code 418
2023-11-21T09:58:20.091Z [container-agent] 2023-11-21 09:58:20 INFO juju.worker.uniter uniter.go:389 hooks are retried true
2023-11-21T09:58:20.239Z [container-agent] 2023-11-21 09:58:20 INFO juju.worker.uniter resolver.go:165 found queued "install" hook
2023-11-21T09:58:23.092Z [container-agent] 2023-11-21 09:58:23 INFO juju-log Running legacy hooks/install.
2023-11-21T09:58:24.556Z [container-agent] 2023-11-21 09:58:24 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T09:58:24.721Z [container-agent] 2023-11-21 09:58:24 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T09:58:24.847Z [container-agent] 2023-11-21 09:58:24 INFO juju-log Kubernetes service 'oidc-gatekeeper' patched successfully
2023-11-21T09:58:25.277Z [container-agent] 2023-11-21 09:58:25 INFO juju.worker.uniter.operation runhook.go:186 ran "install" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:26.325Z [pebble] Check "readiness" failure 5 (threshold 3): received non-20x status code 418
2023-11-21T09:58:27.347Z [container-agent] 2023-11-21 09:58:27 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-created" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:28.767Z [container-agent] 2023-11-21 09:58:28 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-created" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:30.035Z [container-agent] 2023-11-21 09:58:30 INFO juju.worker.uniter.operation runhook.go:186 ran "client-secret-relation-created" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:31.309Z [container-agent] 2023-11-21 09:58:31 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-created" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:31.387Z [container-agent] 2023-11-21 09:58:31 INFO juju.worker.uniter resolver.go:165 found queued "leader-elected" hook
2023-11-21T09:58:33.224Z [container-agent] 2023-11-21 09:58:33 ERROR juju-log Failed to handle <LeaderElectedEvent via OIDCGatekeeperOperator/on/leader_elected[26]> with error: Waiting for pod startup to complete
2023-11-21T09:58:33.545Z [container-agent] 2023-11-21 09:58:33 INFO juju.worker.uniter.operation runhook.go:186 ran "leader-elected" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:35.048Z [container-agent] 2023-11-21 09:58:35 ERROR juju-log Failed to handle <ConfigChangedEvent via OIDCGatekeeperOperator/on/config_changed[31]> with error: Waiting for pod startup to complete
2023-11-21T09:58:35.418Z [container-agent] 2023-11-21 09:58:35 INFO juju.worker.uniter.operation runhook.go:186 ran "config-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:35.448Z [container-agent] 2023-11-21 09:58:35 INFO juju.worker.uniter resolver.go:165 found queued "start" hook
2023-11-21T09:58:36.321Z [pebble] Check "readiness" failure 6 (threshold 3): received non-20x status code 418
2023-11-21T09:58:36.372Z [container-agent] 2023-11-21 09:58:36 INFO juju-log Running legacy hooks/start.
2023-11-21T09:58:38.391Z [container-agent] 2023-11-21 09:58:38 ERROR juju-log Failed to handle <StartEvent via OIDCGatekeeperOperator/on/start[36]> with error: Waiting for pod startup to complete
2023-11-21T09:58:38.988Z [container-agent] 2023-11-21 09:58:38 INFO juju.worker.uniter.operation runhook.go:186 ran "start" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:41.056Z [container-agent] 2023-11-21 09:58:41 ERROR juju-log ingress-auth:18: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_auth_relation_changed[41]> with error: Waiting for pod startup to complete
2023-11-21T09:58:41.512Z [container-agent] 2023-11-21 09:58:41 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:43.147Z [container-agent] 2023-11-21 09:58:43 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-joined" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:45.122Z [container-agent] 2023-11-21 09:58:45 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[51]> with error: Waiting for pod startup to complete
2023-11-21T09:58:45.634Z [container-agent] 2023-11-21 09:58:45 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:47.630Z [container-agent] 2023-11-21 09:58:47 ERROR juju-log ingress-auth:18: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_auth_relation_changed[56]> with error: Waiting for pod startup to complete
2023-11-21T09:58:48.144Z [container-agent] 2023-11-21 09:58:48 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:51.081Z [container-agent] 2023-11-21 09:58:51 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[61]> with error: Waiting for pod startup to complete
2023-11-21T09:58:52.031Z [container-agent] 2023-11-21 09:58:52 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:54.310Z [container-agent] 2023-11-21 09:58:54 ERROR juju-log client-secret:7: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/client_secret_relation_changed[66]> with error: Waiting for pod startup to complete
2023-11-21T09:58:54.922Z [container-agent] 2023-11-21 09:58:54 INFO juju.worker.uniter.operation runhook.go:186 ran "client-secret-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:57.443Z [container-agent] 2023-11-21 09:58:57 ERROR juju-log oidc-client:9: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/oidc_client_relation_changed[71]> with error: Waiting for pod startup to complete
2023-11-21T09:58:57.836Z [container-agent] 2023-11-21 09:58:57 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:58:59.367Z [container-agent] 2023-11-21 09:58:59 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-joined" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:01.143Z [container-agent] 2023-11-21 09:59:01 ERROR juju-log oidc-client:9: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/oidc_client_relation_changed[81]> with error: Waiting for pod startup to complete
2023-11-21T09:59:01.456Z [container-agent] 2023-11-21 09:59:01 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:02.927Z [container-agent] 2023-11-21 09:59:02 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-joined" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:04.617Z [container-agent] 2023-11-21 09:59:04 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[91]> with error: Waiting for pod startup to complete
2023-11-21T09:59:04.979Z [container-agent] 2023-11-21 09:59:04 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:10.816Z [container-agent] 2023-11-21 09:59:10 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[96]> with error: Waiting for pod startup to complete
2023-11-21T09:59:11.194Z [container-agent] 2023-11-21 09:59:11 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:30.124Z [container-agent] 2023-11-21 09:59:30 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[101]> with error: Waiting for pod startup to complete
2023-11-21T09:59:30.584Z [container-agent] 2023-11-21 09:59:30 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T09:59:35.814Z [container-agent] 2023-11-21 09:59:35 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[106]> with error: Waiting for pod startup to complete
2023-11-21T09:59:36.141Z [container-agent] 2023-11-21 09:59:36 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T10:00:24.901Z [container-agent] 2023-11-21 10:00:24 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[111]> with error: Waiting for pod startup to complete
2023-11-21T10:00:25.768Z [container-agent] 2023-11-21 10:00:25 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T10:00:34.895Z [container-agent] 2023-11-21 10:00:34 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[116]> with error: Waiting for pod startup to complete
2023-11-21T10:00:35.201Z [container-agent] 2023-11-21 10:00:35 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T10:01:05.837Z [container-agent] 2023-11-21 10:01:05 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[121]> with error: Waiting for pod startup to complete
2023-11-21T10:01:06.263Z [container-agent] 2023-11-21 10:01:06 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-21T10:03:08.451Z [container-agent] 2023-11-21 10:03:08 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-authservice-pebble-ready" hook (via hook dispatching script: dispatch)
2023-11-21T10:03:47.842Z [container-agent] 2023-11-21 10:03:47 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-21T10:03:48.020Z [container-agent] 2023-11-21 10:03:48 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"

Compared to logs when charm is working (don't mind the public-urls errors those are by design)

ubuntu@ip-172-31-93-171:~$ microk8s kubectl -n kubeflow logs -f oidc-gatekeeper-0
Defaulted container "charm" out of: charm, oidc-authservice, charm-init (init)
2023-11-22T08:44:38.449Z [pebble] HTTP API server listening on ":38812".
2023-11-22T08:44:38.450Z [pebble] Started daemon.
2023-11-22T08:44:38.469Z [pebble] POST /v1/services 17.972606ms 202
2023-11-22T08:44:38.469Z [pebble] Started default services with change 1.
2023-11-22T08:44:38.481Z [pebble] Service "container-agent" starting: /charm/bin/containeragent unit --data-dir /var/lib/juju --append-env "PATH=$PATH:/charm/bin" --show-log --charm-modified-version 0
2023-11-22T08:44:38.517Z [container-agent] 2023-11-22 08:44:38 INFO juju.cmd supercommand.go:56 running containerAgent [3.1.6 f6a66aa91eec620f5ac04a19d8c06bef03ae6228 gc go1.20.8]
2023-11-22T08:44:38.517Z [container-agent] starting containeragent unit command
2023-11-22T08:44:38.518Z [container-agent] containeragent unit "unit-oidc-gatekeeper-0" start (3.1.6 [gc])
2023-11-22T08:44:38.519Z [container-agent] 2023-11-22 08:44:38 INFO juju.cmd.containeragent.unit runner.go:578 start "unit"
2023-11-22T08:44:38.519Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 3.1.6 have already been run.
2023-11-22T08:44:38.522Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.probehttpserver server.go:157 starting http server on 127.0.0.1:65301
2023-11-22T08:44:38.545Z [container-agent] 2023-11-22 08:44:38 INFO juju.api apiclient.go:1073 cannot resolve "controller-service.controller-uk8sx.svc.cluster.local": lookup controller-service.controller-uk8sx.svc.cluster.local: operation was canceled
2023-11-22T08:44:38.546Z [container-agent] 2023-11-22 08:44:38 INFO juju.api apiclient.go:706 connection established to "wss://10.152.183.247:17070/model/f15252c4-8e76-4d3f-85bd-7136acc26a4b/api"
2023-11-22T08:44:38.551Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.apicaller connect.go:163 [f15252] "unit-oidc-gatekeeper-0" successfully connected to "10.152.183.247:17070"
2023-11-22T08:44:38.576Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE
2023-11-22T08:44:38.576Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.logger logger.go:120 logger worker started
2023-11-22T08:44:38.585Z [container-agent] 2023-11-22 08:44:38 WARNING juju.worker.proxyupdater proxyupdater.go:241 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
2023-11-22T08:44:38.611Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.leadership tracker.go:194 oidc-gatekeeper/0 promoted to leadership of oidc-gatekeeper
2023-11-22T08:44:38.641Z [container-agent] 2023-11-22 08:44:38 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-oidc-gatekeeper-0
2023-11-22T08:44:38.647Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received
2023-11-22T08:44:38.647Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check
2023-11-22T08:44:38.665Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.uniter uniter.go:363 unit "oidc-gatekeeper/0" started
2023-11-22T08:44:38.678Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.uniter uniter.go:689 resuming charm install
2023-11-22T08:44:38.680Z [container-agent] 2023-11-22 08:44:38 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/focal/oidc-gatekeeper-291 from API server
2023-11-22T08:44:38.682Z [container-agent] 2023-11-22 08:44:38 INFO juju.downloader download.go:109 downloading from ch:amd64/focal/oidc-gatekeeper-291
2023-11-22T08:44:38.978Z [container-agent] 2023-11-22 08:44:38 INFO juju.downloader download.go:92 download complete ("ch:amd64/focal/oidc-gatekeeper-291")
2023-11-22T08:44:39.028Z [container-agent] 2023-11-22 08:44:39 INFO juju.downloader download.go:172 download verified ("ch:amd64/focal/oidc-gatekeeper-291")
2023-11-22T08:44:48.465Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 418
2023-11-22T08:44:58.457Z [pebble] Check "readiness" failure 2 (threshold 3): received non-20x status code 418
2023-11-22T08:45:08.458Z [pebble] Check "readiness" failure 3 (threshold 3): received non-20x status code 418
2023-11-22T08:45:08.458Z [pebble] Check "readiness" failure threshold 3 hit, triggering action
2023-11-22T08:45:18.457Z [pebble] Check "readiness" failure 4 (threshold 3): received non-20x status code 418
2023-11-22T08:45:28.462Z [pebble] Check "readiness" failure 5 (threshold 3): received non-20x status code 418
2023-11-22T08:45:29.998Z [container-agent] 2023-11-22 08:45:29 INFO juju.worker.uniter uniter.go:389 hooks are retried true
2023-11-22T08:45:30.111Z [container-agent] 2023-11-22 08:45:30 INFO juju.worker.uniter resolver.go:165 found queued "install" hook
2023-11-22T08:45:32.593Z [container-agent] 2023-11-22 08:45:32 INFO juju-log Running legacy hooks/install.
2023-11-22T08:45:33.894Z [container-agent] 2023-11-22 08:45:33 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-22T08:45:33.996Z [container-agent] 2023-11-22 08:45:33 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-22T08:45:34.056Z [container-agent] 2023-11-22 08:45:34 INFO juju-log Kubernetes service 'oidc-gatekeeper' patched successfully
2023-11-22T08:45:34.440Z [container-agent] 2023-11-22 08:45:34 INFO juju.worker.uniter.operation runhook.go:186 ran "install" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:36.166Z [container-agent] 2023-11-22 08:45:36 INFO juju.worker.uniter.operation runhook.go:186 ran "client-secret-relation-created" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:37.431Z [container-agent] 2023-11-22 08:45:37 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-created" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:38.457Z [pebble] Check "readiness" failure 6 (threshold 3): received non-20x status code 418
2023-11-22T08:45:38.503Z [container-agent] 2023-11-22 08:45:38 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-created" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:39.801Z [container-agent] 2023-11-22 08:45:39 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-created" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:39.888Z [container-agent] 2023-11-22 08:45:39 INFO juju.worker.uniter resolver.go:165 found queued "leader-elected" hook
2023-11-22T08:45:40.659Z [container-agent] 2023-11-22 08:45:40 ERROR juju-log Failed to handle <LeaderElectedEvent via OIDCGatekeeperOperator/on/leader_elected[26]> with error: public-url config required
2023-11-22T08:45:41.060Z [container-agent] 2023-11-22 08:45:41 INFO juju.worker.uniter.operation runhook.go:186 ran "leader-elected" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:41.892Z [container-agent] 2023-11-22 08:45:41 ERROR juju-log Failed to handle <ConfigChangedEvent via OIDCGatekeeperOperator/on/config_changed[31]> with error: public-url config required
2023-11-22T08:45:42.520Z [container-agent] 2023-11-22 08:45:42 INFO juju.worker.uniter.operation runhook.go:186 ran "config-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:42.583Z [container-agent] 2023-11-22 08:45:42 INFO juju.worker.uniter resolver.go:165 found queued "start" hook
2023-11-22T08:45:43.996Z [container-agent] 2023-11-22 08:45:43 INFO juju-log Running legacy hooks/start.
2023-11-22T08:45:45.423Z [container-agent] 2023-11-22 08:45:45 ERROR juju-log Failed to handle <StartEvent via OIDCGatekeeperOperator/on/start[36]> with error: public-url config required
2023-11-22T08:45:45.909Z [container-agent] 2023-11-22 08:45:45 INFO juju.worker.uniter.operation runhook.go:186 ran "start" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:47.543Z [container-agent] 2023-11-22 08:45:47 ERROR juju-log oidc-client:9: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/oidc_client_relation_changed[41]> with error: public-url config required
2023-11-22T08:45:48.369Z [container-agent] 2023-11-22 08:45:48 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:50.341Z [container-agent] 2023-11-22 08:45:50 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[46]> with error: public-url config required
2023-11-22T08:45:50.959Z [container-agent] 2023-11-22 08:45:50 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:54.038Z [container-agent] 2023-11-22 08:45:54 ERROR juju-log ingress-auth:18: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_auth_relation_changed[51]> with error: public-url config required
2023-11-22T08:45:54.516Z [container-agent] 2023-11-22 08:45:54 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:56.385Z [container-agent] 2023-11-22 08:45:56 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-joined" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:58.101Z [container-agent] 2023-11-22 08:45:58 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-joined" hook (via hook dispatching script: dispatch)
2023-11-22T08:45:59.158Z [container-agent] 2023-11-22 08:45:59 ERROR juju-log oidc-client:9: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/oidc_client_relation_changed[66]> with error: public-url config required
2023-11-22T08:45:59.508Z [container-agent] 2023-11-22 08:45:59 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-client-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:46:01.000Z [container-agent] 2023-11-22 08:46:01 ERROR juju-log ingress-auth:18: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_auth_relation_changed[71]> with error: public-url config required
2023-11-22T08:46:01.464Z [container-agent] 2023-11-22 08:46:01 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-auth-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:46:02.744Z [container-agent] 2023-11-22 08:46:02 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[76]> with error: public-url config required
2023-11-22T08:46:03.549Z [container-agent] 2023-11-22 08:46:03 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:46:05.520Z [container-agent] 2023-11-22 08:46:05 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-joined" hook (via hook dispatching script: dispatch)
2023-11-22T08:46:06.625Z [container-agent] 2023-11-22 08:46:06 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[86]> with error: public-url config required
2023-11-22T08:46:06.972Z [container-agent] 2023-11-22 08:46:06 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:46:10.734Z [container-agent] 2023-11-22 08:46:10 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[91]> with error: public-url config required
2023-11-22T08:46:11.381Z [container-agent] 2023-11-22 08:46:11 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:47:35.680Z [container-agent] 2023-11-22 08:47:35 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[96]> with error: public-url config required
2023-11-22T08:47:36.070Z [container-agent] 2023-11-22 08:47:36 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:48:26.880Z [container-agent] 2023-11-22 08:48:26 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[101]> with error: public-url config required
2023-11-22T08:48:27.207Z [container-agent] 2023-11-22 08:48:27 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:48:46.559Z [container-agent] 2023-11-22 08:48:46 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[106]> with error: public-url config required
2023-11-22T08:48:47.110Z [container-agent] 2023-11-22 08:48:47 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:49:56.044Z [container-agent] 2023-11-22 08:49:56 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[111]> with error: public-url config required
2023-11-22T08:49:56.368Z [container-agent] 2023-11-22 08:49:56 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:49:58.420Z [container-agent] 2023-11-22 08:49:58 INFO juju.worker.uniter.operation runhook.go:186 ran "oidc-authservice-pebble-ready" hook (via hook dispatching script: dispatch)
2023-11-22T08:50:23.213Z [container-agent] 2023-11-22 08:50:23 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-22T08:50:23.282Z [container-agent] 2023-11-22 08:50:23 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"
2023-11-22T08:50:23.315Z [container-agent] 2023-11-22 08:50:23 INFO juju-log Kubernetes service 'oidc-gatekeeper' patched successfully
2023-11-22T08:50:23.591Z [container-agent] 2023-11-22 08:50:23 INFO juju.worker.uniter.operation runhook.go:186 ran "update-status" hook (via hook dispatching script: dispatch)
2023-11-22T08:50:36.267Z [container-agent] 2023-11-22 08:50:36 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[126]> with error: public-url config required
2023-11-22T08:50:36.671Z [container-agent] 2023-11-22 08:50:36 INFO juju.worker.uniter.operation runhook.go:186 ran "ingress-relation-changed" hook (via hook dispatching script: dispatch)
2023-11-22T08:55:33.488Z [container-agent] 2023-11-22 08:55:33 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/kubeflow/services/oidc-gatekeeper "HTTP/1.1 200 OK"

The difference is that in case of failed charm I noticed the error messages

2023-11-21T09:59:04.617Z [container-agent] 2023-11-21 09:59:04 ERROR juju-log ingress:16: Failed to handle <RelationChangedEvent via OIDCGatekeeperOperator/on/ingress_relation_changed[91]> with error: Waiting for pod startup to complete

This is weird as the pod is active without problems

ubuntu@ip-172-31-95-106:~$ microk8s kubectl get po -n kubeflow | grep oidc
oidc-gatekeeper-0 
misohu commented 7 months ago

Next I compared the workload logs in case charm fails is just

ubuntu@ip-172-31-93-171:~$ microk8s kubectl -n kubeflow logs -f oidc-gatekeeper-0 -c oidc-authservice
2023-11-21T09:57:36.314Z [pebble] HTTP API server listening on ":38812".
2023-11-21T09:57:36.314Z [pebble] Started daemon.

Compared to working charm

ubuntu@ip-172-31-93-171:~$ microk8s kubectl -n kubeflow logs -f oidc-gatekeeper-0 -c oidc-authservice
2023-11-22T08:49:57.414Z [pebble] HTTP API server listening on ":38813".
2023-11-22T08:49:57.414Z [pebble] Started daemon.
2023-11-22T08:58:36.115Z [pebble] GET /v1/plan?format=yaml 291.129µs 200
2023-11-22T08:58:36.117Z [pebble] POST /v1/layers 435.449µs 200
2023-11-22T08:58:36.134Z [pebble] POST /v1/services 10.400799ms 202
2023-11-22T08:58:36.148Z [pebble] Service "oidc-authservice" starting: /home/authservice/oidc-authservice
2023-11-22T08:58:36.181Z [oidc-authservice] time="2023-11-22T08:58:36Z" level=info msg="Config: &{ProviderURL:http://10.64.140.43.nip.io/dex ClientID:authservice-oidc ClientSecret:TEFXEUI1G44HCML20EQN362B297G0G OIDCAuthURL: RedirectURL:/authservice/oidc/callback OIDCScopes:[openid profile email groups] StrictSessionValidation:false OIDCStateStorePath:oidc_state.db AuthserviceURLPrefix:/authservice/ SkipAuthURLs:[/authservice/ /dex/] AuthHeader:Authorization Audiences:[istio-ingressgateway.istio-system.svc.cluster.local] HomepageURL:/authservice/site/homepage AfterLoginURL: AfterLogoutURL:http://10.64.140.43.nip.io UserIDHeader:kubeflow-userid GroupsHeader:kubeflow-groups UserIDPrefix: UserIDTransformer:{rules:[]} UserIDClaim:email UserIDTokenHeader: GroupsClaim:groups IDTokenHeader:Authorization Hostname: Port:8080 WebServerPort:8082 ReadinessProbePort:8081 CABundlePath: SessionStorePath:bolt.db SessionMaxAge:86400 SessionSameSite:Lax ClientName:AuthService ThemesURL:themes Theme:kubeflow TemplatePath:[web/templates/default] UserTemplateContext:map[] GroupsAllowlist:[*]}"
2023-11-22T08:58:36.181Z [oidc-authservice] time="2023-11-22T08:58:36Z" level=info msg="Starting readiness probe at 8081"
2023-11-22T08:58:36.181Z [oidc-authservice] time="2023-11-22T08:58:36Z" level=info msg="Starting server at :8080"
2023-11-22T08:58:36.181Z [oidc-authservice] time="2023-11-22T08:58:36Z" level=info msg="Starting web server at :8082"
2023-11-22T08:58:36.294Z [oidc-authservice] time="2023-11-22T08:58:36Z" level=error msg="OIDC provider setup failed, retrying in 10 seconds: 404 Not Found: "
2023-11-22T08:58:37.159Z [pebble] GET /v1/changes/1/wait?timeout=4.000s 1.023678783s 200
2023-11-22T08:58:38.194Z [pebble] GET /v1/plan?format=yaml 362.277µs 200
2023-11-22T08:58:38.196Z [pebble] POST /v1/layers 340.641µs 200
2023-11-22T08:58:38.211Z [pebble] POST /v1/services 7.511069ms 202
2023-11-22T08:58:38.235Z [pebble] GET /v1/changes/2/wait?timeout=4.000s 23.098367ms 200
misohu commented 7 months ago

The Dashboard cannot be accessed.

image

403 - looks like the request cannot be authenticated (should be done by oidc)

No special logs in ingress gateway

ubuntu@ip-172-31-94-31:~$ microk8s kubectl -n kubeflow logs -f  istio-ingressgateway-workload-5b7b684675-8b2ql

2023-11-22T09:44:16.373699Z info    JWT policy is third-party-jwt
2023-11-22T09:44:16.373704Z info    using credential fetcher of JWT type in cluster.local trust domain
2023-11-22T09:44:16.376117Z info    Workload SDS socket not found. Starting Istio SDS Server
2023-11-22T09:44:16.376138Z info    CA Endpoint istiod.kubeflow.svc:15012, provider Citadel
2023-11-22T09:44:16.376121Z info    Opening status port 15020
2023-11-22T09:44:16.376162Z info    Using CA istiod.kubeflow.svc:15012 cert with certs: var/run/secrets/istio/root-cert.pem
2023-11-22T09:44:16.396293Z info    ads All caches have been synced up in 32.461028ms, marking server ready
2023-11-22T09:44:16.416311Z info    xdsproxy    Initializing with upstream address "istiod.kubeflow.svc:15012" and cluster "Kubernetes"
2023-11-22T09:44:16.416333Z info    sds Starting SDS grpc server
2023-11-22T09:44:16.416962Z info    starting Http service at 127.0.0.1:15004
2023-11-22T09:44:16.418529Z info    Pilot SAN: [istiod.kubeflow.svc]
2023-11-22T09:44:16.420190Z info    Starting proxy agent
2023-11-22T09:44:16.420225Z info    starting
2023-11-22T09:44:16.420245Z info    Envoy command: [-c etc/istio/proxy/envoy-rev.json --drain-time-s 45 --drain-strategy immediate --local-address-ip-version v4 --file-flush-interval-msec 1000 --disable-hot-restart --allow-unknown-static-fields --log-format %Y-%m-%dT%T.%fZ   %l  envoy %n %g:%#  %v  thread=%t -l warning --component-log-level misc:error]
2023-11-22T09:44:16.577037Z info    xdsproxy    connected to upstream XDS server: istiod.kubeflow.svc:15012
2023-11-22T09:44:16.614815Z info    cache   generated new workload certificate  latency=217.516583ms ttl=23h59m59.385197559s
2023-11-22T09:44:16.614880Z info    cache   Root cert has changed, start rotating root cert
2023-11-22T09:44:16.614898Z info    ads XDS: Incremental Pushing:0 ConnectedEndpoints:0 Version:
2023-11-22T09:44:16.614957Z info    cache   returned workload trust anchor from cache   ttl=23h59m59.385048876s
2023-11-22T09:44:16.640671Z info    ads ADS: new connection for node:istio-ingressgateway-workload-5b7b684675-8b2ql.kubeflow-1
2023-11-22T09:44:16.640835Z info    cache   returned workload certificate from cache    ttl=23h59m59.359172136s
2023-11-22T09:44:16.640873Z info    ads ADS: new connection for node:istio-ingressgateway-workload-5b7b684675-8b2ql.kubeflow-2
2023-11-22T09:44:16.640955Z info    cache   returned workload trust anchor from cache   ttl=23h59m59.359051883s
2023-11-22T09:44:16.641285Z info    ads SDS: PUSH request for node:istio-ingressgateway-workload-5b7b684675-8b2ql.kubeflow resources:1 size:4.0kB resource:default
2023-11-22T09:44:16.642130Z info    ads SDS: PUSH request for node:istio-ingressgateway-workload-5b7b684675-8b2ql.kubeflow resources:1 size:1.1kB resource:ROOTCA
2023-11-22T09:44:17.263854Z info    Readiness succeeded in 908.840996ms
2023-11-22T09:44:17.264231Z info    Envoy proxy is ready

Looks like the request is not hgitting oidc (oidc no change in logs):

ubuntu@ip-172-31-93-171:~$ microk8s kubectl -n kubeflow logs -f oidc-gatekeeper-0 -c oidc-authservice
2023-11-21T09:57:36.314Z [pebble] HTTP API server listening on ":38812".
2023-11-21T09:57:36.314Z [pebble] Started daemon.
misohu commented 7 months ago

I checked virtual services (no problems)

ubuntu@ip-172-31-93-171:~$ microk8s kubectl get virtualservice --all-namespaces
NAMESPACE   NAME                   GATEWAYS                        HOSTS   AGE
kubeflow    dex-auth               ["kubeflow/kubeflow-gateway"]   ["*"]   51m
kubeflow    jupyter-ui             ["kubeflow/kubeflow-gateway"]   ["*"]   51m
kubeflow    katib-ui               ["kubeflow/kubeflow-gateway"]   ["*"]   51m
kubeflow    kfp-ui                 ["kubeflow/kubeflow-gateway"]   ["*"]   51m
kubeflow    kubeflow-dashboard     ["kubeflow/kubeflow-gateway"]   ["*"]   51m
kubeflow    kubeflow-volumes       ["kubeflow/kubeflow-gateway"]   ["*"]   51m
kubeflow    oidc-gatekeeper        ["kubeflow/kubeflow-gateway"]   ["*"]   51m
kubeflow    envoy                  ["kubeflow/kubeflow-gateway"]   ["*"]   51m
kubeflow    tensorboards-web-app   ["kubeflow/kubeflow-gateway"]   ["*"]   51m

Envoy filters (no problem the istio-pilot-authn-filter with expected content)

NAMESPACE   NAME                       AGE
kubeflow    stats-filter-1.13          68m
kubeflow    stats-filter-1.14          68m
kubeflow    stats-filter-1.15          68m
kubeflow    stats-filter-1.16          68m
kubeflow    stats-filter-1.17          68m
kubeflow    tcp-stats-filter-1.13      68m
kubeflow    tcp-stats-filter-1.14      68m
kubeflow    tcp-stats-filter-1.15      68m
kubeflow    tcp-stats-filter-1.16      68m
kubeflow    tcp-stats-filter-1.17      68m
kubeflow    istio-pilot-authn-filter   52m
misohu commented 7 months ago

The problem can be resolved by unsetting and setting back the public-url

ubuntu@ip-172-31-94-31:~$ juju config oidc-gatekeeper public-url=""
ubuntu@ip-172-31-94-31:~$ juju config oidc-gatekeeper public-url=http://10.64.140.43.nip.io
misohu commented 7 months ago

The problem is that the pebble ready event is never handled. Based of the above logs from charm pod we can see that pebble ready is the last hook to be handled. Although all the previous hooks are handled by the same main function only the pebble ready event has the pebble actually ready so only that event can update the layer. Because of this the charm gets occasionally stuck in Waiting state. Fixed in https://github.com/canonical/oidc-gatekeeper-operator/pull/118