Closed gustavosr98 closed 1 year ago
A workaround seems to be to remove the application, redeploy it, and re-add the relations:
juju remove-application resource-dispatcher
juju deploy resource-dispatcher --channel edge --trust
juju relate mlflow-server:secrets resource-dispatcher:secrets
juju relate mlflow-server:pod-defaults resource-dispatcher:pod-defaults
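The four commands above can be wrapped in a small script so the workaround is easy to repeat. This is only a sketch: `run`, `DRY_RUN` and `redeploy_resource_dispatcher` are names made up for this example, and by default the script only prints the commands instead of executing them. As far as I know, `juju remove-application` returns before the application is fully gone, so when running it for real you may need to wait until it disappears from `juju status` before the deploy step succeeds.

```shell
#!/bin/sh
# Sketch of the workaround: remove, redeploy, re-add relations.
# DRY_RUN and run() are illustrative helpers, not part of Juju.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Only show what would be executed.
        echo "+ $*"
    else
        "$@"
    fi
}

redeploy_resource_dispatcher() {
    # Stop at the first command that fails.
    run juju remove-application resource-dispatcher &&
    run juju deploy resource-dispatcher --channel edge --trust &&
    run juju relate mlflow-server:secrets resource-dispatcher:secrets &&
    run juju relate mlflow-server:pod-defaults resource-dispatcher:pod-defaults
}

redeploy_resource_dispatcher
```

Run it with `DRY_RUN=0` to actually execute the commands.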
@gustavosr98 I could not reproduce your issue, but I got similar results.
After deploying resource-dispatcher:
juju deploy resource-dispatcher --channel edge --trust
I waited until it went into the state you describe (Container is not ready). Only after that did I add the required mlflow-server:* relations, and resource-dispatcher came up successfully.
This is not expected behaviour.
Could you please confirm the same behaviour? I.e. successful deployment of resource-dispatcher if relations are added after it goes into the Container is not ready state.
It is very tricky to test, because once resource-dispatcher has been deployed successfully, it always comes up afterwards, regardless of when relations are added.
I just re-ran the tests exactly as originally outlined by @gustavosr98 and could not reproduce the issue.
Logs for resource-dispatcher in the Container is not ready state:
$ microk8s.kubectl -n test1 logs resource-dispatcher-0
Defaulted container "charm" out of: charm, resource-dispatcher, charm-init (init)
2023-08-16T21:37:09.708Z [pebble] HTTP API server listening on ":38812".
2023-08-16T21:37:09.708Z [pebble] Started daemon.
2023-08-16T21:37:09.714Z [pebble] POST /v1/services 5.706092ms 202
2023-08-16T21:37:09.715Z [pebble] Started default services with change 1.
2023-08-16T21:37:09.721Z [pebble] Service "container-agent" starting: /charm/bin/containeragent unit --data-dir /var/lib/juju --append-env "PATH=$PATH:/charm/bin" --show-log --charm-modified-version 0
2023-08-16T21:37:09.761Z [container-agent] 2023-08-16 21:37:09 INFO juju.cmd supercommand.go:56 running containerAgent [2.9.44 02d498631e196f2a37f9b7c3b5c31bdcb1dad333 gc go1.20.5]
2023-08-16T21:37:09.761Z [container-agent] starting containeragent unit command
2023-08-16T21:37:09.761Z [container-agent] containeragent unit "unit-resource-dispatcher-0" start (2.9.44 [gc])
2023-08-16T21:37:09.761Z [container-agent] 2023-08-16 21:37:09 INFO juju.cmd.containeragent.unit runner.go:556 start "unit"
2023-08-16T21:37:09.761Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.upgradesteps worker.go:60 upgrade steps for 2.9.44 have already been run.
2023-08-16T21:37:09.762Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.probehttpserver server.go:157 starting http server on [::]:65301
2023-08-16T21:37:09.787Z [container-agent] 2023-08-16 21:37:09 INFO juju.api apiclient.go:1054 cannot resolve "controller-service.controller-uk8s.svc.cluster.local": lookup controller-service.controller-uk8s.svc.cluster.local: operation was canceled
2023-08-16T21:37:09.787Z [container-agent] 2023-08-16 21:37:09 INFO juju.api apiclient.go:687 connection established to "wss://10.152.183.156:17070/model/5027f823-e089-4fc5-87c4-0dd3e6361a92/api"
2023-08-16T21:37:09.792Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.apicaller connect.go:163 [5027f8] "unit-resource-dispatcher-0" successfully connected to "10.152.183.156:17070"
2023-08-16T21:37:09.815Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.migrationminion worker.go:142 migration phase is now: NONE
2023-08-16T21:37:09.816Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.logger logger.go:120 logger worker started
2023-08-16T21:37:09.821Z [container-agent] 2023-08-16 21:37:09 WARNING juju.worker.proxyupdater proxyupdater.go:282 unable to set snap core settings [proxy.http= proxy.https= proxy.store=]: exec: "snap": executable file not found in $PATH, output: ""
2023-08-16T21:37:09.847Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.leadership tracker.go:194 resource-dispatcher/0 promoted to leadership of resource-dispatcher
2023-08-16T21:37:09.850Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.caasupgrader upgrader.go:113 abort check blocked until version event received
2023-08-16T21:37:09.850Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.caasupgrader upgrader.go:119 unblocking abort check
2023-08-16T21:37:09.866Z [container-agent] 2023-08-16 21:37:09 INFO juju.agent.tools symlinks.go:20 ensure jujuc symlinks in /var/lib/juju/tools/unit-resource-dispatcher-0
2023-08-16T21:37:09.885Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.uniter uniter.go:326 unit "resource-dispatcher/0" started
2023-08-16T21:37:09.890Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.uniter uniter.go:642 resuming charm install
2023-08-16T21:37:09.893Z [container-agent] 2023-08-16 21:37:09 INFO juju.worker.uniter.charm bundles.go:81 downloading ch:amd64/focal/resource-dispatcher-78 from API server
2023-08-16T21:37:09.893Z [container-agent] 2023-08-16 21:37:09 INFO juju.downloader download.go:110 downloading from ch:amd64/focal/resource-dispatcher-78
2023-08-16T21:37:10.011Z [container-agent] 2023-08-16 21:37:10 INFO juju.downloader download.go:93 download complete ("ch:amd64/focal/resource-dispatcher-78")
2023-08-16T21:37:10.054Z [container-agent] 2023-08-16 21:37:10 INFO juju.downloader download.go:173 download verified ("ch:amd64/focal/resource-dispatcher-78")
2023-08-16T21:37:18.400Z [container-agent] 2023-08-16 21:37:18 INFO juju.worker.uniter uniter.go:352 hooks are retried true
2023-08-16T21:37:18.480Z [container-agent] 2023-08-16 21:37:18 INFO juju.worker.uniter resolver.go:159 found queued "install" hook
2023-08-16T21:37:19.712Z [pebble] Check "readiness" failure 1 (threshold 3): received non-20x status code 418
2023-08-16T21:37:19.753Z [container-agent] 2023-08-16 21:37:19 INFO juju-log Running legacy hooks/install.
2023-08-16T21:37:20.734Z [container-agent] 2023-08-16 21:37:20 INFO juju-log HTTP Request: GET https://10.152.183.1/apis/apiextensions.k8s.io/v1/customresourcedefinitions "HTTP/1.1 200 OK"
2023-08-16T21:37:20.792Z [container-agent] 2023-08-16 21:37:20 INFO juju-log Rendering manifests
2023-08-16T21:37:20.904Z [container-agent] 2023-08-16 21:37:20 INFO juju-log HTTP Request: PATCH https://10.152.183.1/apis/metacontroller.k8s.io/v1alpha1/decoratorcontrollers/kubeflow-resource-dispatcher-controller?fieldManager=lightkube "HTTP/1.1 201 Created"
2023-08-16T21:37:20.951Z [container-agent] 2023-08-16 21:37:20 INFO juju-log Reconcile completed successfully
2023-08-16T21:37:21.112Z [container-agent] 2023-08-16 21:37:21 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/test1/services/resource-dispatcher "HTTP/1.1 200 OK"
2023-08-16T21:37:21.219Z [container-agent] 2023-08-16 21:37:21 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/test1/services/resource-dispatcher "HTTP/1.1 200 OK"
2023-08-16T21:37:21.269Z [container-agent] 2023-08-16 21:37:21 INFO juju-log Kubernetes service 'resource-dispatcher' patched successfully
2023-08-16T21:37:21.589Z [container-agent] 2023-08-16 21:37:21 INFO juju.worker.uniter.operation runhook.go:159 ran "install" hook (via hook dispatching script: dispatch)
2023-08-16T21:37:21.658Z [container-agent] 2023-08-16 21:37:21 INFO juju.worker.uniter resolver.go:159 found queued "leader-elected" hook
2023-08-16T21:37:22.619Z [container-agent] 2023-08-16 21:37:22 INFO juju.worker.uniter.operation runhook.go:159 ran "leader-elected" hook (via hook dispatching script: dispatch)
2023-08-16T21:37:23.423Z [container-agent] 2023-08-16 21:37:23 INFO juju-log Event <ConfigChangedEvent via ResourceDispatcherOperator/on/config_changed[7]> stopped early with message: Container is not ready
2023-08-16T21:37:23.555Z [container-agent] 2023-08-16 21:37:23 INFO juju-log HTTP Request: GET https://10.152.183.1/api/v1/namespaces/test1/services/resource-dispatcher "HTTP/1.1 200 OK"
2023-08-16T21:37:23.663Z [container-agent] 2023-08-16 21:37:23 INFO juju-log HTTP Request: PATCH https://10.152.183.1/api/v1/namespaces/test1/services/resource-dispatcher "HTTP/1.1 200 OK"
2023-08-16T21:37:23.708Z [container-agent] 2023-08-16 21:37:23 INFO juju-log Kubernetes service 'resource-dispatcher' patched successfully
2023-08-16T21:37:24.009Z [container-agent] 2023-08-16 21:37:24 INFO juju.worker.uniter.operation runhook.go:159 ran "config-changed" hook (via hook dispatching script: dispatch)
2023-08-16T21:37:24.048Z [container-agent] 2023-08-16 21:37:24 INFO juju.worker.uniter resolver.go:159 found queued "start" hook
2023-08-16T21:37:24.691Z [container-agent] 2023-08-16 21:37:24 INFO juju-log Running legacy hooks/start.
2023-08-16T21:37:25.699Z [container-agent] 2023-08-16 21:37:25 INFO juju.worker.uniter.operation runhook.go:159 ran "start" hook (via hook dispatching script: dispatch)
2023-08-16T21:37:37.578Z [container-agent] 2023-08-16 21:37:37 INFO juju.worker.uniter.operation runhook.go:159 ran "resource-dispatcher-pebble-ready" hook (via hook dispatching script: dispatch)
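Most of the log above is routine agent startup. To pick out the lines that matter here (the pebble readiness failure, the early-stopped event, and the hook results), a filter along these lines may help. `charm_log_highlights` is a hypothetical helper name, and the patterns are taken only from the log shown above, so they may need adjusting for other charms.

```shell
# Keep only readiness-check failures, early-stopped events and hook results.
# Reads a charm container log on stdin.
charm_log_highlights() {
    grep -E 'Check "readiness" failure|stopped early with message|ran "[^"]+" hook'
}
```

For example: `microk8s.kubectl -n test1 logs resource-dispatcher-0 | charm_log_highlights`.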
It looks like the service in the resource-dispatcher container is not started when the charm is initially deployed, or it is started and then fails. Minimal steps to reproduce (on a clean microk8s cluster and Juju controller):
juju add-model test1
juju deploy metacontroller-operator --channel 2.0/stable --trust
Wait until metacontroller-operator is installed and in the idle/active state, then deploy resource-dispatcher:
juju deploy resource-dispatcher --channel edge --trust
The above should result in the following state:
Model  Controller  Cloud/Region        Version  SLA          Timestamp
test1  uk8s        microk8s/localhost  2.9.44   unsupported  21:38:12Z

App                      Version  Status   Scale  Charm                    Channel     Rev  Address         Exposed  Message
metacontroller-operator           active       1  metacontroller-operator  2.0/stable  117  10.152.183.84   no
resource-dispatcher               waiting      1  resource-dispatcher      edge         78  10.152.183.121  no       installing agent

Unit                        Workload  Agent  Address      Ports  Message
metacontroller-operator/0*  active    idle   10.1.45.202
resource-dispatcher/0*      waiting   idle   10.1.45.204         Container is not ready
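When repeating the reproduction several times, it can be handy to detect this state from a script rather than by eye. The following is a rough sketch that scans the tabular `juju status` output for units carrying a given message. `units_with_message` is a name invented for this example, and it assumes the default text layout shown above (a `Unit` header row, with sections separated by blank lines).

```shell
# Print the names of units whose status line contains the given message.
# Reads `juju status` tabular output on stdin.
units_with_message() {
    awk -v msg="$1" '
        $1 == "Unit" { in_units = 1; next }   # header row of the Unit section
        NF == 0      { in_units = 0 }          # a blank line ends a section
        in_units && index($0, msg) {
            name = $1
            sub(/\*$/, "", name)               # drop the leader marker
            print name
        }
    '
}
```

For example, `juju status | units_with_message 'Container is not ready'` would be expected to print `resource-dispatcher/0` in the state shown above.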
The command python3 main.py for the container based on resource-dispatcher-image is not running:
$ microk8s.kubectl -n test1 exec -c resource-dispatcher resource-dispatcher-0 -- ps -ax
PID TTY STAT TIME COMMAND
1 ? Ssl 0:00 /charm/bin/pebble run --create-dirs --hold --http :38813 --verbose
16 ? Rs 0:00 ps -ax
When properly started, the command should be running:
$ microk8s.kubectl -n test1 exec -c resource-dispatcher resource-dispatcher-0 -- ps -ax
PID TTY STAT TIME COMMAND
1 ? Ssl 0:00 /charm/bin/pebble run --create-dirs --hold --http :38813 --verbose
16 ? S 0:00 python3 main.py --port 80 --label user.kubeflow.org/enabled
17 ? Rs 0:00 ps -ax
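The comparison of the two process listings above can be reduced to a one-line check. This is a minimal sketch, assuming the workload is always started as `python3 main.py` as in the listing. `workload_running` is a hypothetical helper name.

```shell
# Succeeds (exit status 0) if the ps output on stdin lists the workload process.
workload_running() {
    grep -q 'python3 main\.py'
}
```

For example: `microk8s.kubectl -n test1 exec -c resource-dispatcher resource-dispatcher-0 -- ps -ax | workload_running && echo running || echo "not running"`. Because grep runs outside the container, it does not show up in the listing it inspects.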
Candidate for fix in 1.8
Versions
Microk8s: snap 1.24/stable (v1.24.16)
Juju: Controller 2.9.43 / CLI 2.9.44-ubuntu-amd64
resource-dispatcher: charm edge rev 78
Reproduce
Following https://discourse.charmhub.io/t/get-started-with-charmed-mlflow-v2-and-charmed-kubeflow/10782#heading--deploy-mlflow
Logs
Juju
K8s