coreos / container-linux-update-operator

A Kubernetes operator to manage updates of Container Linux by CoreOS
Apache License 2.0

Nodes never receive ok-to-reboot #185

Closed Calpicow closed 6 years ago

Calpicow commented 6 years ago

Running v0.7.0 with Kubernetes v1.7.16. Containers are in kube-system instead of reboot-coordinator due to internal security policy. Logs are below and should be self-explanatory:

update-operator

I0912 22:49:20.767914       7 main.go:82] /bin/update-operator running
I0912 22:49:20.768335       7 leaderelection.go:174] attempting to acquire leader lease...
I0912 22:51:24.598115       7 leaderelection.go:184] successfully acquired lease kube-system/container-linux-update-operator-lock
I0912 22:52:01.662455       7 operator.go:537] Found 0 rebooted nodes
I0912 22:52:02.863635       7 operator.go:479] Found node "<NODE>" still rebooting, waiting
I0912 22:52:02.863671       7 operator.go:481] Found 1 (of max 1) rebooting nodes; waiting for completion
I0912 22:53:10.267939       7 operator.go:537] Found 0 rebooted nodes
I0912 22:53:11.267125       7 operator.go:504] Found 1 nodes that need a reboot
I0912 22:53:11.746733       7 operator.go:511] Waiting for before-reboot annotations on node "<NODE>": [container-linux-update.v1.coreos.com/before-reboot-ready]
I0912 22:54:17.967210       7 operator.go:537] Found 0 rebooted nodes
I0912 22:54:18.968796       7 operator.go:479] Found node "<NODE>" still rebooting, waiting
I0912 22:54:18.968832       7 operator.go:481] Found 1 (of max 1) rebooting nodes; waiting for completion
I0912 22:55:26.566616       7 operator.go:537] Found 0 rebooted nodes
I0912 22:55:27.464321       7 operator.go:479] Found node "<NODE>" still rebooting, waiting
I0912 22:55:27.464353       7 operator.go:481] Found 1 (of max 1) rebooting nodes; waiting for completion
I0912 22:56:34.866901       7 operator.go:537] Found 0 rebooted nodes
I0912 22:56:36.161546       7 operator.go:479] Found node "<NODE>" still rebooting, waiting
I0912 22:56:36.161579       7 operator.go:481] Found 1 (of max 1) rebooting nodes; waiting for completion
I0912 22:57:43.264264       7 operator.go:537] Found 0 rebooted nodes
I0912 22:57:44.467460       7 operator.go:479] Found node "<NODE>" still rebooting, waiting
I0912 22:57:44.467499       7 operator.go:481] Found 1 (of max 1) rebooting nodes; waiting for completion
I0912 22:58:53.262647       7 operator.go:537] Found 0 rebooted nodes
I0912 22:58:54.461834       7 operator.go:479] Found node "<NODE>" still rebooting, waiting
I0912 22:58:54.461873       7 operator.go:481] Found 1 (of max 1) rebooting nodes; waiting for completion

update-agent

I0912 22:45:12.166367       7 main.go:45] /bin/update-agent running
I0912 22:45:12.166719       7 agent.go:84] Setting info labels
I0912 22:45:28.484271       7 agent.go:89] Checking annotations
I0912 22:45:28.702324       7 agent.go:109] Setting annotations map[string]string{"container-linux-update.v1.coreos.com/reboot-in-progress":"false", "container-linux-update.v1.coreos.com/reboot-needed":"false"}
I0912 22:45:29.429884       7 agent.go:144] Waiting for ok-to-reboot from controller...
I0912 22:45:29.430020       7 agent.go:285] Beginning to watch update_engine status
I0912 22:45:29.430647       7 agent.go:237] Updating status
I0912 22:45:29.430666       7 agent.go:249] Indicating a reboot is needed

Node Labels

container-linux-update.v1.coreos.com/before-reboot=true
container-linux-update.v1.coreos.com/group=stable
container-linux-update.v1.coreos.com/id=coreos
container-linux-update.v1.coreos.com/reboot-needed=true
container-linux-update.v1.coreos.com/version=1800.7.0

Node Annotations

container-linux-update.v1.coreos.com/before-reboot-ready="true"
container-linux-update.v1.coreos.com/last-checked-time=1536745054
container-linux-update.v1.coreos.com/new-version=1855.4.0
container-linux-update.v1.coreos.com/reboot-in-progress=false
container-linux-update.v1.coreos.com/reboot-needed=true
container-linux-update.v1.coreos.com/status=UPDATE_STATUS_UPDATED_NEED_REBOOT

sdemos commented 6 years ago

Does closing the issue mean that it's solved? For what it's worth, from a quick scan through your logs, it seems like the operator is waiting for a before-reboot hook to run before it triggers a reboot on the machine. Do you have one of those configured in the operator deployment?
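
To make the "Waiting for before-reboot annotations" log line concrete, here is a minimal sketch of that kind of gate. It is not the operator's actual code: the function name `pending_annotations` and the exact-match comparison against the string `true` are assumptions for illustration only.

```python
# Hypothetical sketch of a before-reboot gate; not taken from the operator source.
PREFIX = "container-linux-update.v1.coreos.com/"
TRUE = "true"  # assumed: the gate only accepts this exact string


def pending_annotations(required, node_annotations):
    """Return the required annotations that are missing or not exactly "true"."""
    return [key for key in required if node_annotations.get(key) != TRUE]


required = [PREFIX + "before-reboot-ready"]

# Annotation not set yet: the node would stay in the "waiting" state.
print(pending_annotations(required, {}))

# Annotation set to the bare string true: nothing left to wait for.
print(pending_annotations(required, {PREFIX + "before-reboot-ready": "true"}))
```

Under these assumptions, a node is only allowed to move on to the reboot once the returned list is empty.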

Calpicow commented 6 years ago

It is solved. After typing the whole thing out, I just realized I made a silly mistake with the quotes around true.
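
A small sketch of why stray quotes can matter, assuming the stored value literally contained the quote characters (the thread does not spell out exactly where the quotes were) and assuming the consumer compares against the bare string `true`. The helper `has_stray_quotes` is hypothetical; the two annotation values mirror how they appear in the node annotations listed above.

```python
# Hypothetical lint helper; the comparison rule is an assumption, not confirmed by the thread.
def has_stray_quotes(value):
    """True if the value is wrapped in literal single or double quote characters."""
    return len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'"


annotations = {
    "container-linux-update.v1.coreos.com/before-reboot-ready": '"true"',
    "container-linux-update.v1.coreos.com/reboot-needed": "true",
}

# A value with embedded quotes is a six-character string, so it never equals "true".
for key, value in annotations.items():
    print(key, "matches true:", value == "true", "stray quotes:", has_stray_quotes(value))

suspect = [key for key, value in annotations.items() if has_stray_quotes(value)]
```

Here `suspect` ends up holding only the `before-reboot-ready` key, which is the annotation the operator log reports it is still waiting on.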

sdemos commented 6 years ago

Awesome! Feel free to open another issue if you run into any other problems.