Define an RBAC ClusterRole for update-operator and update-agent
Create a separate namespace "reboot-coordinator" for components
Previously, CLUO ran in the kube-system namespace, where on most clusters it had admin privileges and could do anything.
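For orientation, here is a rough sketch of what a dedicated namespace plus a narrowly scoped ClusterRole could look like. It is not the manifest from this PR. The role name is hypothetical, the rules are inferred from the actions visible in the test logs (labeling and annotating nodes, cordoning, listing and deleting pods, acquiring a leader lease), and the ConfigMap-based lock and the event permission are assumptions.

```yaml
# Sketch only: names and rules are assumptions, not the manifests from this PR.
apiVersion: v1
kind: Namespace
metadata:
  name: reboot-coordinator
---
apiVersion: rbac.authorization.k8s.io/v1  # older clusters may need v1beta1
kind: ClusterRole
metadata:
  name: reboot-coordinator  # hypothetical name
rules:
  # Nodes are read and updated: labels, annotations, and the
  # unschedulable flag all live on the Node object.
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch", "update"]
  # update-agent lists and deletes pods while draining a node.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "delete"]
  # update-operator leader election; assumes a ConfigMap-based lock.
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "create", "update"]
  # Assumed: leader election records events.
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create"]
```

Listing only the verbs each component has been seen to use keeps the role far smaller than the admin access CLUO effectively had in kube-system.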
Testing
I've used these examples to run some simulated cluster upgrade tests. I think I've included every permission we need, but it's hard to be certain.
update-operator
I1004 20:47:06.408558 1 main.go:77] /bin/update-operator running
I1004 20:47:06.504233 1 leaderelection.go:179] attempting to acquire leader lease...
I1004 20:48:54.377845 1 leaderelection.go:189] successfully acquired lease reboot-coordinator/container-linux-update-operator-lock
I1004 20:48:54.472968 1 operator.go:517] Found 0 rebooted nodes
I1004 20:49:24.518023 1 operator.go:517] Found 0 rebooted nodes
I1004 20:49:54.582655 1 operator.go:517] Found 0 rebooted nodes
I1004 20:50:24.639663 1 operator.go:517] Found 0 rebooted nodes
I1004 20:50:54.681210 1 operator.go:517] Found 0 rebooted nodes
I1004 20:51:24.723948 1 operator.go:517] Found 0 rebooted nodes
I1004 20:51:54.765549 1 operator.go:517] Found 0 rebooted nodes
I1004 20:52:24.805565 1 operator.go:517] Found 0 rebooted nodes
I1004 20:52:54.847582 1 operator.go:517] Found 0 rebooted nodes
I1004 20:53:24.889431 1 operator.go:517] Found 0 rebooted nodes
I1004 20:53:54.929985 1 operator.go:517] Found 0 rebooted nodes
I1004 20:54:24.969771 1 operator.go:517] Found 0 rebooted nodes
I1004 20:54:55.009982 1 operator.go:517] Found 0 rebooted nodes
I1004 20:55:25.060635 1 operator.go:517] Found 0 rebooted nodes
I1004 20:55:55.102140 1 operator.go:517] Found 0 rebooted nodes
I1004 20:56:25.148750 1 operator.go:517] Found 0 rebooted nodes
I1004 20:56:55.191354 1 operator.go:517] Found 0 rebooted nodes
I1004 20:57:25.257605 1 operator.go:517] Found 0 rebooted nodes
I1004 20:57:55.326448 1 operator.go:517] Found 0 rebooted nodes
I1004 20:58:25.393261 1 operator.go:517] Found 0 rebooted nodes
I1004 20:58:55.437934 1 operator.go:517] Found 0 rebooted nodes
I1004 20:59:25.597686 1 operator.go:517] Found 0 rebooted nodes
I1004 20:59:25.612135 1 operator.go:484] Found 1 nodes that need a reboot
I1004 20:59:55.670427 1 operator.go:517] Found 0 rebooted nodes
I1004 20:59:55.853637 1 operator.go:459] Found node "ip-10-0-47-77" still rebooting, waiting
I1004 20:59:55.853653 1 operator.go:461] Found 1 (of max 1) rebooting nodes; waiting for completion
...
I1004 21:02:15.497952 1 operator.go:517] Found 0 rebooted nodes
I1004 21:02:45.535740 1 operator.go:517] Found 0 rebooted nodes
I1004 21:03:15.575713 1 operator.go:517] Found 0 rebooted nodes
update-agent
I1004 20:56:47.848041 1 main.go:42] /bin/update-agent running
I1004 20:56:47.848152 1 agent.go:79] Setting info labels
I1004 20:56:47.875615 1 agent.go:90] Setting annotations map[string]string{"container-linux-update.v1.coreos.com/reboot-needed":"false", "container-linux-update.v1.coreos.com/reboot-in-progress":"false"}
I1004 20:56:47.953218 1 agent.go:101] Marking node as schedulable
I1004 20:56:47.992487 1 agent.go:111] Waiting for ok-to-reboot from controller...
I1004 20:56:47.992637 1 agent.go:220] Beginning to watch update_engine status
I1004 20:56:47.993483 1 agent.go:175] Updating status
I1004 20:59:06.247778 1 agent.go:175] Updating status
I1004 20:59:06.247802 1 agent.go:185] Indicating a reboot is needed
I1004 20:59:55.685095 1 agent.go:125] Setting annotations map[string]string{"container-linux-update.v1.coreos.com/reboot-in-progress":"true"}
I1004 20:59:55.725304 1 agent.go:137] Marking node as unschedulable
I1004 20:59:55.740044 1 agent.go:142] Getting pod list for deletion
I1004 20:59:55.766850 1 agent.go:151] Deleting 3 pods
I1004 20:59:55.766934 1 agent.go:154] Terminating pod "container-linux-update-operator-3421875347-454lm"...
I1004 20:59:55.784403 1 agent.go:154] Terminating pod "iperf"...
I1004 20:59:55.833088 1 agent.go:154] Terminating pod "container-linux-update-operator-3421875347-l7611"...
I1004 20:59:55.886383 1 agent.go:161] Node drained, rebooting
....
I1004 21:02:19.906335 1 main.go:42] /bin/update-agent running
I1004 21:02:19.906381 1 agent.go:79] Setting info labels
I1004 21:02:19.944694 1 agent.go:90] Setting annotations map[string]string{"container-linux-update.v1.coreos.com/reboot-in-progress":"false", "container-linux-update.v1.coreos.com/reboot-needed":"false"}
I1004 21:02:19.971217 1 agent.go:101] Marking node as schedulable
I1004 21:02:19.979572 1 agent.go:111] Waiting for ok-to-reboot from controller...
I1004 21:02:19.979764 1 agent.go:220] Beginning to watch update_engine status
I1004 21:02:19.980468 1 agent.go:175] Updating status
Later: We could define separate service accounts for update-operator and update-agent. To do this incrementally, let's start by defining a single namespace with the appropriate access for both.
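One way to give the whole namespace the same access, sketched under assumptions: bind a ClusterRole to the namespace's `default` service account, which pods use when they specify no service account of their own. The binding and role names below are hypothetical and are not taken from this PR.

```yaml
# Sketch only: binding and role names are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1  # older clusters may need v1beta1
kind: ClusterRoleBinding
metadata:
  name: reboot-coordinator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: reboot-coordinator  # assumed ClusterRole holding the shared rules
subjects:
  # Every pod in the namespace that does not set its own service
  # account runs as "default", so both components share this access.
  - kind: ServiceAccount
    name: default
    namespace: reboot-coordinator
```

Splitting this later into one service account per component would mean adding two ServiceAccount objects and one binding for each, with the rules divided between two roles.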
Closes: #128