MnrGreg / kubectl-node-restart

Krew plugin to restart Kubernetes Nodes sequentially and gracefully

Missing force drain for OpenShift nodes #9

Closed lukastopiarz closed 2 years ago

lukastopiarz commented 2 years ago

Hi, I think the kubectl drain --force parameter is missing, which matters on OpenShift clusters with "non-standard" pods, i.e. pods not managed by a controller. This is not to be confused with the node-restart --force parameter, which restarts a node without draining it.

Internally, the plugin should run kubectl drain <node> --force --ignore-daemonsets --delete-emptydir-data. The --ignore-daemonsets and --delete-emptydir-data flags are also helpful.
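To make the proposal concrete, here is a minimal sketch of how such a drain command could be assembled. The helper name build_drain_cmd and its dry-run switch are hypothetical and are not taken from the plugin's source; the helper only prints the command so the flags can be inspected without touching a cluster.

```shell
# Hypothetical helper (not from the plugin): prints the drain command for a
# node instead of running it.
#   --force                 allow deleting pods that no controller manages
#   --ignore-daemonsets     skip DaemonSet-managed pods
#   --delete-emptydir-data  replaces the deprecated --delete-local-data
build_drain_cmd() {
  node="$1"
  dry_run="${2:-false}"
  cmd="kubectl drain $node --force --ignore-daemonsets --delete-emptydir-data"
  # The dry-run path only appends to the shared flags, so both modes agree.
  if [ "$dry_run" = "true" ]; then
    cmd="$cmd --dry-run=client"
  fi
  printf '%s\n' "$cmd"
}

build_drain_cmd openshift-b6scc-worker-0
```

Building the flags once and appending --dry-run=client only when requested keeps a dry run and a regular run from drifting apart.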

kubectl node-restart -l node-role.kubernetes.io/worker=
Targeting selective nodes:
 openshift-b6scc-worker-0
 openshift-b6scc-worker-1
 openshift-b6scc-worker-2
 openshift-b6scc-worker-3

Draining node openshift-b6scc-worker-0...
Flag --delete-local-data has been deprecated, This option is deprecated and will be deleted. Use --delete-emptydir-data.
node/openshift-b6scc-worker-0 already cordoned
error: unable to drain node "openshift-b6scc-worker-0" due to error:cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): openshift-marketplace/certified-operators-kckxd, openshift-marketplace/community-operators-242l6, openshift-marketplace/ibm-operator-catalog-xwlf5, openshift-marketplace/redhat-marketplace-tlxwd, openshift-marketplace/redhat-operators-xqj2s, continuing command...
There are pending nodes to be drained:
 openshift-b6scc-worker-0
cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): openshift-marketplace/certified-operators-kckxd, openshift-marketplace/community-operators-242l6, openshift-marketplace/ibm-operator-catalog-xwlf5, openshift-marketplace/redhat-marketplace-tlxwd, openshift-marketplace/redhat-operators-xqj2s
Initiating node restart job on openshift-b6scc-worker-0...
job.batch/node-restart-wr9rh created
Waiting for restart job to complete on node openshift-b6scc-worker-0...
usage: sleep seconds
openshift-b6scc-worker-0 - 10 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 20 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 30 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 40 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 50 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 60 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 70 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 80 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 90 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 100 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 110 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 120 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 130 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 140 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 150 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 160 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 170 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 180 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 190 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 200 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 210 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 220 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 230 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 240 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 250 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 260 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 270 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 280 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 290 seconds
usage: sleep seconds
openshift-b6scc-worker-0 - 300 seconds
Error: Restart job did not complete within 300 seconds
MnrGreg commented 2 years ago

Added --force as the default; fixed via b9cdfb0ef5247805c022ef80804bb43c1bfe7331.

lukastopiarz commented 2 years ago

@MnrGreg The --force parameter was added only to the dry run; it is still missing for a regular run.

MnrGreg commented 2 years ago

Thanks @lukastopiarz, fixed in v1.0.6.

lukastopiarz commented 2 years ago

@MnrGreg Unfortunately, there is still a problem: with v1.0.6 the drain fails with the same error.

➜  ~ kubectl node-restart  -l node-role.kubernetes.io/worker
Targeting selective nodes:
 openshift-b6scc-worker-0
 openshift-b6scc-worker-1
 openshift-b6scc-worker-2
 openshift-b6scc-worker-3

Draining node openshift-b6scc-worker-0...
node/openshift-b6scc-worker-0 cordoned
error: unable to drain node "openshift-b6scc-worker-0" due to error:cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): openshift-marketplace/opencloud-operators-6xkvg, openshift-marketplace/redhat-marketplace-zn9mc, continuing command...
There are pending nodes to be drained:
 openshift-b6scc-worker-0
cannot delete Pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet (use --force to override): openshift-marketplace/opencloud-operators-6xkvg, openshift-marketplace/redhat-marketplace-zn9mc
Initiating node restart job on openshift-b6scc-worker-0...
job.batch/node-restart-s7s6n created
Waiting for restart job to complete on node openshift-b6scc-worker-0...
openshift-b6scc-worker-0 - 10 seconds
openshift-b6scc-worker-0 - 20 seconds
openshift-b6scc-worker-0 - 30 seconds
openshift-b6scc-worker-0 - 40 seconds
openshift-b6scc-worker-0 - 50 seconds
openshift-b6scc-worker-0 - 60 seconds
openshift-b6scc-worker-0 - 70 seconds
openshift-b6scc-worker-0 - 80 seconds
openshift-b6scc-worker-0 - 90 seconds
openshift-b6scc-worker-0 - 100 seconds
openshift-b6scc-worker-0 - 110 seconds
openshift-b6scc-worker-0 - 120 seconds
openshift-b6scc-worker-0 - 130 seconds
openshift-b6scc-worker-0 - 140 seconds
openshift-b6scc-worker-0 - 150 seconds
openshift-b6scc-worker-0 - 160 seconds
openshift-b6scc-worker-0 - 170 seconds
openshift-b6scc-worker-0 - 180 seconds
openshift-b6scc-worker-0 - 190 seconds
openshift-b6scc-worker-0 - 200 seconds
openshift-b6scc-worker-0 - 210 seconds
openshift-b6scc-worker-0 - 220 seconds
openshift-b6scc-worker-0 - 230 seconds
openshift-b6scc-worker-0 - 240 seconds
openshift-b6scc-worker-0 - 250 seconds
openshift-b6scc-worker-0 - 260 seconds
openshift-b6scc-worker-0 - 270 seconds
openshift-b6scc-worker-0 - 280 seconds
openshift-b6scc-worker-0 - 290 seconds
openshift-b6scc-worker-0 - 300 seconds
Error: Restart job did not complete within 300 seconds
➜  ~ kubectl krew info  node-restart
NAME: node-restart
INDEX: default
URI: https://github.com/MnrGreg/kubectl-node-restart/releases/download/v1.0.6/v1.0.6.zip
SHA256: 50b5388448c61fbd41ce6069bb1242af12fcfb13da2cc1ef1d11b8406ff48d9f
VERSION: v1.0.6
HOMEPAGE: https://github.com/mnrgreg/kubectl-node-restart
DESCRIPTION:
This plugin performs a sequential, rolling restart of selected nodes by first
draining each node, then running a Kubernetes Job to reboot each node, and
finally uncordoning each node when Ready.
CAVEATS:
\
 | Execution of this plugin requires Kubernetes cluster-admin Rolebindings
 | and the ability to schedule Privileged Pods.
/
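
For reference, the sequence in the DESCRIPTION above (drain, reboot, wait for Ready, uncordon) could be sketched roughly as follows. This is not the plugin's code: restart_node, rolling_restart, the RUN hook and the reboot_job placeholder are all hypothetical names. Unlike the runs logged above, which continue after a drain error, this sketch stops at the first failed step.

```shell
# Hypothetical sketch, not the plugin's implementation.
# RUN defaults to "echo", so commands are printed rather than executed.
RUN="${RUN-echo}"

restart_node() {
  node="$1"
  # Stop here if the drain fails, instead of rebooting an undrained node.
  $RUN kubectl drain "$node" --force --ignore-daemonsets --delete-emptydir-data || return 1
  # Placeholder for the plugin-specific reboot Job; reboot_job is not a real command.
  $RUN reboot_job "$node" || return 1
  # Wait up to 300 seconds for the node to report Ready again.
  $RUN kubectl wait --for=condition=Ready "node/$node" --timeout=300s || return 1
  $RUN kubectl uncordon "$node"
}

rolling_restart() {
  # Process nodes one at a time; abort the rollout on the first failure.
  for n in "$@"; do
    restart_node "$n" || { echo "stopping: $n failed" >&2; return 1; }
  done
}

rolling_restart openshift-b6scc-worker-0 openshift-b6scc-worker-1
```

With the default RUN=echo this only prints the commands it would run. Executing them for real would additionally require replacing the reboot_job placeholder with an actual reboot mechanism.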