What does this PR do?
Adds a new DWOC field for configuring the DWO webhook server: `config.webhook`.
Three configuration options are provided for specifying the webhook server replica count, webhook server pod tolerations, and webhook server nodeSelectors.
Sets the default webhook server replica count to 2.
Since the webhook server is used for all devworkspaces, its configuration options only take effect when they are specified in the global DWOC.
Additionally, since the devworkspace-controller-manager is responsible for creating the webhook deployment, the devworkspace-controller-manager pod must be terminated (and automatically re-created by the deployment) for changes to the webhook configuration to take effect.
Is it tested? How?
I recommend following the testing steps below in order, as later steps build on the state set up by earlier ones.
To set up for testing, you'll need a multi-node cluster. IIRC, requesting a GCP cluster from cluster bot should provide a multi-node cluster (e.g. `launch 4.16 gcp`). Minikube can be configured to have multiple nodes with `minikube start --nodes <node-count>`, e.g. `minikube start --nodes 4 && minikube addons enable ingress`.
I've pushed a build of DWO with the changes from this PR to `quay.io/aobuchow/devworkspace-controller:configurable-webhook` for ease of testing.
Once you have your multi-node cluster running with DWO installed, retrieve the list of nodes on the cluster with `kubectl get nodes`:

```
NAME           STATUS   ROLES           AGE     VERSION
minikube       Ready    control-plane   8m10s   v1.30.0
minikube-m02   Ready    <none>          7m49s   v1.30.0
minikube-m03   Ready    <none>          7m36s   v1.30.0
minikube-m04   Ready    <none>          7m23s   v1.30.0
```
Verifying nodeSelector
Verify which node the devworkspace-webhook-server is currently running on:
Do a `kubectl get pod -n $NAMESPACE` to find the webhook pod names.
Then a `kubectl get pod devworkspace-webhook-server... -n $NAMESPACE -o jsonpath='{.spec.nodeName}'` for each webhook pod.
In my case, the pods were scheduled onto nodes `minikube-m03` and `minikube-m04`.
Add a label to the node to which we want the webhook to be deployed: `kubectl patch node <node-name> --type='merge' --patch '{"metadata": {"labels": {"my-label": "my-value"}}}'`
Modify the webhook configuration in the global DWOC to add a nodeSelector corresponding to the node label we just added: `kubectl edit dwoc -n $NAMESPACE`
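As a rough sketch, the edited global DWOC could look like the following. The `metadata` values and the exact spelling of the new fields (e.g. `nodeSelector` vs. `nodeSelectors`) are assumptions on my part and should be checked against this PR's CRD changes:

```yaml
apiVersion: controller.devfile.io/v1alpha1
kind: DevWorkspaceOperatorConfig
metadata:
  name: devworkspace-operator-config # name assumed; use your global DWOC
  namespace: devworkspace-controller # use the namespace DWO is installed in
config:
  webhook:
    # Schedule the webhook server onto the node labeled in the previous step
    nodeSelector:
      my-label: my-value
```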
Terminate the devworkspace-controller-manager pod so that it modifies the webhook deployment based on the new webhook configuration in the DWOC: `kubectl delete pod devworkspace-controller-manager-... -n $NAMESPACE`
Wait for the old webhook pods to terminate and for the new pods to start up successfully.
Verify that the new webhook pods were scheduled on the correct node which had your label applied: `kubectl get pod devworkspace-webhook-server... -n $NAMESPACE -o jsonpath='{.spec.nodeName}'` for each webhook pod.
Verifying tolerations
Taint the node that you applied a label to in the previous step: `kubectl taint nodes <name-of-node-with-label> key1=value1:NoExecute`. All pods running on the tainted node will be evicted since we applied the `NoExecute` taint.
The webhook deployment will create pods scheduled onto other available/non-tainted nodes to fulfill the desired number of webhook replicas. However, since we have a nodeSelector targeting the tainted node, an additional webhook-server pod will remain in a pending state as it cannot be scheduled onto the tainted node.
Modify DWOC to add a toleration that will allow the webhook server to be scheduled on the tainted node, and kill the devworkspace-controller-manager pod to modify webhook deployment:
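For example, a toleration matching the `key1=value1:NoExecute` taint applied above might look like this in the DWOC (field names assumed from the options this PR describes, not confirmed against the CRD):

```yaml
config:
  webhook:
    # Keep the nodeSelector targeting the labeled (now tainted) node
    nodeSelector:
      my-label: my-value
    # Tolerate the NoExecute taint applied in the previous step
    tolerations:
      - key: key1
        operator: Equal
        value: value1
        effect: NoExecute
```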
You should see the webhook server pod that was previously in a pending state enter the running state. The two other webhook server replica pods will terminate, and one will be recreated so that it is scheduled on the node matching the desired nodeSelector. Afterwards, there will only be two webhook server pods remaining on the cluster, and they should be running on the desired node.
Verifying replicas
Modify the DWOC to increase the number of webhook server replicas:
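For instance, assuming the replica count lives under `config.webhook.replicas` (per the options described above; the field name is my assumption):

```yaml
config:
  webhook:
    # Scale the webhook server up from the default of 2
    replicas: 3
```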
Kill the devworkspace-controller-manager pod to have the devworkspace webhook server deployment updated.
Ensure the devworkspace webhook server deployment has the correct number of replicas: `kubectl get deployment devworkspace-webhook-server -n $NAMESPACE -o jsonpath='{.spec.replicas}'`
Optional: try setting the number of webhook server replicas to 0 or a negative number. The CR validation should fail and prevent you from making the edit.
Config logging
When the DWOC webhook's configuration contains nodeSelectors and tolerations, the output resembles the following:
```
Updated config to [routing.clusterHostSuffix=192.168.49.2.nip.io,webhook.nodeSelectors=[my-label=my-value, my-label2=my-value2],webhook.tolerations=[&Toleration{Key:key1,Operator:Equal,Value:value1,Effect:NoExecute,TolerationSeconds:nil,}, &Toleration{Key:key2,Operator:Equal,Value:value2,Effect:NoExecute,TolerationSeconds:nil,}],enableExperimentalFeatures=true]
```
The formatting for Tolerations is a bit awkward but using the Kubernetes implementation of String() seems sufficient, rather than re-implementing it.
PR Checklist
[ ] E2E tests pass (when PR is ready, comment `/test v8-devworkspace-operator-e2e, v8-che-happy-path` to trigger)
[ ] `v8-devworkspace-operator-e2e`: DevWorkspace e2e test
[ ] `v8-che-happy-path`: Happy path for verifying integration with Che
What issues does this PR fix or reference?
Fixes https://github.com/devfile/devworkspace-operator/issues/1272