Closed qiliRedHat closed 3 years ago
@rsevilla87 Please take a look at the proposal and let me know your thoughts. If you agree, I can open a PR.
I will let @rsevilla87 make the final call on the proposed solution, but one way I tackled this issue elsewhere was to explicitly scale the deployment down to 0 replicas, then update the deployment, and scale it back up.
So before running the line,
oc set probe -n openshift-ingress --liveness --period-seconds=$((RUNTIME * 2)) deploy/router-default
you would run
oc scale -n openshift-ingress deploy/router-default --replicas 0
and after it, run
oc scale -n openshift-ingress deploy/router-default --replicas 1
So together it would look like:
oc scale -n openshift-ingress deploy/router-default --replicas 0
oc set probe -n openshift-ingress --liveness --period-seconds=$((RUNTIME * 2)) deploy/router-default
oc scale -n openshift-ingress deploy/router-default --replicas 1
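For reference, a minimal sketch of the same sequence wrapped in a function. The function name `retune_router_probe` is hypothetical, and the final `oc rollout status` wait is an assumption added here, not something taken from common.sh. It also assumes RUNTIME is already set by the benchmark.

```shell
# Hypothetical helper, not part of common.sh.
# Assumes RUNTIME (in seconds) is set by the caller.
retune_router_probe() {
  local ns="openshift-ingress"
  local deploy="deploy/router-default"
  local replicas="${1:-1}"   # number of routers to restore, defaults to 1

  # Scale to 0 first so no old pod is running when the pod template changes
  oc scale -n "${ns}" "${deploy}" --replicas 0 || return 1
  # Update the liveness probe period, as the benchmark does
  oc set probe -n "${ns}" --liveness --period-seconds=$((RUNTIME * 2)) "${deploy}" || return 1
  # Scale back up; the new pod is created directly from the updated template
  oc scale -n "${ns}" "${deploy}" --replicas "${replicas}" || return 1
  # Assumption: block until the router is available again before benchmarking
  oc rollout status -n "${ns}" "${deploy}"
}
```

It would be called as `retune_router_probe 1` on a single-router cluster. Note that the router is briefly unavailable between the two scale commands, which should be acceptable here because the benchmark has not started yet.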
Thanks for reporting this, @qiliRedHat. This is a corner case I didn't consider when I coded this benchmark. I like @kedark3's suggestion: simple and effective.
Problem: The target cluster has a single worker node and only one router. With NUMBER_OF_ROUTERS=1 and NODE_SELECTOR={node-role.kubernetes.io/worker: }, router-perf-v2 fails. Error logs:
Analysis of the cause:
After running this line of code https://github.com/cloud-bulldozer/e2e-benchmarking/blob/9da00a2f270cffe5e3314360391656ef6d2f46cb/workloads/router-perf-v2/common.sh#L61
a new replica set, router-default-d9888dff8, is created to roll the change out to a new pod. However, because of the anti-affinity rule, the new pod cannot be scheduled: the only worker node is still running the old router pod.
Then, when the following code is run https://github.com/cloud-bulldozer/e2e-benchmarking/blob/9da00a2f270cffe5e3314360391656ef6d2f46cb/workloads/router-perf-v2/common.sh#L63-L64
an error occurs: the scale-up tries to act on the replica set router-default-d9888dff8, which is not READY.
Proposal: To make router-perf-v2 work on a single-worker-node cluster, one option is to add logic so that when NUMBER_OF_ROUTERS is set to -1, the tune_liveness_probe and enable_ingress_operator functions are skipped.
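The proposal could be sketched roughly as follows. The wrapper name `run_unless_routers_disabled` is hypothetical; `tune_liveness_probe` and `enable_ingress_operator` are the existing functions named above and are assumed to be defined elsewhere in the workload scripts.

```shell
# Hypothetical wrapper, not part of common.sh.
# Runs the given step unless NUMBER_OF_ROUTERS is the proposed sentinel -1.
run_unless_routers_disabled() {
  local step="$1"
  if [ "${NUMBER_OF_ROUTERS:-}" = "-1" ]; then
    # -1 means: leave the router deployment untouched
    echo "NUMBER_OF_ROUTERS=-1, skipping ${step}"
    return 0
  fi
  "${step}"
}
```

The call sites would then become `run_unless_routers_disabled tune_liveness_probe` and `run_unless_routers_disabled enable_ingress_operator`, so any other value of NUMBER_OF_ROUTERS keeps the current behavior.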