Run a simulation with 5,000-10,000 data points to stress test the latest changes. The scale-down logic has gained features that gracefully terminate pods and stop the queuing system from receiving more jobs.
Steps:
1) Create an AWS k8s cluster of appropriate size (the example is too small for max-nodes).
2) Define max workers according to the number of VM nodes. Figure 3:1 (workers to nodes)? Try 300 workers for 100 nodes. Setting a max of 400 would also be fine, as the extra workers would just stay in the Pending state because there are not enough resources to schedule them.
3) Run the analysis and check that both the scale-out and the scale-down succeed.
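
The sizing arithmetic in step 2 can be sketched as follows. This is only an illustration: the function name `plan_workers` and its parameters are hypothetical, and it assumes the 3:1 worker-to-node ratio is a hard scheduling limit, whereas the real limit depends on pod resource requests and node size.

```python
def plan_workers(nodes: int, workers_per_node: int = 3, max_workers: int = 400) -> tuple[int, int]:
    """Return (schedulable, pending) worker counts for a cluster of `nodes` nodes.

    Assumption: each node fits exactly `workers_per_node` workers; any
    workers requested beyond that capacity stay in the Pending state.
    """
    capacity = nodes * workers_per_node        # workers the cluster can actually run
    schedulable = min(max_workers, capacity)   # workers that get scheduled
    pending = max_workers - schedulable        # workers left waiting for resources
    return schedulable, pending


if __name__ == "__main__":
    schedulable, pending = plan_workers(nodes=100)
    print(f"schedulable={schedulable} pending={pending}")
```

With 100 nodes and a max of 400, this gives 300 schedulable workers and 100 left pending, which matches the expectation in step 2.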