NREL / openstudio-server-helm

Helm charts for Kubernetes deployment of OpenStudio Server.

Helm Stress Test Using Large Simulation 10k #36

Closed tijcolem closed 7 months ago

tijcolem commented 2 years ago

Run a simulation with 5,000-10,000 data points to stress test the latest changes. Several features have been added to the scale-down process that gracefully terminate pods and stop the queuing system from receiving more jobs.

Steps:

1) Create the AWS k8s cluster of appropriate size (the example is too small for max-nodes).
2) Define max workers according to the number of VM nodes. Figure 3:1? Try for 300 workers on 100 nodes. Setting a max of 400 would be fine, as the extra workers would just be stuck in the Pending state because there are not enough resources to schedule them.
3) Run the analysis and see whether the scale-out and scale-down are successful.
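The sizing arithmetic in step 2 can be sketched as below. This is only a rough illustration, assuming a fixed number of workers fits on each node (the 3:1 figure above); the function name `plan_workers` and its parameters are hypothetical and not part of the chart.

```python
def plan_workers(nodes: int, workers_per_node: int, max_workers: int) -> dict:
    """Estimate how many workers can be scheduled and how many stay pending.

    Assumes every node fits exactly `workers_per_node` workers (hypothetical
    simplification; real scheduling depends on pod resource requests).
    """
    capacity = nodes * workers_per_node           # workers the cluster can hold
    scheduled = min(max_workers, capacity)        # workers that actually get a node
    pending = max(0, max_workers - capacity)      # workers left in Pending state
    return {"capacity": capacity, "scheduled": scheduled, "pending": pending}


# 100 nodes at 3 workers each, with the worker max set to 400
plan = plan_workers(nodes=100, workers_per_node=3, max_workers=400)
print(plan)  # {'capacity': 300, 'scheduled': 300, 'pending': 100}
```

With these numbers, 300 workers are scheduled and the remaining 100 would sit in Pending, which matches the expectation that a max of 400 is harmless.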

brianlball commented 7 months ago

We have run more than 3.5M sims so far for 179d using the https://github.com/NREL/openstudio-server-helm/tree/179d branch. We will merge it into a large-scale branch soon.