Closed by LindseySaari 1 year ago
We have currently ported over what we see in BRD, but it needs fine-tuning for EKS.
These are the current BRD CPU usage percentage rates for Prod, Staging and Sandbox. As we scale up the weighted deployments, we will want to compare them against EKS and against the defined autoscaling defaults.
According to the AWS Docs "A good, general rule for EC2 instances is that if your maximum CPU and memory usage is less than 40% over a four-week period, you can safely cut the machine in half."
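A minimal sketch of how that rule of thumb could be checked against exported usage samples. The function name and the idea of passing in plain lists of percentage samples are assumptions for illustration; the 40% threshold and the four-week window come from the quote above.

```python
from typing import Sequence


def can_halve_instance(
    cpu_pct_samples: Sequence[float],
    mem_pct_samples: Sequence[float],
    threshold_pct: float = 40.0,
) -> bool:
    """Return True if both peak CPU and peak memory stay under the threshold.

    Assumes the samples already cover the full observation window
    (four weeks in the quoted rule); this helper does not check dates.
    """
    if not cpu_pct_samples or not mem_pct_samples:
        # No data means no evidence that downsizing is safe.
        return False
    return max(cpu_pct_samples) < threshold_pct and max(mem_pct_samples) < threshold_pct


if __name__ == "__main__":
    # Hypothetical numbers, not real BRD or EKS measurements.
    print(can_halve_instance([12.0, 25.5, 31.0], [20.0, 38.9]))
    print(can_halve_instance([12.0, 55.0, 31.0], [20.0, 38.9]))
```

Note that the check uses the maximum over the window, not the average, so a single spike above 40% is enough to fail it.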
@RachalCassity to sync with @rmtolmach on HPA autoscaling during 1:1
Could Goldilocks be used for determining the limits? Here is a really old ticket where I was investigating deployment requests and limits in EKS for vets-api: https://github.com/department-of-veterans-affairs/va.gov-team/issues/39691
Edit: never mind on ☝️ that, Goldilocks is used for VERTICAL scaling, not horizontal.
With 100% of traffic going to dev, the numbers there should be pretty accurate. We need to start working on load testing.
On hold until we turn up the dial on staging + higher environments.
Reminder: Attach PRs to scale up pods for future traffic.
Rachal is going to make a PR for staging.
Rachal created the PRs for SB and production (as draft currently). Removed the DB migrate job from SB and production (this will remain a manual process in Jenkins).
PRs still in draft mode and ready to go when we are.
Now that we have increased number of pods in staging, the pods are not all cycling through. @RachalCassity to look at the config and pull in Kyle if needed. Percentages seem to be the way to go.
@RachalCassity Do you want to update this ticket with the changes made + discussed in this thread please?
Current BRD request rates: https://vagov.ddog-gov.com/dashboard/b8k-uy2-fkm?from_ts=1677259846726&to_ts=1677263446726&live=true
Description
The current vets-api EKS autoscaling (HPA) is based on what was defined for BRD (threads + worker count). We should keep an eye on metrics to see whether we are under- or over-utilizing pods, e.g. relative to the min/max thresholds. See Datadog.
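For reference while watching the metrics, here is a rough sketch of how the Kubernetes HPA picks a replica count from a utilization metric: it scales the current count by the ratio of observed to target utilization, rounds up, skips changes inside a tolerance band (0.1 by default), and clamps to the min/max. The function name and the example numbers are assumptions for illustration, not the vets-api settings.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_cpu_pct: float,
    target_cpu_pct: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA calculation for a single utilization metric.

    desired = ceil(current_replicas * current / target), unless the ratio
    is within the tolerance of 1.0, in which case no scaling happens.
    The result is always clamped to [min_replicas, max_replicas].
    """
    ratio = current_cpu_pct / target_cpu_pct
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Hypothetical values: 4 pods at 90% CPU against a 60% target, bounds 2..10.
    print(desired_replicas(4, 90.0, 60.0, 2, 10))
```

If pods sit pinned at the minimum or the maximum for long stretches, that suggests the thresholds or the bounds need tuning for EKS. This simplified version ignores stabilization windows and pods that are not yet ready.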
EKS Dashboards: