Open obriensystems opened 1 year ago
Duet AI discussions (PaLM 2)
as an excellent krm developer, what are the advantages and disadvantages of running an autopilot GKE cluster for a config controller. Specifically I need to know how to avoid GKE cluster scaling
Autopilot clusters are a great option for running config controllers because they are fully managed and scale automatically. This means that you don't have to worry about provisioning or scaling nodes, and you can focus on developing and deploying your config controller.

However, there are some disadvantages to running an Autopilot GKE cluster for a config controller:
- Autopilot clusters are more expensive than standard clusters.
- Autopilot clusters offer less control over the underlying infrastructure; for example, you can't specify the node type or the size of the cluster.

If you're considering running an Autopilot GKE cluster for a config controller, weigh the advantages and disadvantages carefully to make sure it's the right choice for you.

Here are some tips for avoiding GKE cluster scaling:
- Use Horizontal Pod Autoscaling (HPA) to automatically scale the number of replicas of your Pods based on demand.
- Use Vertical Pod Autoscaling (VPA) to automatically scale the resources allocated to your Pods based on their usage.
- Use Cluster Autoscaler to automatically add or remove nodes from your cluster based on demand.
- Use resource quotas to limit the amount of resources that your Pods can use.
- Use Pod Disruption Budgets (PDBs) to limit the number of Pods that can be disrupted at any given time.

By following these tips, you can help avoid unnecessary GKE cluster scaling and ensure that your config controller runs smoothly.
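Two of those guardrails can be sketched as plain Kubernetes KRM. The namespace, names, labels, and limits below are illustrative assumptions, not values from this deployment:

```yaml
# Hypothetical ResourceQuota capping total resource requests in the namespace,
# which indirectly bounds how far Autopilot can scale nodes up.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: cc-quota               # illustrative name
  namespace: config-control    # illustrative namespace
spec:
  hard:
    requests.cpu: "8"
    requests.memory: 16Gi
---
# Hypothetical PodDisruptionBudget limiting voluntary disruptions
# during node scale-down / consolidation.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cc-pdb                 # illustrative name
  namespace: config-control
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: config-controller   # illustrative label
```

The quota bounds aggregate requests (what Autopilot bills and scales on); the PDB only limits voluntary evictions, it does not by itself prevent scale-up.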
Check that the issue is not exhaustion of the monthly free allocation for a single GKE cluster. Also investigate using a standard cluster to avoid scaling over 3 nodes - see https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/issues/539
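If the standard-cluster route is taken, a Config Connector sketch that pins the pool at 3 nodes by simply omitting the `autoscaling:` block - cluster/pool names and the region are placeholders, not values from this repo:

```yaml
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerCluster
metadata:
  name: kcc-standard                    # placeholder name
spec:
  location: northamerica-northeast1    # placeholder region
  initialNodeCount: 1
---
apiVersion: container.cnrm.cloud.google.com/v1beta1
kind: ContainerNodePool
metadata:
  name: kcc-standard-pool              # placeholder name
spec:
  location: northamerica-northeast1
  clusterRef:
    name: kcc-standard
  nodeCount: 3    # fixed size; with no autoscaling block the pool cannot grow past 3
```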
2 new GKE clusters up for 3 days without any KRM workload yet - for base finops $10/day baseline
Review sink volume: 61GB for the security log sink, 171GB for other logs - https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/blob/main/solutions/client-landing-zone/logging-project/cloud-logging-bucket.yaml
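One lever on that 61GB/171GB storage cost is bucket retention. A hedged sketch in the same KRM style as the linked cloud-logging-bucket.yaml - the bucket name, project id, and retention value are assumptions for illustration:

```yaml
apiVersion: logging.cnrm.cloud.google.com/v1beta1
kind: LoggingLogBucket
metadata:
  name: security-log-bucket                     # illustrative name
spec:
  projectRef:
    external: "projects/logging-project-id"     # placeholder project
  location: global
  retentionDays: 30   # shorter retention reduces billed log storage; pick per compliance needs
```

Note that security/audit log retention may be mandated by policy, so confirm the minimum allowed retention before lowering it.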
Cluster with 1 package clz installed
Cluster with 4 packages clz + 4 client
https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/wiki/FinOps
TL;DR:
Update 20231014 - seeing additional random spikes of $5/day for ListClusters audit logs, charged as storage on the logging project. These are the security logs at 61GB, plus 171GB for other logs.
561
Update: 20231007
The networking cost is 5.5% of spend (consisting of Network Intelligence Center and the PSC endpoint); the PSC endpoint alone is 2.2% of spend.
Ongoing cost is 16.35/day until the sustained-use discount kicks in - about 13/d under the SU credit.
13.65/d dropping to 10/d.
A 2nd KCC cluster has only core-landing-zone deployed (the logging and dns projects) - not the other 3 core packages or the client packages yet.
Log costs are increasing - up to $5/d after 2 weeks, starting suddenly 3 days ago.
GKE cluster cost went from 8.60 to 12.25/d after 2 weeks.
Todo: fmichaelobrien - derive a rate-of-change equation (derivative) for the logs/day cost increase; also determine why the Autopilot cluster nodes increased from 3 to 4.5 under no load or deployments.
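The rate-of-change the todo asks for is just a difference quotient over billing-export samples; the symbols below ($C$ for daily log cost in $/day, $t$ for day index) and the sample numbers are mine, not actual measurements:

```latex
\frac{dC}{dt} \approx \frac{C(t_2) - C(t_1)}{t_2 - t_1}
% e.g. with assumed samples C(7) = 2.5 and C(14) = 5.0:
% (5.0 - 2.5) / (14 - 7) \approx 0.36 \ \text{\$/day per day}
```

If the quotient itself keeps growing between sample windows, the log volume is compounding (e.g. a chatty audit filter) rather than growing linearly, which points back at the ListClusters spikes above.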