GoogleCloudPlatform / pubsec-declarative-toolkit

The GCP PubSec Declarative Toolkit is a collection of declarative solutions to help you on your Journey to Google Cloud. Solutions are designed using Config Connector and deployed using Config Controller.
Apache License 2.0
32 stars 28 forks

FinOps tracking: idle cluster costs increase for GKE autopilot cluster (3 to 5.5 vCores, $8/d to $12/d) and logging project ($10/d to $13/d) after 2 weeks - for core-landing-zone #492

Open obriensystems opened 1 year ago

obriensystems commented 1 year ago

https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/wiki/FinOps

TL;DR

Update 2023-10-14: seeing additional random spikes of $5/day from listCluster audit logs, charged as storage on the logging project. The security logs total 61GB and the other logs total 171GB.

The networking cost is 5.5% of spend (consisting of Network Intelligence Center and the PSC endpoint). The PSC endpoint alone is 2.2% of spend.

Ongoing cost: $16.35/day until sustained use kicks in, then $13/d under the SU credit.
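As a rough check on the two figures above, the drop from 16.35/day to 13/day can be expressed as an effective discount rate. This is only arithmetic on the quoted numbers, not a statement of how the sustained use credit is actually calculated; the function name `effective_discount` is hypothetical, and the currency is assumed to be dollars as in the issue title.

```python
def effective_discount(list_cost: float, discounted_cost: float) -> float:
    """Return the fractional reduction from list_cost to discounted_cost."""
    if list_cost <= 0:
        raise ValueError("list_cost must be positive")
    return (list_cost - discounted_cost) / list_cost


# Figures quoted in this comment (assumed to be USD per day).
before_credit = 16.35
after_credit = 13.00

rate = effective_discount(before_credit, after_credit)
print(f"effective discount: {rate:.1%}")
```

The same helper can be reused for any other before/after pair of daily costs in this thread.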

$13.65/d to $10/d

A 2nd KCC cluster has only core-landing-zone deployed (logging and dns projects), not yet the other 3 core and client packages.

Log costs are increasing: up to $5/d after 2 weeks, with a sudden jump 3 days ago.

Screenshot 2023-08-24 at 22 57 21

GKE cluster cost went from $8.60/d to $12.25/d after 2 weeks.

Screenshot 2023-08-24 at 22 53 59

Todo (fmichaelobrien):
- Get the rate of change equation (derivative) for the logs/day increase.
- Determine why the autopilot cluster nodes increased from 3 to 4.5 under no load or deployments.
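A first pass at the rate of change can be taken as a simple finite difference over the figures already quoted in this issue. The sketch below assumes "2 weeks" means exactly 14 days and that growth is linear. That is a simplification, since the log cost is reported as jumping suddenly rather than ramping. The helper names `daily_rate_of_change` and `project_cost` are hypothetical.

```python
def daily_rate_of_change(start_cost: float, end_cost: float, days: int) -> float:
    """Average change in daily cost per elapsed day (finite-difference slope)."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (end_cost - start_cost) / days


def project_cost(current_cost: float, slope: float, days_ahead: int) -> float:
    """Linear extrapolation of the daily cost; only valid if growth stays linear."""
    return current_cost + slope * days_ahead


DAYS = 14  # assumption: "2 weeks" taken as exactly 14 days

# Daily costs quoted in this issue (USD/day).
gke_slope = daily_rate_of_change(8.60, 12.25, DAYS)
logging_slope = daily_rate_of_change(10.00, 13.00, DAYS)

print(f"GKE cluster:     {gke_slope:+.3f} $/day per day")
print(f"Logging project: {logging_slope:+.3f} $/day per day")
print(f"GKE cost in 14 more days if linear: {project_cost(12.25, gke_slope, 14):.2f} $/day")
```

With daily billing export data, the same slope could be computed per day instead of over the whole window, which would show whether the increase is a step change or a steady ramp.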

Screenshot 2023-08-24 at 22 58 04 Screenshot 2023-08-24 at 22 59 25 Screenshot 2023-08-24 at 22 59 45
fmichaelobrien commented 1 year ago

Duet AI discussion (PaLM 2)

Prompt: As an excellent KRM developer, what are the advantages and disadvantages of running an autopilot GKE cluster for a config controller? Specifically, I need to know how to avoid GKE cluster scaling.

Response:

Autopilot clusters are a great option for running config controllers because they are fully managed and scale automatically. This means that you don't have to worry about provisioning or scaling nodes, and you can focus on developing and deploying your config controller.

However, there are some disadvantages to running an autopilot GKE cluster for a config controller. First, autopilot clusters are more expensive than standard clusters. Second, autopilot clusters don't offer as much control over the underlying infrastructure. For example, you can't specify the type of nodes or the size of the cluster.

If you're considering running an autopilot GKE cluster for a config controller, you should weigh the advantages and disadvantages carefully to make sure that it's the right choice for you.

Here are some tips for avoiding GKE cluster scaling:

- Use Horizontal Pod Autoscaling (HPA) to automatically scale the number of replicas of your Pods based on demand.
- Use Vertical Pod Autoscaling (VPA) to automatically scale the resources allocated to your Pods based on their usage.
- Use Cluster Autoscaler to automatically add or remove nodes from your cluster based on demand.
- Use Resource quotas to limit the amount of resources that your Pods can use.
- Use Pod Disruption Budgets (PDBs) to limit the number of Pods that can be disrupted at any given time.

By following these tips, you can help to avoid GKE cluster scaling and ensure that your config controller runs smoothly.

fmichaelobrien commented 1 year ago

Check that the issue is not caused by running out of the monthly free allocation for a single GKE cluster. Also investigate using a standard cluster to avoid scaling over 3 nodes; see https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/issues/539

obriensystems commented 1 year ago

2 new GKE clusters have been up for 3 days without any KRM workload yet, to establish a base FinOps baseline of $10/day.

obriensystems commented 1 year ago

Review the sink volumes: 61GB for the security log sink and 171GB for other logs. See https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/blob/main/solutions/client-landing-zone/logging-project/cloud-logging-bucket.yaml

Screenshot 2023-10-16 at 10 21 40
obriensystems commented 9 months ago

KCC management project

Cluster with 1 package clz installed

Screenshot 2024-02-16 at 11 08 09

core-landing-zone

client-setup

client-landing-zone

Cluster with 4 packages clz + 4 client

Screenshot 2024-02-16 at 11 56 05