cloudspannerecosystem / autoscaler

Automatically scale the capacity of your Spanner instances based on their utilization.
Apache License 2.0

autoscaler doesn't work with dataflow load (export/import) #40

Closed: oprudkyi closed this issue 1 year ago

oprudkyi commented 3 years ago

Dataflow is treated as medium-priority load (https://cloud.google.com/spanner/docs/cpu-utilization), so when Dataflow load (import/export) is high and high-priority load is low, the autoscaler just doesn't work at all.

It is possible to work around this with a custom metric:

      "metrics" = [
        {
          "name"                     = "cpu_utilization_total"
          "filter"                   = "metric.type=\"spanner.googleapis.com/instance/cpu/utilization\""
          "regional_threshold"       = 90
          "multi_regional_threshold" = 90
        },

It would probably be great if such a rule were added to the default set.

bgood commented 3 years ago

@oprudkyi Thank you for the feedback.

With the default metrics we were intentionally conservative and only wanted to evaluate the metrics called out in the Cloud Spanner docs. For cases like Dataflow (medium priority CPU) and backups (low priority CPU), I have been considering creating a library of custom metrics with descriptions of the use cases for each metric.

bgood commented 1 year ago

Here are a couple of custom metrics that specifically target medium and low priority CPU usage.

Medium CPU usage, useful for autoscaling for Dataflow jobs.

"metrics": [
  {
    "name": "medium_cpu_usage",
    "filter": "metric.type=\"spanner.googleapis.com/instance/cpu/utilization_by_priority\" AND metric.label.priority=\"medium\"",
    "regional_threshold": 40,
    "multi_regional_threshold": 30
  }
]

Low CPU usage, useful for managing jobs like backups and index backfill. The amount of CPU allocated for these tasks is managed by Cloud Spanner in relation to the other CPU priorities. Scaling up for these tasks may have minimal impact, particularly on instances with low overall utilization.

"metrics": [
  {
    "name": "low_cpu_usage",
    "filter": "metric.type=\"spanner.googleapis.com/instance/cpu/utilization_by_priority\" AND metric.label.priority=\"low\"",
    "regional_threshold": 40,
    "multi_regional_threshold": 30
  }
]
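
If you want both priorities covered, the two entries above can be generated rather than hand-written. The following is only a sketch: the helper names `priority_metric` and `build_metrics` are made up for illustration and are not part of the autoscaler, the thresholds are the ones from the snippets above, and it assumes both entries can live in the same "metrics" array.

```python
import json

# Metric type copied from the filters in the snippets above.
METRIC_TYPE = "spanner.googleapis.com/instance/cpu/utilization_by_priority"


def priority_metric(priority, regional_threshold=40, multi_regional_threshold=30):
    """Build one custom metric entry for a CPU priority ("medium" or "low").

    Hypothetical helper: it only assembles the same fields shown above.
    """
    return {
        "name": f"{priority}_cpu_usage",
        "filter": (
            f'metric.type="{METRIC_TYPE}" '
            f'AND metric.label.priority="{priority}"'
        ),
        "regional_threshold": regional_threshold,
        "multi_regional_threshold": multi_regional_threshold,
    }


def build_metrics(priorities=("medium", "low")):
    """Return a "metrics" fragment with one entry per requested priority."""
    return {"metrics": [priority_metric(p) for p in priorities]}


if __name__ == "__main__":
    # Print the fragment as JSON so it can be pasted into the autoscaler config.
    print(json.dumps(build_metrics(), indent=2))
```

Pass different thresholds to `priority_metric` if 40/30 is too aggressive or too relaxed for your workload; the values here are simply the ones suggested above.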