Closed oprudkyi closed 1 year ago
@oprudkyi Thank you for the feedback.
With the default metrics we were intentionally conservative, only evaluating the metrics called out in the Cloud Spanner docs. For cases like Dataflow (medium-priority CPU) and backups (low-priority CPU), I have been considering creating a library of custom metrics, with descriptions of the use cases for each metric.
Here are a couple custom metrics that specifically target medium and low priority CPU usage.
Medium-priority CPU usage, useful for autoscaling based on Dataflow jobs:
```json
"metrics": [
  {
    "name": "medium_cpu_usage",
    "filter": "metric.type=\"spanner.googleapis.com/instance/cpu/utilization_by_priority\" AND metric.label.priority=\"medium\"",
    "regional_threshold": 40,
    "multi_regional_threshold": 30
  }
]
```
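Since the two custom metrics differ only in the priority label and the thresholds, a small helper can generate an entry for any priority band. This is an illustrative sketch, not part of the autoscaler itself; the function name and the threshold values are hypothetical, while the metric type and filter syntax are the ones from the config above.

```python
def priority_cpu_metric(priority, regional_threshold, multi_regional_threshold):
    """Build one autoscaler custom-metric entry targeting a single
    Spanner CPU priority band ("high", "medium", or "low")."""
    metric_type = "spanner.googleapis.com/instance/cpu/utilization_by_priority"
    return {
        "name": f"{priority}_cpu_usage",
        # Cloud Monitoring filter restricted to the requested priority label.
        "filter": (
            f'metric.type="{metric_type}" '
            f'AND metric.label.priority="{priority}"'
        ),
        "regional_threshold": regional_threshold,
        "multi_regional_threshold": multi_regional_threshold,
    }

# Reproduces the medium-priority example above.
config = {"metrics": [priority_cpu_metric("medium", 40, 30)]}
```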
Low-priority CPU usage, useful for managing jobs like backups and index backfill. The amount of CPU allocated to these tasks is managed by Cloud Spanner in relation to the other CPU priorities, so scaling up for them may have minimal impact, particularly on instances with low overall utilization:
```json
"metrics": [
  {
    "name": "low_cpu_usage",
    "filter": "metric.type=\"spanner.googleapis.com/instance/cpu/utilization_by_priority\" AND metric.label.priority=\"low\"",
    "regional_threshold": 40,
    "multi_regional_threshold": 30
  }
]
```
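Because `metrics` is a JSON array, an instance that needs to watch both bands can presumably list the two entries together in one config; the thresholds here are just the illustrative values from the snippets above:

```json
"metrics": [
  {
    "name": "medium_cpu_usage",
    "filter": "metric.type=\"spanner.googleapis.com/instance/cpu/utilization_by_priority\" AND metric.label.priority=\"medium\"",
    "regional_threshold": 40,
    "multi_regional_threshold": 30
  },
  {
    "name": "low_cpu_usage",
    "filter": "metric.type=\"spanner.googleapis.com/instance/cpu/utilization_by_priority\" AND metric.label.priority=\"low\"",
    "regional_threshold": 40,
    "multi_regional_threshold": 30
  }
]
```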
Dataflow is considered medium priority (https://cloud.google.com/spanner/docs/cpu-utilization), so when there is high Dataflow load (import/export) but low high-priority load, the autoscaler simply doesn't react at all.
While it is possible to fix this with a custom metric,
it would be great if such a rule were added to the default set.