cloudspannerecosystem / autoscaler

Automatically scale the capacity of your Spanner instances based on their utilization.
Apache License 2.0

Aligner / Reducer functions incorrect for high priority CPU #31

Closed BrianChukwu-Smith closed 1 year ago

BrianChukwu-Smith commented 3 years ago

Currently, the high priority CPU metric uses align_max as the aligner and reduce_sum as the reducer. According to the documentation you point to for GCP Monitoring alerts for Spanner (https://cloud.google.com/spanner/docs/monitoring-cloud#high-priority-cpu), the high priority CPU aligner should be mean, while the aggregator (which I understand to mean the reducer) should be max. This differs from the rolling 24 hour version.

The impact is that, in my case, I am seeing scale-up events in my multi-region Spanner instance whenever the utilization of two locations sums to more than 45%, which I don't think is ideal, if I am understanding those numbers correctly.
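To make the reported effect concrete, here is a minimal sketch of how a sum reducer and a max reducer behave on the same per-location values. The location names and utilization figures are hypothetical, and the helper functions only imitate the idea of a reducer in plain Python; they are not the Cloud Monitoring API or the autoscaler's code.

```python
# Hypothetical, already-aligned high priority CPU utilization per location (percent).
aligned = {"us-east1": 25.0, "us-central1": 24.0}

# The 45% figure mentioned in the report above.
THRESHOLD = 45.0


def reduce_sum(values):
    """Imitates a sum reducer: adds the values of all locations together."""
    return sum(values)


def reduce_max(values):
    """Imitates a max reducer: keeps only the busiest location."""
    return max(values)


summed = reduce_sum(aligned.values())
peak = reduce_max(aligned.values())

# With a sum reducer the two locations together cross the threshold,
# even though neither location is above it on its own.
print("sum reducer triggers scale-up:", summed > THRESHOLD)
print("max reducer triggers scale-up:", peak > THRESHOLD)
```

Under these assumed numbers, the sum reducer reports 49% and would trigger a scale-up, while the max reducer reports 25% and would not.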

davidcueva commented 1 year ago

@bgood Please verify this is solved with PR #100

bgood commented 1 year ago

This issue is resolved with PR #100.

We still use the max so that peak utilization is captured. However, that PR adds a group by location, so the max of any one region is used to decide whether a scaling action should be taken. This allows read-heavy events impacting a read-only region, as well as read/write-heavy activity on leader regions, to trigger scaling.
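The behaviour described above can be sketched roughly as follows. This is an illustration only, not the code from PR #100: the function names, location names, sample values, and threshold are all assumptions made for the example.

```python
def peak_by_location(points):
    """Group (location, utilization_percent) samples by location, keeping the max of each."""
    peaks = {}
    for location, value in points:
        peaks[location] = max(peaks.get(location, value), value)
    return peaks


def should_scale_up(points, threshold):
    """Return True if the peak of any single location exceeds the threshold."""
    return any(peak > threshold for peak in peak_by_location(points).values())


# Hypothetical samples: a read-heavy event hits one region only.
samples = [
    ("us-east1", 20.0),
    ("us-east1", 30.0),
    ("us-central1", 24.0),
    ("us-central1", 50.0),
]

print(peak_by_location(samples))
print(should_scale_up(samples, threshold=45.0))
```

Because each location is evaluated on its own, one busy region is enough to trigger scaling, while two moderately loaded regions are no longer added together.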

Closing this issue. Please feel free to reopen it if you are still seeing this problem.