cloudspannerecosystem / autoscaler

Automatically scale the capacity of your Spanner instances based on their utilization.
Apache License 2.0
86 stars 33 forks source link

Setting custom threshold computes bad range if use string value #318

Open alexlo03 opened 2 months ago

alexlo03 commented 2 months ago

Background

I have the autoscaler (v2.0.1) working in nonproduction. For Production I want to over-provision to start with, therefore I want to set a CPU threshold lower than the recommended threshold of 65% for a regional instance. I want to start at 40% (or maybe lower) and work up from there slowly.

Bug

Just setting a threshold on the existing metrics breaks in a bizarre way if you specify values as strings.

tf-poller-function high_priority_cpu = 0.583039742903243, threshold = 40, margin = 5, location = us-central1

tf-scaler-function high_priority_cpu=0.583039742903243%, BELOW the range [35%-100%] => suggesting to scale from 300 to 100 PROCESSING_UNITS.

Original Poller Configuration ``` [ { "instanceId": "redacted", "maxSize": "1000", "minSize": "100", "projectId": "redacted", "scalerPubSubTopic": "projects/redacted/topics/scaler-topic", "scalingMethod": "LINEAR", "stateDatabase": { "databaseId": "spanner-autoscaler-state", "instanceId": "spanner-autoscaler-state", "name": "spanner" }, "units": "PROCESSING_UNITS" } ] ```
Updated Poller Configuration ``` [ { "instanceId": "redacted", "maxSize": "1000", "metrics": [ { "name": "high_priority_cpu", "regional_threshold": "40" }, { "name": "rolling_24_hr", "regional_threshold": "50" }, { "name": "storage" } ], "minSize": "100", "projectId": "redacted", "scalerPubSubTopic": "projects/redacted/topics/scaler-topic", "scalingMethod": "LINEAR", "stateDatabase": { "databaseId": "spanner-autoscaler-state", "instanceId": "spanner-autoscaler-state", "name": "spanner" }, "units": "PROCESSING_UNITS" } ] ```

JS math goes wat if you specify any strings:

Screenshot 2024-06-18 at 16 34 26

I have a patch to fix this I will make the PR tomorrow but submitting this issue now.

alexlo03 commented 2 months ago

Looking into this more, there's two options:

A. Hack solution: convert strings to numbers in getRange() B. Better solution: parse/validate on JSON load

A is terrible but B is bigger than I can do right now.

nielm commented 1 month ago

pr #338 adds schema validation of the config file which will check for things like this (and for mis-spelled properties etc) along with a command line validator.

This will be in v3 as it is potentially a breaking change for anyone who has specified numeric params as strings, or incorrectly spelled properties, as these will now be rejected as invalid configurations.