epam / cloud-pipeline

Cloud agnostic genomics analysis, scientific computation and storage platform
https://cloud-pipeline.com
Apache License 2.0

Autoscaling of the high availability service #2639

Open NShaforostov opened 2 years ago

NShaforostov commented 2 years ago

Background

As separate Cloud Pipeline deployments may periodically face large workload peaks, it would be useful to implement autoscaling of the system nodes (HA service) - to allow the service to scale up and down according to the actual workload.

Approach

We shall monitor the state of the system API instances (at least their RAM and/or CPU consumption). The HA service shall have a minimum number of instances to run. If the consumption exceeds some predefined threshold for some period of time, new instances shall be launched as the system needs them (i.e. the HA service shall be scaled up). If the workload subsides, the additional instances shall be stopped (i.e. the HA service shall be scaled down - but not below the predefined minimum number of instances).
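The threshold logic described above can be sketched as follows. This is a minimal illustration only: the threshold, window and instance-limit values are hypothetical, not actual Cloud Pipeline preferences, and real monitoring would feed in live consumption metrics.

```python
from collections import deque

class ScalingDecision:
    """Decides when to scale the HA service up or down based on a
    sliding window of consumption samples (hypothetical sketch)."""

    def __init__(self, min_instances=2, max_instances=10,
                 up_threshold=0.9, down_threshold=0.4, window=5):
        self.min_instances = min_instances
        self.max_instances = max_instances
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold
        self.samples = deque(maxlen=window)

    def next_instance_count(self, current, consumption):
        """Record a consumption sample (0..1) and return the desired
        instance count for the HA service."""
        self.samples.append(consumption)
        if len(self.samples) < self.samples.maxlen:
            return current  # not enough history yet
        if all(s > self.up_threshold for s in self.samples):
            return min(current + 1, self.max_instances)  # sustained peak
        if all(s < self.down_threshold for s in self.samples):
            return max(current - 1, self.min_instances)  # workload subsided
        return current
```

Requiring the whole window to exceed the threshold is one way to express "exceeds the threshold during some time" and avoids scaling on short spikes.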

I suppose that the described behavior shall be managed by some new system preferences, e.g.:

Additionally

  1. Each HA service autoscaling action (scaling up or down) should be accompanied by a corresponding email to the admin
  2. Add and show at the GUI (Cluster state page) new labels for the HA service nodes:
    • each label shall show the state of the corresponding running service instance
    • the label shall be colorized according to the current instance consumption, for example: if the consumption is less than 50%, the label shall be green; between 50% and 90%, orange; over 90%, red
  3. Add a new filter at the GUI (Cluster state page) to show only system service instances:
    • with this filter, only system service instances shall be shown in the nodes list
    • system service instances shall not be displayed when the "No run id" filter is selected
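The label colorizing rule from item 2 can be expressed as a simple mapping (a sketch only; the function name is hypothetical, the thresholds come from the description above):

```python
def consumption_color(consumption_percent):
    """Map an instance's consumption (0..100 %) to a label color,
    following the thresholds described above: <50% green,
    50-90% orange, >90% red."""
    if consumption_percent < 50:
        return 'green'
    if consumption_percent <= 90:
        return 'orange'
    return 'red'
```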
tcibinan commented 2 years ago

Goals

From the technical point of view we would like to achieve the following goals:

  1. Kubernetes cluster shall be autoscaled based on specific deployment utilization.
  2. Autoscaling shall not depend on other Cloud Pipeline services.
  3. Autoscaling shall be expandable in terms of autoscaling triggers and target deployments.
  4. Autoscaling shall support independent autoscaling of multiple deployments.
  5. Autoscaling shall not abort most running requests/operations.

Implementation

I suggest using an additional autoscaling service which can horizontally autoscale both kubernetes deployments and kubernetes nodes in order to achieve some predefined target utilization. The following key points give a more in-depth understanding of the approach.

  1. Autoscaling service is an independent kubernetes deployment itself.
  2. Autoscaling service deployment is created for each target deployment.
  3. Autoscaling service configuration resides in a kubernetes configmap as a simple json configuration.
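An illustrative configmap payload for such a service might look like this. All keys and values are hypothetical and only show the shape of the json configuration, not a fixed schema:

```json
{
  "target": {
    "deployment": "cp-api-srv",
    "labels": {"cloud-pipeline/cp-api-srv": "true"}
  },
  "triggers": {
    "cpu_utilization": {"target": 50, "delta": 10},
    "ram_utilization": {"target": 50, "delta": 10}
  },
  "limits": {
    "min_pods": 2, "max_pods": 10,
    "min_nodes": 2, "max_nodes": 10,
    "min_trigger_duration_s": 60,
    "post_scale_delay_s": 300
  },
  "instance": {"type": "m5.large"}
}
```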

Algorithm

The following autoscaling algorithm can be used by the autoscaling service.

  1. find the deployment
  2. find the corresponding pods
  3. find the corresponding nodes
    • observe all nodes
    • distinguish static and autoscaled nodes
    • manage only autoscaled nodes
    • ignore autoscaled nodes which have non target pods
  4. check triggers
    • disk pressure statuses of target nodes (ex. target = 0 nodes with a disk pressure status)
    • ram pressure statuses of target nodes (ex. target = 0 nodes with a ram pressure status)
    • cpu utilization of target nodes (ex. target utilization = 50 +- 10 %. scale up on 60%, scale down on 40%)
    • ram utilization of target nodes (ex. target utilization = 50 +- 10 %. scale up on 60%, scale down on 40%)
    • cluster nodes per target pod coefficient (ex. target coefficient = 100 cluster nodes per 1 target pod)
    • target pods per node coefficient (ex. target coefficient = 2 target pods per 1 node)
    • target pod failures per hour coefficient (ex. target coefficient = 3 pod failures per hour)
  5. check limits
    • minimum trigger duration (ex. trigger is active for 1 minute)
    • minimum pods number (ex. 2 pods minimum)
    • maximum pods number (ex. 10 pods maximum)
    • minimum nodes number (ex. 2 nodes minimum)
    • maximum nodes number (ex. 10 nodes maximum)
    • post scale delay (ex. scale no more frequently than once per 5 minutes)
  6. scale up node if needed
    • launch instance
    • attach node
    • set labels
  7. scale up deployment if needed
  8. scale down node if needed
    • drain node
    • terminate instance
  9. scale down deployment if needed
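One iteration of steps 5-9 can be sketched as a pure planning function. This is an assumption-laden sketch: steps 1-4 (finding the deployment, pods, nodes and evaluating triggers) are taken as already done and their results passed in; the action names and the `limits` keys are hypothetical.

```python
def plan_actions(pods, nodes, triggers, limits):
    """Return the ordered scaling actions for one autoscaling iteration.

    pods, nodes -- the target pods and autoscaled nodes (steps 1-3)
    triggers    -- already-evaluated trigger directions, e.g. ['up'] (step 4)
    limits      -- min/max constraints, e.g. {'min_pods': 2, ...} (step 5)
    """
    actions = []
    if 'up' in triggers:
        if len(pods) < limits['max_pods'] and len(nodes) < limits['max_nodes']:
            actions += ['launch_instance', 'attach_node', 'set_labels',  # step 6
                        'scale_up_deployment']                           # step 7
    elif 'down' in triggers:
        if len(pods) > limits['min_pods'] and len(nodes) > limits['min_nodes']:
            actions += ['drain_node', 'terminate_instance',              # step 8
                        'scale_down_deployment']                         # step 9
    return actions
```

Keeping the planning separate from the kubernetes/cloud calls makes the limits logic trivially testable and keeps the service independent of other Cloud Pipeline components (goal 2).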

Configuration

The following settings shall be configured for the autoscaling service to work:

  1. kubernetes deployment to manage (ex. cp-api-srv)
  2. kubernetes labels to manage (ex. cloud-pipeline/cp-api-srv)
  3. triggers to check (ex. cpu utilization = 50%)
  4. limits to consider (ex. from 1 to 5 pods/nodes)
  5. cloud instance to scale (ex. instance type, iam role, security groups and etc.)
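A minimal loader for these settings might validate the required sections before the service starts. The section names here are hypothetical placeholders matching the list above, not a fixed schema:

```python
import json

# Hypothetical top-level sections, one per configuration item above.
REQUIRED_SECTIONS = ('deployment', 'labels', 'triggers', 'limits', 'instance')

def load_autoscaler_config(raw_json):
    """Parse the configmap payload and fail fast on missing sections."""
    config = json.loads(raw_json)
    missing = [s for s in REQUIRED_SECTIONS if s not in config]
    if missing:
        raise ValueError(
            'Missing autoscaler config sections: %s' % ', '.join(missing))
    return config
```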

Questions

  1. Shall the autoscaling be configurable from the Cloud Pipeline GUI?
maryvictol commented 2 years ago

The following autoscaler parameters were checked:

trigger:

rules:

limit:

tcibinan commented 2 years ago

Cherry-picked to release/0.16 via 46ba80cb3ead2e43721a502404b1e3e4949255cf, 4eb26dbcef6227f133d534736054136fd623a82d, a19e73fbee597f09321d7981809ccbcfbc461835, e60fbcded01ef44c52b070e331177936c6f7a5f8 and 90e593c422d65d58d2d6fe2bdec38861e2a3d157.