epam / cloud-pipeline

Cloud agnostic genomics analysis, scientific computation and storage platform
https://cloud-pipeline.com
Apache License 2.0

Admins shall be able to manage persistent cluster nodes #841

Open sidoruka opened 4 years ago

sidoruka commented 4 years ago

Background At the moment, Cloud Pipeline allows controlling the number of persistent compute nodes in the cluster. I.e. a certain number (cluster.min.size) of the nodes of a specified size (cluster.instance.type/cluster.instance.hdd) will be always available in the cluster (even if there is no workload). This is useful to speed up the compute instances creation process (as the nodes are already up and running).
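For illustration, the preferences named above could look like the following (the concrete values here are made up, only the setting names come from the text): keep two warm nodes of a fixed type and disk size.

```properties
# Illustrative values only - the real defaults depend on the deployment
cluster.min.size=2
cluster.instance.type=m5.large
cluster.instance.hdd=50
```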

Moreover, we need to make this mechanism a bit more flexible.

Approach

SilinPavel commented 4 years ago

The following implementation of this issue is proposed: 1) A new entity NodeDescription is introduced:

NodeDescription:

    Long id

    // set of requirements for a node
    RunInstance instance

    int numberOfInstances

    // Start or Stop
    Action action
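A minimal Java sketch of the proposed entity could look like the following. The field names follow the description above; the `RunInstance` stub and the `Action` enum constants are assumptions for illustration, not taken from the actual codebase:

```java
// Minimal stand-in for Cloud Pipeline's RunInstance: the requested
// instance type and disk size for a node (simplified for illustration).
class RunInstance {
    String nodeType;
    int nodeDisk;

    RunInstance(String nodeType, int nodeDisk) {
        this.nodeType = nodeType;
        this.nodeDisk = nodeDisk;
    }
}

// The two actions a schedule can request for a set of nodes.
enum Action { START, STOP }

// Hypothetical sketch of the proposed NodeDescription entity.
class NodeDescription {
    Long id;
    RunInstance instance;     // set of requirements for a node
    int numberOfInstances;
    Action action;            // Start or Stop

    NodeDescription(Long id, RunInstance instance,
                    int numberOfInstances, Action action) {
        this.id = id;
        this.instance = instance;
        this.numberOfInstances = numberOfInstances;
        this.action = action;
    }
}
```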

2) When an admin creates a schedule for a specific instance type and disk size, a new NodeDescription is created and persisted to the DB

3) NodeDescription.id is used as the schedulableId in RunSchedule to be able to schedule the action

4) A new type of Job, NodeJob, is implemented

5) A new field Queue<NodeDescription> freeNodeActions is added; this field will be shared between the Autoscaler and the NodeJob

6) The NodeJob simply populates this queue as many times as specified in the NodeDescription
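The producer side of the steps above can be sketched as follows. The class and method names are illustrative (not from the actual codebase); a concurrent queue is assumed so the scheduled NodeJob and the Autoscaler loop can share it without explicit locking:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: a scheduled NodeJob enqueues the action described
// by its NodeDescription, and the Autoscaler drains the queue on its
// next loop iteration.
class NodeJobSketch {
    // Minimal stand-in for NodeDescription (id + requested action).
    static class NodeAction {
        final long descriptionId;
        final String action; // "Start" or "Stop"

        NodeAction(long descriptionId, String action) {
            this.descriptionId = descriptionId;
            this.action = action;
        }
    }

    // Shared between NodeJob (producer) and Autoscaler (consumer).
    static final Queue<NodeAction> FREE_NODE_ACTIONS = new ConcurrentLinkedQueue<>();

    // Invoked by the scheduler when a NodeDescription's schedule fires.
    static void runNodeJob(long descriptionId, String action) {
        FREE_NODE_ACTIONS.add(new NodeAction(descriptionId, action));
    }
}
```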

Autoscaler works with this queue in the following manner:

void handleNodeActions() {
    NodeDescription nd = freeNodeActions.poll();
    if (nd == null) {
        return;
    }
    switch (nd.action) {
        case Start:
            for (int i = 0; i < nd.numberOfInstances; i++) {
                startNode(nd.instance, nd.id);
            }
            break;
        case Stop:
            markNodesForTermination(nd.id);
            break;
        default:
            throw new IllegalArgumentException();
    }

    terminateAllMarkedNodesIfPossible();
}

In the method startNode we will create a new node, if possible (e.g. if maxNodeCount > currentClusterSize), and mark it with the NodeDescription id

In the method markNodesForTermination, all nodes with the specified NodeDescription.id will be marked with the tag readyToTerminate

And in the method terminateAllMarkedNodesIfPossible, all nodes with the readyToTerminate tag will be terminated if possible (i.e. if no run is currently executing on the node)

This approach with marking nodes solves the problem of a node that should be killed due to the schedule but is still serving some run: in this case the node will be terminated as soon as it is done with the run (if termination is not possible right now, it will be done on the next iteration of the autoscaler loop)
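The mark-then-terminate flow described above can be sketched as follows. All names and the in-memory maps here are illustrative assumptions (the real implementation tracks nodes in the cluster, not in local maps):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: nodes matching a NodeDescription id are tagged
// readyToTerminate, and a node is actually removed only once no run is
// executing on it; busy nodes are retried on the next autoscaler loop.
class TerminationSketch {
    // nodeName -> NodeDescription id that created the node
    static final Map<String, Long> NODE_DESCRIPTION_IDS = new HashMap<>();
    // nodeName -> whether a run is currently executing on the node
    static final Map<String, Boolean> NODE_BUSY = new HashMap<>();
    // nodes tagged readyToTerminate
    static final Set<String> READY_TO_TERMINATE = new HashSet<>();

    // Tag every node created from the given NodeDescription.
    static void markNodesForTermination(long descriptionId) {
        NODE_DESCRIPTION_IDS.forEach((node, id) -> {
            if (id == descriptionId) {
                READY_TO_TERMINATE.add(node);
            }
        });
    }

    // Terminate every marked node that is idle; busy nodes keep their
    // mark and are picked up on a later iteration.
    static void terminateAllMarkedNodesIfPossible() {
        READY_TO_TERMINATE.removeIf(node -> {
            if (!NODE_BUSY.getOrDefault(node, false)) {
                NODE_DESCRIPTION_IDS.remove(node);
                NODE_BUSY.remove(node);
                return true;  // idle: terminated, drop the mark
            }
            return false;     // still running a workload, keep the mark
        });
    }
}
```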

mzueva commented 4 years ago

Usage of the scheduled approach may not cover all the requirements of persistent node management, so I'd suggest implementing new functionality for this.

Approach

Cluster size management

A new entity PersistentNode is added with the following fields:

Autoscaler changes

Scale up:

Scale down:

Free (Persistent) nodes handling

mzueva commented 3 years ago

@sidoruka server part backported to release/0.16

NShaforostov commented 3 years ago

Test cases were created by #1929 and located here.

NShaforostov commented 3 years ago

Docs were added via #1516.