adobe / aquarium-fish

Your best secure distributed heterogeneous dynamic compute resource manager for CI

Election rules implementation #15

Open sparshev opened 2 years ago

sparshev commented 2 years ago

The election rules right now are quite simple:

This produces a "yep" or "nope" answer to the Application. Based on those parameters and the "last resort" random number, the cluster chooses one node to execute the Application. If all the nodes are busy (answering "nope"), the cluster goes to the next election round and repeats the process.
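To make the current behavior concrete, here is a minimal sketch of a single election round. The `Vote` type, its fields, and `pickWinner` are hypothetical names for this illustration, not the actual aquarium-fish code, and "lowest random number wins" is an assumption about how the last-resort number is used.

```go
package main

import "fmt"

// Vote is one node's answer in an election round (hypothetical type).
type Vote struct {
	Node      string
	Available bool   // "yep" (true) or "nope" (false)
	Rand      uint32 // "last resort" random number
}

// pickWinner returns the winner of one round, or false when every node
// answered "nope". Assumption: the lowest random number wins, and the
// node name breaks an exact tie so the result is deterministic.
func pickWinner(votes []Vote) (string, bool) {
	var best *Vote
	for i := range votes {
		v := &votes[i]
		if !v.Available {
			continue // busy node, not a candidate
		}
		if best == nil || v.Rand < best.Rand ||
			(v.Rand == best.Rand && v.Node < best.Node) {
			best = v
		}
	}
	if best == nil {
		return "", false
	}
	return best.Node, true
}

func main() {
	votes := []Vote{
		{Node: "node-a", Available: false, Rand: 7},
		{Node: "node-b", Available: true, Rand: 42},
		{Node: "node-c", Available: true, Rand: 13},
	}
	if winner, ok := pickWinner(votes); ok {
		fmt.Println("winner:", winner)
	} else {
		fmt.Println("all nodes busy, go to the next election round")
	}
}
```

When `pickWinner` reports no winner, the caller would start the next round and collect fresh votes.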

But there are many more variables to consider. For example, a driver can check the size of the images to download from Artifactory and estimate how long it will take to unpack them and actually run the VM, so a node with pre-cached images will start faster and should be preferred. Or the Application may need a specific machine (with a particular IP address) to satisfy some security requirement.

So all of that could be defined as a rules engine inside the cluster, and the cluster or the Application can choose which set of rules to use to determine the winner of the election. Go templates could be used here to define the logic.
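As a rough illustration of the Go templates idea, a rule could be a `text/template` string that renders to "yep" or "nope" for a given node state. The `NodeState` fields, the rule text, and `evalRule` are assumptions made up for this sketch; only the `text/template` calls and the built-in `and` / `ge` functions are real.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// NodeState is a hypothetical view of a node that is exposed to the rule.
type NodeState struct {
	ImageCached bool // the required image is already unpacked on the node
	FreeCPU     int
	NeedCPU     int
}

// rule is an example rule: vote "yep" only if the image is cached and
// there is enough free CPU for the Application.
const rule = `{{ if and .ImageCached (ge .FreeCPU .NeedCPU) }}yep{{ else }}nope{{ end }}`

// evalRule renders the rule against the node state and maps the
// rendered text to a boolean vote.
func evalRule(ruleText string, state NodeState) (bool, error) {
	tmpl, err := template.New("rule").Parse(ruleText)
	if err != nil {
		return false, err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, state); err != nil {
		return false, err
	}
	return strings.TrimSpace(out.String()) == "yep", nil
}

func main() {
	vote, err := evalRule(rule, NodeState{ImageCached: true, FreeCPU: 8, NeedCPU: 4})
	if err != nil {
		fmt.Println("rule error:", err)
		return
	}
	fmt.Println("vote:", vote)
}
```

Templates keep the rules declarative and safe to load from config, but anything beyond simple comparisons gets awkward quickly, which is one argument for a real scripting language.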

The available resources (CPU, Mem, Disk) need to be considered when making the decision, and overcommit could be allowed or disallowed. The "node_slots" config option needs to be removed after that.
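A resource check with optional overcommit could look roughly like the sketch below. The `Resources` type, the `fits` function, and the idea of expressing overcommit as a multiplier on CPU and memory (with disk never overcommitted) are assumptions for illustration, not a decided design.

```go
package main

import "fmt"

// Resources describes an amount of CPU, memory and disk (hypothetical type).
type Resources struct {
	CPU    uint // cores
	MemGB  uint
	DiskGB uint
}

// fits reports whether req can be placed on a node with the given total
// and already used resources. cpuOver and memOver are overcommit
// multipliers: 1.0 disallows overcommit, 1.5 allows 50% more than the
// physical amount. Disk is never overcommitted in this sketch.
func fits(total, used, req Resources, cpuOver, memOver float64) bool {
	cpuLimit := float64(total.CPU) * cpuOver
	memLimit := float64(total.MemGB) * memOver
	return float64(used.CPU+req.CPU) <= cpuLimit &&
		float64(used.MemGB+req.MemGB) <= memLimit &&
		used.DiskGB+req.DiskGB <= total.DiskGB
}

func main() {
	total := Resources{CPU: 8, MemGB: 16, DiskGB: 100}
	used := Resources{CPU: 6, MemGB: 8, DiskGB: 50}
	req := Resources{CPU: 4, MemGB: 4, DiskGB: 10}

	fmt.Println("no overcommit:", fits(total, used, req, 1.0, 1.0))
	fmt.Println("cpu overcommit x1.5:", fits(total, used, req, 1.5, 1.0))
}
```

With a check like this the node can vote based on what is actually free instead of a fixed slot count.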

sparshev commented 2 years ago

I think it will be useful to have scripts executed directly on the cluster nodes, but the language needs to be chosen properly. It's related to ticket #17:

Overall the choice is not that hard - for example, we can start with just one simple scripting language and add support for other languages as needed.

sparshev commented 2 years ago

Compared the full Go scripting systems, such as gomacro and yaegi, with native and executable Go:

sparshev commented 1 year ago

Right now we define the hard limit of node slots in the configuration (NodeSlots), but cloud drivers don't actually use the current node's resources (CPU/mem/disk). So maybe it would be a good addition to receive the available resources right from the driver and not from the node itself. That will help the rules get available-resource data depending on the label.
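One possible shape for this, as a sketch only: each driver answers how much is available for a given label, so a local driver can report the node's own free resources while a cloud driver reports something like a per-label quota. The `Driver` interface, `AvailableResources` method, and both example drivers are hypothetical and do not reflect the existing driver API.

```go
package main

import "fmt"

// Resources describes an amount of CPU, memory and disk (hypothetical type).
type Resources struct {
	CPU    uint
	MemGB  uint
	DiskGB uint
}

// Driver is a hypothetical slice of the driver interface: the driver,
// not the node, reports what is available for a label.
type Driver interface {
	AvailableResources(label string) (Resources, error)
}

// localDriver reports the free resources of the node itself,
// regardless of the label.
type localDriver struct{ free Resources }

func (d localDriver) AvailableResources(string) (Resources, error) {
	return d.free, nil
}

// cloudDriver reports a remaining quota per label, independent of the
// resources of the node it runs on.
type cloudDriver struct{ quota map[string]Resources }

func (d cloudDriver) AvailableResources(label string) (Resources, error) {
	r, ok := d.quota[label]
	if !ok {
		return Resources{}, fmt.Errorf("no quota known for label %q", label)
	}
	return r, nil
}

func main() {
	drivers := []Driver{
		localDriver{free: Resources{CPU: 2, MemGB: 4, DiskGB: 20}},
		cloudDriver{quota: map[string]Resources{
			"linux-build": {CPU: 64, MemGB: 256, DiskGB: 1000},
		}},
	}
	for _, d := range drivers {
		res, err := d.AvailableResources("linux-build")
		fmt.Printf("%T: %+v, err: %v\n", d, res, err)
	}
}
```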