cloudbase / garm

GitHub Actions Runner Manager
Apache License 2.0

Add pool balancing strategy #233

Closed: gabriel-samfira closed this 8 months ago

gabriel-samfira commented 8 months ago

This change adds the ability to specify the pool balancing strategy to use when processing queued jobs. Before this change, GARM would round-robin through all pools that matched the set of tags requested by queued jobs.

When round-robin (the default) is used for an entity (repo, org or enterprise), and that entity has 2 pools whose common set of tags matches, say, 10 queued jobs, those jobs trigger the creation of a new runner in each of the two pools in turn: job 1 goes to pool 1, job 2 to pool 2, job 3 to pool 1, job 4 to pool 2, and so on.
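A minimal Python sketch of the round-robin assignment described above, assuming both pools match every job and neither ever fills up. `round_robin_assign` is a hypothetical helper for illustration only, not GARM's actual (Go) implementation.

```python
from itertools import cycle


def round_robin_assign(pools, num_jobs):
    """Hypothetical helper: hand each queued job to the next pool in turn."""
    next_pool = cycle(pools)  # pool 1, pool 2, pool 1, pool 2, ...
    return [next(next_pool) for _ in range(num_jobs)]


assignments = round_robin_assign(["pool 1", "pool 2"], 10)
for job, pool in enumerate(assignments, start=1):
    print(f"job {job} -> {pool}")
# job 1 -> pool 1, job 2 -> pool 2, job 3 -> pool 1, ..., job 10 -> pool 2
```

With 10 jobs and 2 pools, each pool ends up creating 5 runners.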

When "stack" is used, those same 10 jobs would trigger the creation of a new runner in the pool with the highest priority, every time.

In both cases, if a pool is full, the next one would be tried automatically.

For the stack case, this means that if pool 2 had a priority of 10 and pool 1 had a priority of 5, pool 2 would be saturated first, then pool 1.
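The stack strategy and the "try the next pool when one is full" fallback can be sketched as a small model. This is a hypothetical illustration, not GARM's implementation: the `Pool` and `Balancer` names, the `"stack"`/`"roundrobin"` strategy strings, the small `max_runners` limits, and returning `None` (job stays queued) when every pool is full are all assumptions. All pools are assumed to already match the job's tags.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    name: str
    priority: int
    max_runners: int
    runners: int = 0  # runners currently created in this pool

    def is_full(self) -> bool:
        return self.runners >= self.max_runners


class Balancer:
    """Hypothetical model of one entity's pools (all assumed to match the job's tags)."""

    def __init__(self, pools, strategy="roundrobin"):
        self.pools = list(pools)
        self.strategy = strategy
        self._cursor = 0  # round-robin position

    def _candidates(self):
        if not self.pools:
            return []
        if self.strategy == "stack":
            # Highest priority first; sorted() is stable, so equal priorities keep list order.
            return sorted(self.pools, key=lambda p: p.priority, reverse=True)
        # Round-robin: start at the cursor, wrap around, and advance the cursor for the next job.
        n = len(self.pools)
        start = self._cursor
        self._cursor = (start + 1) % n
        return [self.pools[(start + i) % n] for i in range(n)]

    def place_job(self):
        """Create a runner in the first non-full candidate pool; None if all are full."""
        for pool in self._candidates():
            if not pool.is_full():
                pool.runners += 1
                return pool.name
        return None


# The example from the text: pool 2 has priority 10, pool 1 has priority 5.
pool1 = Pool("pool 1", priority=5, max_runners=3)
pool2 = Pool("pool 2", priority=10, max_runners=3)
stack = Balancer([pool1, pool2], strategy="stack")
placed = [stack.place_job() for _ in range(7)]
print(placed)
# ['pool 2', 'pool 2', 'pool 2', 'pool 1', 'pool 1', 'pool 1', None]
```

Pool 2 absorbs jobs until it hits its limit, then pool 1 takes over, and the seventh job finds no room. Breaking priority ties by list order is a choice made for this sketch only.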

To use this, we first need to set a priority on the pools:

ubuntu@garm:~/garm$ garm-cli pool ls -r 70227434-e7c0-4db1-8c17-e9ae3683f61e
+--------------------------------------+---------------------------+--------------+-----------------------------------------+------------------+-------+---------+---------------+----------+
| ID                                   | IMAGE                     | FLAVOR       | TAGS                                    | BELONGS TO       | LEVEL | ENABLED | RUNNER PREFIX | PRIORITY |
+--------------------------------------+---------------------------+--------------+-----------------------------------------+------------------+-------+---------+---------------+----------+
| 8ec34c1f-b053-4a5d-80d6-40afdfb389f9 | ubuntu:22.04              | default      | self-hosted x64 Linux ubuntu repo       | gsamfira/scripts | repo  | true    | garm          |        0 |
+--------------------------------------+---------------------------+--------------+-----------------------------------------+------------------+-------+---------+---------------+----------+
| 577627f4-1add-4a45-9c62-3a7cbdec8403 | runner-upstream:latest    | small        | self-hosted x64 Linux ubuntu k8s repo   | gsamfira/scripts | repo  | true    | garm          |        0 |
+--------------------------------------+---------------------------+--------------+-----------------------------------------+------------------+-------+---------+---------------+----------+

Then update the priority on one pool:

ubuntu@garm:~/garm$ garm-cli pool update --priority 100 577627f4-1add-4a45-9c62-3a7cbdec8403
+--------------------------+----------------------------------------------------------+
| FIELD                    | VALUE                                                    |
+--------------------------+----------------------------------------------------------+
| ID                       | 577627f4-1add-4a45-9c62-3a7cbdec8403                     |
| Provider Name            | k8s_external                                             |
| Priority                 | 100                                                      |
| Image                    | runner-upstream:latest                                   |
| Flavor                   | small                                                    |
| OS Type                  | linux                                                    |
| OS Architecture          | amd64                                                    |
| Max Runners              | 20                                                       |
| Min Idle Runners         | 1                                                        |
| Runner Bootstrap Timeout | 20                                                       |
| Tags                     | self-hosted, x64, Linux, ubuntu, k8s, repo               |
| Belongs to               | gsamfira/scripts                                         |
| Level                    | repo                                                     |
| Enabled                  | true                                                     |
| Runner Prefix            | garm                                                     |
| Extra specs              |                                                          |
| GitHub Runner Group      |                                                          |
| Instances                | garm-DNj8H6ntBHAC (13ca518d-b6e1-40ea-a949-6e488503c6ab) |
+--------------------------+----------------------------------------------------------+

Now we need to switch repository 70227434-e7c0-4db1-8c17-e9ae3683f61e to the stack balancer:

ubuntu@garm:~/garm$ garm-cli repo update --pool-balancer-type=stack 70227434-e7c0-4db1-8c17-e9ae3683f61e
+----------------------+--------------------------------------+
| FIELD                | VALUE                                |
+----------------------+--------------------------------------+
| ID                   | 70227434-e7c0-4db1-8c17-e9ae3683f61e |
| Owner                | gsamfira                             |
| Name                 | scripts                              |
| Pool balancer type   | stack                                |
| Credentials          | gabriel_org                          |
| Pool manager running | true                                 |
+----------------------+--------------------------------------+

Now, when new jobs come in, pool 577627f4-1add-4a45-9c62-3a7cbdec8403 should always be preferred until it is full (its Max Runners setting is 20).

GitHub currently doesn't let us prioritize which runners pick up jobs first, but we can at least decide which pools spin up runners first. This should offer some relief for issues like the one detailed here: